[BioC] from using biomaRt and r10kcod
Diego Diez
diez at kuicr.kyoto-u.ac.jp
Tue May 15 07:14:08 CEST 2007
Hi Weiwei and James,
(Sorry, Weiwei: the first time I sent this email only to you, when
I meant to send it to the list as well.)
On May 15, 2007, at 5:29 AM, Weiwei Shi wrote:
> Hi, there:
>
> I happened to re-address this question of codelink probe id to human
> entrezgene id. I describe my question using an example:
>
> by using r10kcod package, you can find probe "GE16490" mapped to
> "502674", which I assume it is rat entrezgene id. However, when I use
> biomaRt to convert all rat entrezgene id in this array to human ones,
> I found the following maps involving 502674:
>
> id MappedID rat.count human.count
> 4167 296197 11034 1 2
> 7021 502674 11034 1 2
>
I'm not too familiar with the biomaRt package, but I guess that what
this result is telling you is that you have two rat Entrez ids,
296197 and 502674 (each appearing only once), which map to one human
Entrez id, 11034 (appearing twice, once for each rat id).
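To make that reading concrete, here is a minimal base-R sketch. It assumes (I have not checked how the table was actually built) that rat.count and human.count are simply the number of rows in which each rat or human id occurs. The data frame `map` is a hypothetical stand-in holding only the two rows quoted above.

```r
# Hypothetical stand-in for the rat -> human mapping table
# (only the two rows quoted above).
map <- data.frame(id       = c(296197, 502674),
                  MappedID = c(11034, 11034))

# Assumption: each count is the number of rows that share that id.
map$rat.count   <- ave(seq_along(map$id),       map$id,       FUN = length)
map$human.count <- ave(seq_along(map$MappedID), map$MappedID, FUN = length)

print(map)
```

Under that assumption, each rat id gets a count of 1 and the shared human id gets a count of 2, which matches the quoted table.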
> so, basically, 296197, 502674 and 11034 are all associated with
> protein "destrin". To be accurate, 296197 is a rat protein which is
> similar to destrin.
>
> However, as shown in
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gene
> , the other two (11034 and *502674*) are human ids (if I am wrong
> here, please correct me).
>
Well, for me, searching for 502674 in Entrez Gene brings up a link to
the rat Destrin gene:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=search&term=502674
Clicking on this entry shows the information about the Dstn
(destrin) gene. At the bottom of the page there are mappings to
different sequences (Related sequences). One is CB785830.1 and the
other is CF111187.1. The latter is the one used in r10kcod to map from
Codelink probe to GenBank,
GE16490 -> CF111187.1
and this is then used to map to Entrez Gene, if I understand
how AnnBuilder works (which may not be the case). Of course, I
also use the manufacturer's mappings from probe ids
to Entrez Gene and UniGene, but for this particular probe there is no
such mapping in the current files (last updated March 31,
2006, so they are pretty old).
In fact, those files also contain information about homologues in
the other two organisms (among human, mouse and rat), and for the
human probes that map to Entrez Gene 11034 I can see that they map
to rat Entrez Gene 502674, in agreement with the biomaRt
results.
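The probe -> GenBank -> Entrez Gene chain described above can be sketched as a join of two lookup tables. This only illustrates the idea and is not AnnBuilder's actual implementation. The table names (`probe2gb`, `gb2eg`, `probe2eg`) are hypothetical, and the rows are just the identifiers mentioned in this message.

```r
# Probe -> GenBank accession (the r10kcod example above).
probe2gb <- data.frame(probe   = "GE16490",
                       genbank = "CF111187.1",
                       stringsAsFactors = FALSE)

# GenBank accession -> Entrez Gene (the two related sequences
# listed on the Entrez Gene page for 502674).
gb2eg <- data.frame(genbank = c("CB785830.1", "CF111187.1"),
                    entrez  = c(502674, 502674),
                    stringsAsFactors = FALSE)

# Join on the shared accession to obtain probe -> Entrez Gene.
probe2eg <- merge(probe2gb, gb2eg, by = "genbank")

print(probe2eg[, c("probe", "genbank", "entrez")])
```

Only the accession that the probe actually maps to survives the join, so GE16490 ends up linked to 502674 through CF111187.1.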
> so my questions are:
>
> 1. whether 502674 is a rat entrezgene id or human one?
>
I would definitely say that it is a rat id.
> 2. r10kcod is wrong or ncbi is wrong or my understanding is wrong (i
> assume the last one :)
>
Neither is wrong, from my point of view, but let's first check whether
we are seeing the same thing when we look up 502674 in Entrez Gene.
> 3. i found many many-2-many maps in this process of rat to human
> entrezgene ids. Like the following:
>
>> t0[t0[,1]== 396527,]
>>
> id MappedID rat.count human.count
> 6608 396527 54576 9 4
> 6609 396527 54575 9 4
> 6610 396527 54600 9 4
> 6611 396527 54577 9 4
> 6612 396527 54578 9 4
> 6613 396527 54579 9 4
> 6614 396527 54657 9 4
> 6615 396527 54659 9 4
> 6616 396527 54658 9 4
>
>> t0[t0[,2]== 54576,]
>>
> id MappedID rat.count human.count
> 2494 113992 54576 9 4
> 6608 396527 54576 9 4
> 6617 396551 54576 9 4
> 6626 396552 54576 9 4
>
>> t0[t0[,2]== 54577,]
>>
> id MappedID rat.count human.count
> 2497 113992 54577 9 4
> 6611 396527 54577 9 4
> 6620 396551 54577 9 4
> 6629 396552 54577 9 4
>
> so, basically all the ids are related to different polypeptides
> associated with UDP glucuronosyltransferase 1 family. Are there some
> other situations causing this many2many mappings?
>
>
As for this, James has already answered (thanks for that). The probes
are 30 base pairs long, so it is not strange but, on the contrary,
very common to find probes mapping to multiple genes that can
have related or unrelated functions. It is less common in the Codelink
arrays to have multiple probes mapping to the same gene but, again,
you can have multiple probes mapping to different GenBank ids that
correspond to the same Entrez Gene identifier. The fact that
different paralogous and orthologous sequences, and sometimes even
unrelated sequences, can share the same 30-base-pair
oligonucleotide sequence makes this a very complex problem with no
easy solution.
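If it helps to see how much of the table is affected, here is a rough base-R sketch that labels each rat -> human pair by the kind of mapping it belongs to. The data frame `m` is a hypothetical stand-in that contains only the rows quoted in this thread, so its counts differ from those in the full table. The sketch also assumes that every (rat, human) pair appears only once.

```r
# Hypothetical stand-in: only the rat -> human pairs quoted in this thread.
m <- data.frame(
  id = c(rep(396527, 9),
         113992, 396551, 396552,
         113992, 396551, 396552,
         296197, 502674),
  MappedID = c(54576, 54575, 54600, 54577, 54578, 54579, 54657, 54659, 54658,
               54576, 54576, 54576,
               54577, 54577, 54577,
               11034, 11034))

# Assumption: pairs are unique, so counting rows per id gives the
# number of partners each id has on the other side.
n.human <- ave(seq_along(m$id),       m$id,       FUN = length)
n.rat   <- ave(seq_along(m$MappedID), m$MappedID, FUN = length)

# Label each pair by the multiplicity on both sides.
m$type <- ifelse(n.human == 1 & n.rat == 1, "one-to-one",
          ifelse(n.human == 1,              "many-to-one",
          ifelse(n.rat == 1,                "one-to-many",
                                            "many-to-many")))

print(table(m$type))
```

On the full table the same labelling would let you set the ambiguous pairs aside or inspect them one by one, instead of treating every mapping as equally reliable.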
Regards,
Diego.
-----------------------------------------------
Diego Diez, PhD.
Bioknowledge systems, Kanehisa lab.
Bioinformatics center,
Institute for Chemical Research,
Kyoto University.
Gokasho, Uji, Kyoto 611-0011 JAPAN.
e-mail: diez at kuicr.kyoto-u.ac.jp
url: http://web.kuicr.kyoto-u.ac.jp/~diez
tlf: +81-774-38-3296
fax: +81-774-38-3269
-----------------------------------------------
> Sorry for the long questions,
>
> Regards,
>
> --
> Weiwei Shi, Ph.D
> Research Scientist
> GeneGO, Inc.
>
> "Did you always know?"
> "No, I did not. But I believed..."
> ---Matrix III
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/
> gmane.science.biology.informatics.conductor
>