[BioC] Annotating HGU133plus2 genes with number of coding changes
Sean Davis
sdavis2 at mail.nih.gov
Thu Apr 19 15:43:23 CEST 2007
On Thursday 19 April 2007 09:33, marco zucchelli wrote:
> Hi Steffen,
>
> One more question: in the example I reported before, it seems that some
> probes are reported twice, e.g. 207893_at is listed two times, matched to
> the same gene ID. In total, the "probes" vector contains the probes from
> hgu133plus2 (54675), but the query returns 66565 rows.
>
> I do not really understand what this means.
>
> Regards
>
> Marco
>
> probe.list <- getBM(attributes=c("ensembl_gene_id","affy_hg_u133_plus_2"),
>                     filters="affy_hg_u133_plus_2", values=probes, mart=mart)
>
> head(probe.list)
>
>   ensembl_gene_id affy_hg_u133_plus_2
> 1 ENSG00000184895           207893_at
> 2 ENSG00000184895           207893_at
> 3 ENSG00000129824           201909_at
> 4 ENSG00000129824           201909_at
> 5 ENSG00000067646         207247_s_at
> 6 ENSG00000067646           207246_at
>
> On 4/3/07, Steffen Durinck <durincks at mail.nih.gov> wrote:
> > Hi Marco,
> >
> > It matches the transcripts and then maps those transcripts to the genes,
> > even if you don't include the transcript id in the query.
> > To see this you could set attributes =
> > c("ensembl_gene_id","ensembl_transcript_id","affy_hg_u133_plus_2") in
> > your query. Also if Ensembl didn't find a match for the affy probe then
> > it won't be included in the output and if they find multiple matches
> > then all of them will be returned.
Marco,
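Try the suggestion that Steffen gave above (setting the attributes to include
the transcript). The mapping is NOT done to the gene, but to the transcript,
and there may be multiple transcripts for the same gene, each of which may be
mapped to one or more affy_ids. If you leave the transcript column out of the
attributes, each gene/probe pair is still returned once per matching
transcript, which is why the rows look duplicated.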
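
A rough sketch of the effect, not run against Ensembl: the data frame below is
a toy stand-in for what getBM() might return with the transcript attribute
included. The gene and probe IDs are taken from your output; the transcript
IDs are made-up placeholders, not real Ensembl identifiers.

```r
# Toy stand-in for a getBM() result that includes the transcript column.
# Transcript IDs are invented placeholders (assumption, for illustration only).
res <- data.frame(
  ensembl_gene_id       = c("ENSG00000184895", "ENSG00000184895",
                            "ENSG00000129824", "ENSG00000129824"),
  ensembl_transcript_id = c("TX_PLACEHOLDER_1", "TX_PLACEHOLDER_2",
                            "TX_PLACEHOLDER_3", "TX_PLACEHOLDER_4"),
  affy_hg_u133_plus_2   = c("207893_at", "207893_at",
                            "201909_at", "201909_at"),
  stringsAsFactors = FALSE
)

# Dropping the transcript column leaves rows that look like duplicates.
gene.probe <- res[, c("ensembl_gene_id", "affy_hg_u133_plus_2")]

# unique() collapses them to one row per gene/probe pair.
gene.probe.unique <- unique(gene.probe)

nrow(gene.probe)         # number of rows before collapsing
nrow(gene.probe.unique)  # number of rows after collapsing
```

If you only care about the gene-level annotation, calling unique() on the
gene/probe columns of your real query result should remove the repeats in the
same way.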
Sean