[BioC] error in hugene10sttranscriptcluster

Mon Apr 18 13:02:32 CEST 2011

Dear Jim,
Thanks for your quick reply.
I'm not sure I understood your explanation. I did some research in the
NetAffx annotation, as you suggested.

I summarized the data with the oligo package at the "core" level, i.e. the
transcript level. In the NetAffx annotation files I looked up all the
transcript clusters related to the NR_002716 gene.
These are: 7948894, 8019631, 8019633, 8019635, 8019637, 8019639, 8019641,
8019703, 8019705, 8019707, 8019709 and 8019802. What does this mean? Is a
gene made up of several clusters, or is the gene repeated across these
clusters? Once I understand this, I will understand how limma works on
these arrays, since I don't know whether one cluster equals one gene or
several clusters together make up one gene.
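
To make the question concrete, here is a minimal sketch in base R of the many-to-one layout I see in the NetAffx file. The data frame `clusters` and its column names are made up for illustration; only the IDs and the accession come from the annotation file.

```r
# Transcript cluster IDs that NetAffx lists for NR_002716. The IDs are taken
# from the annotation file; the data frame and column names are illustrative.
clusters <- data.frame(
  transcript_cluster_id = c("7948894", "8019631", "8019633", "8019635",
                            "8019637", "8019639", "8019641", "8019703",
                            "8019705", "8019707", "8019709", "8019802"),
  accession = "NR_002716",
  stringsAsFactors = FALSE
)

# Group the cluster IDs by accession: one accession, many clusters.
by_accession <- split(clusters$transcript_cluster_id, clusters$accession)

# Count how many transcript clusters point at each accession.
n_clusters <- sapply(by_accession, length)
print(n_clusters)
```

As far as I understand, limma fits one model per row of the expression matrix, so each of these clusters would be tested as a separate feature unless they are collapsed beforehand.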

I noticed some differences between the annotation files
HuGene-1_0-st_v1.na29.hg18.transcript (the one I used before) and
HuGene-1_0-st_v1.na31.hg19.transcript (the latest one). The main
differences are in the "start" and "stop" fields for each of the
transcript clusters listed above.
In the first version (na29.hg18) these fields hold non-zero values,
whereas in the latest version (na31.hg19) the "start" and "stop" values
are zero. However, in both files the "gene assignment" field is
NR_002716. So I don't understand why
mget("8019631",hugene10sttranscriptclusterACCNUM) raises an error when
this accession number exists in the NetAffx annotation file.

Moreover, the annotation from oligo (which retrieves the NetAffx
Biological Annotation) also lists this accession:
pData(featureData(OligoEset))["8019631","geneassignment"]
returns NR_002716
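
For reference, a lookup that tolerates missing keys instead of stopping with an error can be sketched as follows. This is only a sketch: `entrez_map` is a hypothetical plain environment standing in for hugene10sttranscriptclusterENTREZID so the example runs without Bioconductor. I am assuming that the annotation package's mget accepts the same `ifnotfound` argument as base R, which the `ifnotfound` slot in the error message seems to suggest.

```r
# Stand-in for the annotation map: a plain environment holding only the key
# that resolves in hugene10sttranscriptcluster.db 6.0.1. This object is
# hypothetical and exists only to make the example self-contained.
entrez_map <- new.env()
assign("8104901", "3575", envir = entrez_map)

ids <- c("8104901", "8019631")

# ifnotfound = NA returns NA for unmapped keys instead of raising an error.
res <- mget(ids, envir = entrez_map, ifnotfound = NA)

# Collect the IDs without a mapping, to check them against NetAffx by hand.
is_unmapped <- vapply(res, function(v) all(is.na(v)), logical(1))
unmapped <- names(res)[is_unmapped]
print(unmapped)
```

With the real annotation object, the unmapped IDs returned this way would be the ones to compare between the na29 and na31 files.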

I'm a little bit confused about this.
Thanks again,
Javier

On 16/04/2011 20:40, James MacDonald wrote:
> Hi Javier,
>
> The annotation of Affy chips tends to change over time, and this might
> be an instance of that. If you check netaffx for this probeset, the
> transcript it measures is described as 'multiple', and if you blat the
> sequence they built the probeset against, it matches all over the place.
> So it may be that in the past they claimed a direct match and now they
> don't.
>
> You could investigate this further by looking at older versions of the
> annotation files if you care to know more.
>
> Best,
>
> Jim
>
>
>
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
>>>> Javier Pérez Florido 04/16/11 8:16 AM>>>
> Dear list,
> I'm trying to get the ENTREZIDs of some Affy_IDs of GeneChip Human Gene
> ST 1.0 Arrays through hugene10sttranscriptcluster package. Depending on
> the R version, the results are different.
> For example, in R 2.12.2:
>   >  mget("8104901",hugene10sttranscriptclusterENTREZID)
> $`8104901`
> [1] "3575"
>
> But
>   >mget("8019631",hugene10sttranscriptclusterENTREZID)
> Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
>     value for "8019631" not found
>
> The sessionInfo is:
>
> sessionInfo()
> R version 2.12.2 (2011-02-25)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
> [5] LC_TIME=Spanish_Spain.1252
>
> attached base packages:
>    [1] grid      tools     tcltk     stats     graphics  grDevices utils
>    [8] datasets  methods   base
>
> other attached packages:
>    [1] annotate_1.28.1                      oneChannelGUI_1.16.5
>    [3] girafe_1.2.0                         genomeIntervals_1.6.0
>    [5] intervals_0.13.3                     ShortRead_1.8.2
>    [7] lattice_0.19-17                      Rsamtools_1.2.3
>    [9] Biostrings_2.18.4                    GenomicRanges_1.2.3
> [11] baySeq_1.4.0                         edgeR_2.0.5
> [13] IRanges_1.8.9                        preprocessCore_1.12.0
> [15] GOstats_2.16.0                       graph_1.28.0
> [17] Category_2.16.1                      tkWidgets_1.28.0
> [19] DynDoc_1.28.0                        widgetTools_1.28.0
> [21] affylmGUI_1.24.0                     affyio_1.18.0
> [23] affy_1.28.0                          limma_3.6.9
> [25] hugene10sttranscriptcluster.db_6.0.1 org.Hs.eg.db_2.4.6
> [27] RSQLite_0.9-4                        DBI_0.2-5
> [29] AnnotationDbi_1.12.0                 Biobase_2.10.0
>
> loaded via a namespace (and not attached):
>    [1] BSgenome_1.18.3   genefilter_1.32.0 GO.db_2.4.5       GSEABase_1.12.2
>    [5] hwriter_1.3       RBGL_1.26.0       splines_2.12.2    survival_2.36-5
>    [9] XML_3.2-0.2       xtable_1.5-6
>
> However, in R 2.10.0
>    mget("8104901",hugene10sttranscriptclusterENTREZID)
> $`8104901`
> [1] "3575" (the same as before in R 2.12.2)
>
>   >  mget("8019631",hugene10sttranscriptclusterENTREZID)
> $`8019631`
> [1] "6066" (there is no error like in R 2.12.2)
>
> The sessionInfo is:
> R version 2.10.0 (2009-10-26)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
> [5] LC_TIME=Spanish_Spain.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] limma_3.2.3                          hugene10sttranscriptcluster.db_4.0.1
> [3] org.Hs.eg.db_2.3.6                   RSQLite_0.9-2
> [5] DBI_0.2-5                            AnnotationDbi_1.8.2
> [7] Biobase_2.6.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.10.0
>
> Why does this error occur for Affy_ID 8019631 when R 2.12.2 is used?
> Thanks,
> Javier