[BioC] pd.hugene.1.0.st.v1

Vincent Carey stvjc at channing.harvard.edu
Fri Jul 31 14:10:36 CEST 2009


On Fri, Jul 31, 2009 at 12:48 AM, Mark Robinson <mrobinson at wehi.edu.au> wrote:
> Hi all.
>
> I wonder if it makes more sense to have the *transcript* version of this
> package available, instead of the *probeset* version you get when you install via:
>

This merits further discussion.  Note that under the current approach
you can obtain the transcript cluster indices for summarization by
applying fData() to the output of rma():

> class(tismix)
[1] "GeneFeatureSet"
attr(,"package")
[1] "oligoClasses"
> class(tismixRMA)
[1] "ExpressionSet"
attr(,"package")
[1] "Biobase"
> fData(tismixRMA)[1:4,]
         fsetid  exon_id transcript_cluster_id level crosshyb_type chrom
7896737 7896737 96595542               7896736    NA             3     1
7896739 7896739 96595544               7896738    NA             3     1
7896741 7896741 96595546               7896740    NA             3     1
7896743 7896743 96595548               7896742    NA             3     1

        accessions
7896737 <NA>
7896739 <NA>
7896741 BC136848,BC136907,ENST00000318050,ENST00000326183,ENST00000335137,NM_001004195,NM_001005240,NM_001005484
7896743 BC118988,ENST00000279067

> dim(fData(tismixRMA))
[1] 253002      7
> dim(exprs(tismixRMA))
[1] 253002     33
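As a minimal sketch, assuming you want a quick gene-level matrix from the probeset-level rma() output above, the helper below averages the log2 probeset values that share a transcript_cluster_id. collapseToCluster() is a hypothetical name and is not part of oligo or Biobase. It is also a crude stand-in: it does not run RMA's median polish over all probes in each transcript cluster, which is presumably what a transcript-level pd package would provide.

```r
## Hypothetical helper (not part of oligo): average probeset-level log2
## expression values within each transcript cluster.
collapseToCluster <- function(expr, clusterId) {
  stopifnot(nrow(expr) == length(clusterId))
  ## Drop probesets without a transcript cluster assignment
  keep <- !is.na(clusterId)
  expr <- expr[keep, , drop = FALSE]
  clusterId <- clusterId[keep]
  ## rowsum() sums the rows sharing a group label; rows come back sorted by group
  sums <- rowsum(expr, group = clusterId)
  ## Number of probesets per cluster, aligned to the rows of 'sums'
  counts <- as.vector(table(clusterId)[rownames(sums)])
  ## A vector of length nrow(sums) recycles down each column,
  ## so every row is divided by its own probeset count
  sums / counts
}

## Toy data standing in for exprs() and fData()$transcript_cluster_id
toy <- matrix(c(4, 6,
                8, 10,
                1, 3),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("ps1", "ps2", "ps3"), c("s1", "s2")))
ids <- c(100, 100, 200)
## Returns a 2 x 2 matrix: cluster "100" -> 6, 8; cluster "200" -> 1, 3
collapseToCluster(toy, ids)
```

On the objects from the session above, the call would presumably be collapseToCluster(exprs(tismixRMA), fData(tismixRMA)$transcript_cluster_id). Averaging already-summarized probesets weights each probeset equally regardless of how many probes it has, which is one reason a true transcript-level summarization may be preferable.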

Annotation packages are available at both the probeset and
transcript cluster levels, thanks to folks at City of Hope (e.g.,
http://www.bioconductor.org/packages/release/data/annotation/html/hugene10sttranscriptcluster.db.html).


> source("http://bioconductor.org/biocLite.R")
> biocLite("pd.hugene.1.0.st.v1")
>
> It seems like, as a default, more people would want gene-level summaries for
> these arrays, especially since ~200k (~80%) of the probesets have 3
> probes or fewer.
>
> Of course I (and everyone around the world) could build this package locally
> from scratch using the transcript CSV, but it seems like there would be
> enough demand to make it available directly from BioC.  Just a thought.
> Does anyone agree?
>
> Or, am I missing something that will allow me to do gene-level analysis from
> this package?
>
> My session is below.
>
> Thanks in advance.
> Mark
>
>
>
> ----------------------
> mac1618:Desktop mrobinson$ wc -l HuGene-1_0-st-v1.na29.*.csv
>  257449 HuGene-1_0-st-v1.na29.hg18.probeset.csv
>   33317 HuGene-1_0-st-v1.na29.hg18.transcript.csv
> ----------------------
>
>
> ----------------------
>> library(oligo)
> Loading required package: oligoClasses
> Loading required package: Biobase
>
> Welcome to Bioconductor
>
>  Vignettes contain introductory material. To view, type
>  'openVignette()'. To cite Bioconductor, see
>  'citation("Biobase")' and for packages 'citation(pkgname)'.
>
> Loading required package: preprocessCore
> Welcome to oligo version 1.8.1
>> cf <- dir(celPath,"CEL")
>> fs <- read.celfiles( file.path(celPath,cf) )
> Loading required package: pd.hugene.1.0.st.v1
> Loading required package: RSQLite
> Loading required package: DBI
> Platform design info loaded.
> Reading in : rawData/cell_line/HuGene-1_0-st-v1//cancer1.CEL
> Reading in : rawData/cell_line/HuGene-1_0-st-v1//cancer2.CEL
> Reading in : rawData/cell_line/HuGene-1_0-st-v1//normal1.CEL
> Reading in : rawData/cell_line/HuGene-1_0-st-v1//normal2.CEL
>> rmaOligo <- oligo::rma(fs)
> Background correcting
> Normalizing
> Calculating Expression
>> dmOligo <- exprs(rmaOligo)
>> dim(rmaOligo)
> Features  Samples
>  253002        4
>> sessionInfo()
> R version 2.9.0 (2009-04-17)
> i386-apple-darwin8.11.1
>
> locale:
> en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] pd.hugene.1.0.st.v1_2.4.1 RSQLite_0.7-1
> [3] DBI_0.2-4                 oligo_1.8.1
> [5] preprocessCore_1.6.0      oligoClasses_1.6.0
> [7] Biobase_2.4.1
>
> loaded via a namespace (and not attached):
> [1] affxparser_1.15.6 affyio_1.12.0     Biostrings_2.12.1 IRanges_1.2.2
> [5] splines_2.9.0
> ----------------------
>
> ------------------------------
> Mark Robinson, PhD (Melb)
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: m.robinson at garvan.org.au
> e: mrobinson at wehi.edu.au
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>



-- 
Vincent Carey, PhD
Biostatistics, Channing Lab
617 525 2265


