[BioC] what is the correct method to use AnnotatedDataFrame?

Sat May 12 20:29:11 CEST 2007

Daofeng,

The answer to the question in your subject line is in the help page
for AnnotatedDataFrame, accessible with any of the help commands

> library(Biobase)
> ?AnnotatedDataFrame
> class?AnnotatedDataFrame
> help("AnnotatedDataFrame")

The initial _errors_ you receive below are from typing mistakes, e.g.,
'spikenin95' rather than 'spikein95'. When asking for help on the
list, you'll probably want to create a clean transcript that contains
just the problem, without additional information.

The initial _warning_ is more difficult -- Bioconductor changes, and
some of the data objects in use at the time the monograph was produced
are being 'Deprecated' (marked as no longer supported), so the
monograph is no longer a completely accurate guide.

Trying to adapt the monograph code to work with current data objects
is challenging. For instance, spikein95 has the 'old' definition of
AffyBatch. AnnotatedDataFrame is the new version of phenoData. The
underlying code for assigning phenoData to an AffyBatch object
reflects the new objects, assigning AnnotatedDataFrame objects to the
appropriate place in AffyBatch objects -- it makes no sense for an old
phenoData object to be assigned to a new AffyBatch object, or to
assign a new AnnotatedDataFrame to an old AffyBatch.

One possibility is to use the version of R and Bioconductor that the
monograph was originally processed with. This is probably not a good
idea, since a benefit of Bioconductor is that it has changed to
address current challenges.

Another possibility is to update the data to its current definition, and
use the AnnotatedDataFrame object in place of the phenoData object:

> spikein2007 <- updateObject(spikein95)
removing 'se.exprs' with all NA values
> phenoData(spikein2007) <- new("AnnotatedDataFrame",
+  data=data.frame())

Notice that 'phenoData<-' here is a method ('replace the phenotypic
data for spikein2007 with the new AnnotatedDataFrame object'), and
AnnotatedDataFrame is the class of the object being assigned, not a
method to call on spikein2007.
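
As a rough sketch of how the 'pData' and 'varLabels' arguments of the
old phenoData class might map onto AnnotatedDataFrame, the class takes
'data' and 'varMetadata' arguments instead. The sample names and the
object names 'vmd' and 'adf' below are made up for illustration; with
real data you would use sampleNames() of your own object.

```r
library(Biobase)

# Phenotypic data: one row per sample, one column per variable.
pd <- data.frame(population = c(1, 1, 1, 2, 2, 2),
                 replicate  = c(1, 2, 3, 1, 2, 3))
# Hypothetical sample names, standing in for sampleNames(<your object>).
rownames(pd) <- paste("sample", 1:6, sep = "")

# Variable descriptions: one row per column of 'pd', held in a
# column named 'labelDescription'.
vmd <- data.frame(labelDescription = c("1 is control, 2 is treatment",
                                       "arbitrary numbering"),
                  row.names = c("population", "replicate"))

adf <- new("AnnotatedDataFrame", data = pd, varMetadata = vmd)

pData(adf)        # the data.frame of phenotypic data
varMetadata(adf)  # the variable descriptions
```

An object built this way could then be used on the right-hand side of
'phenoData<-' for an updated object with matching sample names.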

I don't know how productive this is for learning Bioconductor
-- you'll learn the original data objects and methods, the new data
objects and methods, and how to update from old to new. You won't learn
about any methods developed since the monograph was published. And
really all you want is to learn the current data and methods.

My suggestion (others probably have better ideas) is to use the
monograph to get a broad picture of how Bioconductor works and the
sorts of tasks it is good for. Then visit the 'Browse Packages' link
on bioconductor.org and identify packages, referenced in the monograph
or elsewhere, that perform tasks you want to do. From here the strategy is
to use the package vignettes and help pages.
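
For what it is worth, here is a minimal sketch of finding that
documentation from within R once a package is installed. Biobase is
used only as an example package name; substitute the package you are
interested in.

```r
# List the vignettes shipped with an installed package.
vignette(package = "Biobase")

# Show the index of help pages for the package.
help(package = "Biobase")

# Search the installed documentation for a topic.
help.search("AnnotatedDataFrame")
```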

Finally, you are already learning the essentials of Bioconductor -- how
to install R and packages, how to interact with the R evaluator, where
to find help, what general facilities are available, how to write your
own analysis work flows, ..., and of course that staying current with
changing biological questions and data leads to changing software.

Hope that helps

Martin

"Daofeng.Li" <lidaof at gmail.com> writes:

> Hi Dear list,
>
> I am a newbie who has just started learning BioC.
> I installed getMonograph and followed the steps of the book
> "Bioinformatics and Computational Biology Solutions Using R and
> Bioconductor".
>
> I encountered a problem when doing the following step-by-step exercise.
>
> Can someone also suggest a better way to learn how to use Bioconductor
> to analyze microarray data?
> Thanks!
>
>
>  R
>
> R version 2.5.0 (2007-04-23)
> Copyright (C) 2007 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
>   Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
>> library(affy)
> Loading required package: Biobase
> Loading required package: tools
>
> Welcome to Bioconductor
>
>   Vignettes contain introductory material. To view, type
>   'openVignette()'. To cite Bioconductor, see
>   'citation("Biobase")' and for packages 'citation(pkgname)'.
>
> Loading required package: affyio
>> library("SpikeInSubset")
>> data(spikein95)
>> pd <- data.frame(population = c(1,1,1,2,2,2),replicate = c(1,2,3,1,2,3))
>> rownames(pd) <- sampleNames(spikein95)
>> vl <- list(population = "1 is control, 2 is treatment", replicate =
> c(arbitraty numbering))
> Error: syntax error, unexpected SYMBOL, expecting ',' in "vl <-
> list(population = "1 is control, 2 is treatment", replicate = c(arbitraty
> numbering"
>> vl <- list(population = "1 is control, 2 is treatment", replicate =
> "arbitraty numbering")
>> phenoData(spikenin95) <- new("phenoData", pData = pd, varLabels = vl)
> Error in phenoData(spikenin95) <- new("phenoData", pData = pd, varLabels =
> vl) :
>         object "spikenin95" not found
> In addition: Warning message:
> The phenoData class is deprecated, use AnnotatedDataFrame (with
> ExpressionSet) instead
>> AnnotatedDataFrame(spikenin95) <- new("AnnotatedDataFrame", pData = pd,
> varLabels = vl)
> Error in .nextMethod(.Object, ...) : invalid names for slots of class
> "AnnotatedDataFrame": pData, varLabels
>> AnnotatedDataFrame(spikenin95) <- new("phenoData", pData = pd, varLabels =
> vl)
> Error in AnnotatedDataFrame(spikenin95) <- new("phenoData", pData = pd,  :
>         object "spikenin95" not found
> In addition: Warning message:
> The phenoData class is deprecated, use AnnotatedDataFrame (with
> ExpressionSet) instead
>> AnnotatedDataFrame(spikenin95) <- new("ExpressionSet", pData = pd,
> varLabels = vl)
> Error in function (storage.mode = c("lockedEnvironment", "environment",  :
>         'AssayData' elements with invalid dimensions: 'varLabels'
>> ls ()
> [1] "pd"        "spikein95" "vl"
>
>> sessionInfo()
> R version 2.5.0 (2007-04-23)
> i686-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets"
> [7] "methods"   "base"
>
> other attached packages:
> SpikeInSubset          affy        affyio       Biobase
>       "1.2.4"      "1.14.0"       "1.4.0"      "1.14.0"
>> capabilities()
>     jpeg      png    tcltk      X11 http/ftp  sockets   libxml     fifo
>     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE
>   cledit    iconv      NLS  profmem
>     TRUE     TRUE     TRUE    FALSE
> -- 
> Daofeng.Li

-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org