[BioC] Can't reproduce assessment data included in affycomp package
Harris A. Jaffee
hj at jhu.edu
Wed Apr 20 00:35:20 CEST 2011
Whether or not rma() has changed over the years, it appears that the
read.newspikein() / assessSpikeIn2() pipeline has definitely changed.
I just re-processed Rafa's original file here, from 2004:
http://affycomp.jhsph.edu/AFFY2/rafa@jhu.edu/030429.1332/hgu133.csv
The result was at least similar to Philippe's and different from the
old one.
It didn't matter whether I let the Affycomp tool process the file or
if I did
it by hand.
For reference:
http://affycomp.jhsph.edu/AFFY2/rafa@jhu.edu/030429.1332/
http://affycomp.jhsph.edu/AFFY2/rafa@jhu.edu/030429.1332/hgu133.csv
http://affycomp.jhsph.edu/AFFY2/rafa@jhu.edu/030429.1332/spike-in-133-
assessment.pdf
http://affycomp.jhsph.edu/AFFY2/hj@jhu.edu/110419.1757/
http://affycomp.jhsph.edu/AFFY2/hj@jhu.edu/110419.1757/hgu133.csv
http://affycomp.jhsph.edu/AFFY2/hj@jhu.edu/110419.1757/spike-in-133-
assessment.pdf
The two Fig. 5c's are different, but what does this mean?
Begin forwarded message:
> From: "Harris A. Jaffee" <hj at jhu.edu>
> Date: April 19, 2011 3:53:39 PM EDT
> To: Philippe Serhal <philippe.serhal at umontreal.ca>
> Cc: Rafael Irizarry <rafa at jhu.edu>, Kasper Daniel Hansen
> <khansen at jhsph.edu>
> Subject: Re: [BioC] Can't reproduce assessment data included in
> affycomp package
>
> So we see differences by making this call:
>
> > affycomp.compfig5c(list(rma.assessment2.133$MA2, a$MA2))
>
> And something is certainly wrong, because these two things are not
> even the same size:
>
> > str(a$MA2$tp.low)
> num [1:630] 0.00159 0.00317 0.00476 0.00635 0.00794 ...
>
> > str(rma.assessment2.133$MA2$tp.low)
> num [1:504] 0.00198 0.00397 0.00595 0.00794 0.00992 ...
>
> If I did it right, your call to assessMA2() got 630 / 504 / 378 :
>
> $ fp.low : int [1:630] 0 0 0 0 0 0 0 0 0 0 ...
> $ tp.low : num [1:630] 0.00159 0.00317 0.00476 0.00635
> 0.00794 ...
> $ fp.med : int [1:504] 0 0 0 0 0 0 0 0 0 0 ...
> $ tp.med : num [1:504] 0.00198 0.00397 0.00595 0.00794
> 0.00992 ...
> $ fp.high : int [1:378] 0 0 0 0 0 0 0 0 0 0 ...
> $ tp.high : num [1:378] 0.00265 0.00529 0.00794 0.01058
> 0.01323 ...
> $ method.name: chr "RMA_testing"
>
> whereas Rafa had 504 / 378 / 378 :
>
> $ fp.low : num [1:504] 0 0 0 0 0 0 0 0 0 0 ...
> $ tp.low : num [1:504] 0.00198 0.00397 0.00595 0.00794
> 0.00992 ...
> $ fp.med : num [1:378] 0 0 0 0 0 0 0 0 0 0 ...
> $ tp.med : num [1:378] 0.00265 0.00529 0.00794 0.01058
> 0.01323 ...
> $ fp.high : num [1:378] 0 0 0 0 0 0 0 0 0 0 ...
> $ tp.high : num [1:378] 0.00265 0.00529 0.00794 0.01058
> 0.01323 ...
> $ method.name: chr "RMA"
>
> Are you getting what I have above, and does that suggest anything?
>
> > sessionInfo()
> R version 2.13.0 Patched (2011-04-19 r55517)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C
> [3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915
> [5] LC_MONETARY=C LC_MESSAGES=en_US.iso885915
> [7] LC_PAPER=en_US.iso885915 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines stats graphics grDevices datasets utils
> methods
> [8] base
>
> other attached packages:
> [1] affycomp_1.27.2 affycompData_0.99.0 hgu133atagcdf_2.8.0
> [4] affy_1.30.0 Biobase_2.12.1
>
> loaded via a namespace (and not attached):
> [1] affyio_1.20.0 preprocessCore_1.14.0 tools_2.13.0
>
>
> On Apr 19, 2011, at 11:57 AM, Philippe Serhal wrote:
>> I can't quite seem to reproduce the precomputed MAS 5.0 and RMA
>> assessments provided in the affycomp package. Although I have
>> been unable to find the code that was originally used to generate
>> them, I think it is fairly safe to assume it looked a little
>> something like this (given a current working directory containing
>> the HG-U133A spike-in data [1]):
>>
>> R> library (affy)
>> R> library (affycomp)
>> R> data <- ReadAffy ()
>> R> eset <- rma (data)
>> R> write.table (data.frame (2 ^ exprs (eset), check.names = F),
>> + file = "tmp.csv", sep = ",", col.names = NA, quote = F)
>> R> tmp <- read.newspikein ("tmp.csv")
>> R> a <- assessSpikeIn2 (tmp, method.name = "RMA_testing")
>>
>> I then compare this with the precomputed assessments:
>>
>> R> data (rma.assessment2.133)
>> R> affycompPlot (rma.assessment2.133, a)
>>
>> Four of the six resulting figures [2] -- 1b, 2b, 4c, and 5e --
>> show identical results for the two assessments, while the other
>> two -- 5c and 5d -- show significantly diverging results.
>> Thinking perhaps the assessments had been computed using an older
>> version of RMA, I tried 'bgversion = 1' in the rma() call, but
>> that didn't help. In fact, this doesn't seem to have anything to
>> do with the RMA pipeline, because if I repeat the process
>> substituting 'mas5' for 'rma' and 'mas5.assessment2.133' for
>> 'rma.assessment2.133' (and not exponentiating), I obtain similar
>> results: 5c differs, but none of the others.
>>
>> Any idea what I'm doing wrong here? Thank you in advance for any
>> insight you may be able to provide.
>>
>> R version: 2.10.0
>> affy version: 1.24.1
>>
>> [1] HG-U133A spike-in data: http://www.biostat.jhsph.edu/~ririzarr/
>> affycomp/hgu133spikein.tgz
>> [2] affycomp figure definitions: http://affycomp.jhsph.edu/AFFY2/
>> comp_form.html#definitions
>>
>> --
>> Philippe Serhal
>> Functional and Structural Bioinformatics Lab
>> Institute for Research in Immunology and Cancer (IRIC)
>> Université de Montréal
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/
>> gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list