[BioC] Can't reproduce assessment data included in affycomp package

Wed Apr 20 00:35:20 CEST 2011

Whether or not rma() has changed over the years, it appears that the
read.newspikein() / assessSpikeIn2() pipeline has definitely changed.

I just re-processed Rafa's original file here, from 2004:

	http://affycomp.jhsph.edu/AFFY2/rafa@jhu.edu/030429.1332/hgu133.csv

The result was at least similar to Philippe's and different from the  
old one.
It didn't matter whether I let the Affycomp tool process the file or  
if I did
it by hand.

For reference:

http://affycomp.jhsph.edu/AFFY2/rafa@jhu.edu/030429.1332/
http://affycomp.jhsph.edu/AFFY2/rafa@jhu.edu/030429.1332/hgu133.csv
http://affycomp.jhsph.edu/AFFY2/rafa@jhu.edu/030429.1332/spike-in-133- 
assessment.pdf

http://affycomp.jhsph.edu/AFFY2/hj@jhu.edu/110419.1757/
http://affycomp.jhsph.edu/AFFY2/hj@jhu.edu/110419.1757/hgu133.csv
http://affycomp.jhsph.edu/AFFY2/hj@jhu.edu/110419.1757/spike-in-133- 
assessment.pdf

The two Fig. 5c's are different, but what does this mean?

Begin forwarded message:
> From: "Harris A. Jaffee" <hj at jhu.edu>
> Date: April 19, 2011 3:53:39 PM EDT
> To: Philippe Serhal <philippe.serhal at umontreal.ca>
> Cc: Rafael Irizarry <rafa at jhu.edu>, Kasper Daniel Hansen  
> <khansen at jhsph.edu>
> Subject: Re: [BioC] Can't reproduce assessment data included in  
> affycomp package
>
> So we see differences by making this call:
>
> 	> affycomp.compfig5c(list(rma.assessment2.133$MA2, a$MA2))
>
> And something is certainly wrong, because these two things are not
> even the same size:
>
> > str(a$MA2$tp.low)
>  num [1:630] 0.00159 0.00317 0.00476 0.00635 0.00794 ...
>
> > str(rma.assessment2.133$MA2$tp.low)
>  num [1:504] 0.00198 0.00397 0.00595 0.00794 0.00992 ...
>
> If I did it right, your call to assessMA2() got  630 / 504 / 378 :
>
>  $ fp.low     : int [1:630] 0 0 0 0 0 0 0 0 0 0 ...
>  $ tp.low     : num [1:630] 0.00159 0.00317 0.00476 0.00635  
> 0.00794 ...
>  $ fp.med     : int [1:504] 0 0 0 0 0 0 0 0 0 0 ...
>  $ tp.med     : num [1:504] 0.00198 0.00397 0.00595 0.00794  
> 0.00992 ...
>  $ fp.high    : int [1:378] 0 0 0 0 0 0 0 0 0 0 ...
>  $ tp.high    : num [1:378] 0.00265 0.00529 0.00794 0.01058  
> 0.01323 ...
>  $ method.name: chr "RMA_testing"
>
> whereas Rafa had  504 / 378 / 378 :
>
>  $ fp.low     : num [1:504] 0 0 0 0 0 0 0 0 0 0 ...
>  $ tp.low     : num [1:504] 0.00198 0.00397 0.00595 0.00794  
> 0.00992 ...
>  $ fp.med     : num [1:378] 0 0 0 0 0 0 0 0 0 0 ...
>  $ tp.med     : num [1:378] 0.00265 0.00529 0.00794 0.01058  
> 0.01323 ...
>  $ fp.high    : num [1:378] 0 0 0 0 0 0 0 0 0 0 ...
>  $ tp.high    : num [1:378] 0.00265 0.00529 0.00794 0.01058  
> 0.01323 ...
>  $ method.name: chr "RMA"
>
> Are you getting what I have above, and does that suggest anything?
>
> > sessionInfo()
> R version 2.13.0 Patched (2011-04-19 r55517)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
>  [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
>  [5] LC_MONETARY=C                  LC_MESSAGES=en_US.iso885915
>  [7] LC_PAPER=en_US.iso885915       LC_NAME=C
>  [9] LC_ADDRESS=C                   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines   stats     graphics  grDevices datasets  utils      
> methods
> [8] base
>
> other attached packages:
> [1] affycomp_1.27.2     affycompData_0.99.0 hgu133atagcdf_2.8.0
> [4] affy_1.30.0         Biobase_2.12.1
>
> loaded via a namespace (and not attached):
> [1] affyio_1.20.0         preprocessCore_1.14.0 tools_2.13.0
>
>
> On Apr 19, 2011, at 11:57 AM, Philippe Serhal wrote:
>> I can't quite seem to reproduce the precomputed MAS 5.0 and RMA  
>> assessments provided in the affycomp package.  Although I have  
>> been unable to find the code that was originally used to generate  
>> them, I think it is fairly safe to assume it looked a little  
>> something like this (given a current working directory containing  
>> the HG-U133A spike-in data [1]):
>>
>> R> library (affy)
>> R> library (affycomp)
>> R> data <- ReadAffy ()
>> R> eset <- rma (data)
>> R> write.table (data.frame (2 ^ exprs (eset), check.names = F),
>>  + file = "tmp.csv", sep = ",", col.names = NA, quote = F)
>> R> tmp <- read.newspikein ("tmp.csv")
>> R> a <- assessSpikeIn2 (tmp, method.name = "RMA_testing")
>>
>> I then compare this with the precomputed assessments:
>>
>> R> data (rma.assessment2.133)
>> R> affycompPlot (rma.assessment2.133, a)
>>
>> Four of the six resulting figures [2] -- 1b, 2b, 4c, and 5e --  
>> show identical results for the two assessments, while the other  
>> two -- 5c and 5d -- show significantly diverging results.   
>> Thinking perhaps the assessments had been computed using an older  
>> version of RMA, I tried 'bgversion = 1' in the rma() call, but  
>> that didn't help.  In fact, this doesn't seem to have anything to  
>> do with the RMA pipeline, because if I repeat the process  
>> substituting 'mas5' for 'rma' and 'mas5.assessment2.133' for  
>> 'rma.assessment2.133' (and not exponentiating), I obtain similar  
>> results: 5c differs, but none of the others.
>>
>> Any idea what I'm doing wrong here?  Thank you in advance for any  
>> insight you may be able to provide.
>>
>> R version: 2.10.0
>> affy version: 1.24.1
>>
>> [1] HG-U133A spike-in data: http://www.biostat.jhsph.edu/~ririzarr/ 
>> affycomp/hgu133spikein.tgz
>> [2] affycomp figure definitions: http://affycomp.jhsph.edu/AFFY2/ 
>> comp_form.html#definitions
>>
>> -- 
>> Philippe Serhal
>> Functional and Structural Bioinformatics Lab
>> Institute for Research in Immunology and Cancer (IRIC)
>> Université de Montréal
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/ 
>> gmane.science.biology.informatics.conductor
>