[BioC] RMA/QuantileNormalization results difference between oligo and aroma.affymetrix for Hugene

Fri Feb 26 18:09:17 CET 2010

Ben and Benilton,

thanks for explicitly confirming this.  So, preprocessCore is what we
all use in the discussed implementation.

/Henrik

On Fri, Feb 26, 2010 at 5:07 PM,  <bmb at bmbolstad.com> wrote:
> As far as I aware, and Benilton can correct me if I am wrong, the "rma()"
> code in oligo basically does all the data structure specific manipulations
> relevant for oligo and then all the standard RMA processing is carried out
> using preprocessCore. That is the centralized location for it. The affy
> package also uses preprocessCore in a similar manner. So things should be
> synchronized between them.
>
> Ben
>
>> I rather leave bug fixes on the RMA "engine" for Ben Bolstad. Once he
>> provides a fix, it makes its way to preprocessCore and the C API used
>> by affy and oligo.
>>
>> b
>>
>> On Fri, Feb 26, 2010 at 3:30 PM, Henrik Bengtsson <hb at stat.berkeley.edu>
>> wrote:
>>> On Fri, Feb 26, 2010 at 2:29 PM, Benilton Carvalho
>>> <beniltoncarvalho at gmail.com> wrote:
>>>> regarding oligo's implementation of RMA, it's the same as the one in
>>>> affy.
>>>
>>> Just in case there are future bug fixes etc, is it based on the same
>>> code base or are you utilizing/calling the code base in the affy
>>> package?  If, say, oligo changes/fixes something in the RMA code, is
>>> there a risk that the code then will be different from affy?
>>>
>>> /Henrik
>>>
>>>>
>>>> b
>>>>
>>>> On Fri, Feb 26, 2010 at 12:47 PM, Henrik Bengtsson
>>>> <hb at stat.berkeley.edu> wrote:
>>>>> Hi,
>>>>>
>>>>> you concerns about reproducibility are very important.  Luckily your
>>>>> observations are based on mistakes as explained below.
>>>>>
>>>>> On Fri, Feb 26, 2010 at 12:05 PM, Benilton Carvalho
>>>>> <beniltoncarvalho at gmail.com> wrote:
>>>>>> Quantile normalization is already one step in the RMA workflow.
>>>>>> Therefore, there's no need to normalize the data again once you've
>>>>>> gone RMA, ie. (regarding oligo) your call
>>>>>> "normalize.quantiles(exprs(rmadata))" should be dropped.
>>>>>>
>>>>>> Using the defaults, rma() in oligo will:
>>>>>>
>>>>>> 1) Background correct (via the RMA convolution model)
>>>>>> 2) Quantile normalize
>>>>>> 3) Summarize via median-polish.
>>>>>
>>>>> Yes, as Benilton points out it seems like you've misunderstood what
>>>>> RMA does.  The author of RMA (Ben Bolstad) defines the term RMA to
>>>>> mean the complete preprocessing suite including summarization.
>>>>>
>>>>>>
>>>>>> b
>>>>>>
>>>>>> On Fri, Feb 26, 2010 at 10:46 AM, Mikhail Pachkov <pachkov at gmail.com>
>>>>>> wrote:
>>>>>>> Dear All,
>>>>>>>
>>>>>>> I am new in microarray analysis and need your expertise.
>>>>>>> I need to develop procedure for producing expression values from CEL
>>>>>>> files. Data should processed with RMA and quantile normalized. I
>>>>>>> have
>>>>>>> tried two packages - oligo and aroma.affymetrix. Obtained results
>>>>>>> are
>>>>>>> quite different form my point of view. Moreover
>>>>>>> aroma.affymetrix::QuantileNormalization function produce dta which
>>>>>>> do
>>>>>>> not look like they were quantile normalized.
>>>>>
>>>>> What is 'dta'?
>>>>>
>>>>>>>  I have made density plots of data after RMA and after quantile
>>>>>>> normalization which are available here
>>>>>>> http://test.swissregulon.unibas.ch/bioc/index.html There are also
>>>>>>> links to two CEL files I have used.
>>>>>>>
>>>>>>> I have a few questions:
>>>>>
>>>>> Below, I will take that you mean "RMA background correct" when you say
>>>>> "RMA".
>>>>>
>>>>>>> Why RMA results are so different?
>>>>>
>>>>> The RMA-style background correction in aroma.affymetrix utilizes
>>>>> affy::bg.adjust() [and normalizes PM probes only].  I'm not sure what
>>>>> algorithm/implementation oligo is using for this step, but they should
>>>>> give very similar corrected probe signals.
>>>>>
>>>>>>> Which RMA implementation is correct?
>>>>>
>>>>> So, aroma.affymetrix is basically just a wrapper for
>>>>> affy::bg.adjust(), which I think was the original implementation of
>>>>> RMA background correction.  I let Benilton comment on the oligo
>>>>> implementation and it's origin.
>>>>>
>>>>>>> Why does quantile normalization in aroma.affymetrics produce two
>>>>>>> different distributions?
>>>>>
>>>>> Because you first run quantile normalization on PMs only, then you
>>>>> look at the density plot for all (PMs & MMs).  More below.
>>>>>
>>>>>>>
>>>>>>> Thank you in advance!
>>>>>>>
>>>>>>> Here are R scripts I have used:
>>>>>>>
>>>>>>> ################################
>>>>>>> #aroma.affymetrix
>>>>>>> library(aroma.affymetrix);
>>>>>>> verbose <- Arguments$getVerbose(-8, timestamp=TRUE);
>>>>>>>
>>>>>>> # read files
>>>>>>> cdf <-
>>>>>>> AffymetrixCdfFile('annotationData/chipTypes/HuGene-1_0-st-v1/HuGene-1_0-st-v1.cdf');
>>>>>>> cs <- AffymetrixCelSet$byPath("rawData/mine/HuGene-1_0-st-v1/");
>>>>>
>>>>> Have to bring it up: Please, do not setup your annotation data and and
>>>>> data sets like this. An aroma.affymetrix script should not contain
>>>>> paths/pathnames, cf. "Dos and Don'ts":
>>>>>
>>>>>  http://aroma-project.org/node/102
>>>>>
>>>>> The correct way to do the above is:
>>>>>
>>>>> cs <- AffymetrixCelSet$byName("mine", chipType="HuGene-1_0-st-v1");
>>>>>
>>>>> alternatively, if you wish to be explicit in what CDF is used, you can
>>>>> do:
>>>>>
>>>>> cdf <- AffymetrixCdfFile$byChipType("HuGene-1_0-st-v1");
>>>>> cs <- AffymetrixCelSet$byName("mine", cdf=cdf);
>>>>>
>>>>>>>
>>>>>>> # RMA
>>>>>>> bc <- RmaBackgroundCorrection(cs);
>>>>>>> csBC <- process(bc,verbose=verbose);
>>>>>>>
>>>>>>> # QuantileNormalization
>>>>>>> qn <- QuantileNormalization(csBC, typesToUpdate="pm");
>>>>>>> csN <- process(qn);
>>>>>
>>>>> Note, the argument 'typesToUpdate' says that it is only PM probes that
>>>>> will be updated. The default is that MMs are left "as is".
>>>>>
>>>>>>>
>>>>>>> # Plots
>>>>>>> image_file <- ("aroma.affymetrix.RMA.png");
>>>>>>> png(image_file,width=1028,height=768);
>>>>>>> plotDensity(csBC);
>>>>>
>>>>> Here you are plotting all probes on the array.  Since
>>>>> RmaBackgroundCorrection is only correcting PM probes, you probably
>>>>> want to do:
>>>>>
>>>>> plotDensity(csBC, types="pm");
>>>>>
>>>>>>> title("aroma.affymetrix RMA data");
>>>>>>> dev.off();
>>>>>>>
>>>>>>> image_file <- ("aroma.affymetrix.QN.png");
>>>>>>> png(image_file,width=1028,height=768);
>>>>>>> plotDensity(csN);
>>>>>
>>>>> plotDensity(csN, types="pm");
>>>>>
>>>>> This is the key to why you get different density plots.  For a
>>>>> thorough explaination of the various QN approaches, see Page
>>>>> 'Empirical probe-signal densities and rank-based quantile
>>>>> normalization':
>>>>>
>>>>>  http://aroma-project.org/node/141
>>>>>
>>>>>>> title("aroma.affymetrix QN data");
>>>>>>> dev.off()
>>>>>
>>>>> What you haven't compared yet, because you misunderstood the RMA
>>>>> pipeline, are the summarized probe signals from fitting the
>>>>> log-additive RMA model.
>>>>>
>>>>> FYI, it is part of our (24 hours) redundancy testing to assert that
>>>>> the aroma.affymetrix RMA pipeline can reproduce the RMA pipeline and
>>>>> RMA summary estimates of the affyPLM package.  You can see how well
>>>>> this is done on Page 'Replication: RMA (background, normalization &
>>>>> summarization)':
>>>>>
>>>>>  http://www.aroma-project.org/replication/RMA
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Henrik
>>>>>
>>>>>>> ################################
>>>>>>>
>>>>>>> ################################
>>>>>>> # oligo
>>>>>>> library(oligo);
>>>>>>> rawdata=read.celfiles(c("rawData/mine/HuGene-1_0-st-v1/sample1.CEL","rawData/mine/HuGene-1_0-st-v1/sample2.CEL"));
>>>>>>> rmadata=rma(rawdata);
>>>>>>> qndata=normalize.quantiles(exprs(rmadata))
>>>>>>>
>>>>>>> library(affy)
>>>>>>> # Plots
>>>>>>> image_file <- ("oligo.RMA.png");
>>>>>>> png(image_file,width=1028,height=768);
>>>>>>> plotDensity(exprs(rmadata));
>>>>>>> title("oligo RMA data");
>>>>>>> dev.off();
>>>>>>>
>>>>>>> image_file <- ("oligo.QN.png");
>>>>>>> png(image_file,width=1028,height=768);
>>>>>>> plotDensity(qndata);
>>>>>>> title("oligo QN data");
>>>>>>> dev.off()
>>>>>>> ###############################
>>>>>>>
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Mikhail Pachkov
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>>>
>>>>
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
>