[BioC] Is GCRMA influenced by lexicographical order?

Mon Nov 8 17:13:58 CET 2010

On 11/8/2010 10:50 AM, Wolfgang Huber wrote:
> Hi Markus,
>
> Jean or Rafa will be able to give a more competent response, but as far
> as I can see from the code, there are some places where the first array
> is treated specially:
>
> ~/madman/Rpacks/gcrma/R$ grep ",1]" *
> gcrma.R: index2=which(!is.na(anc[,1]))
> gcrma.engine2.R: index.affinities <- which(!is.na(pm.affinities[,1]))

Those two "[,1] " in the code is actually "without loss of generality" 
reference. Affinities are missing if the probe sequence information is 
missing, and if it does it would be for all arrays. Thus by finding out 
which ones are missing in any array suffice -- and since there will 
always be at least one array, we did [,1] instead of [,10] or any other 
integer.

I will test it and see if I can reproduce your problem and see where the 
discrepancy could arise.

Best,
Jean

> Il Nov/8/10 10:19 AM, Markus Boenn ha scritto:
>>
>> Dear all,
>>
>> I've made some experiment about GCRMA using bg.adjust.gcrma() contained
>> in the gcrma package.
>>
>> I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of
>> the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I
>> rename both files in a way, which changes the lexicographical order,
>> say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II).
>>
>> Using bg.adjust.gcrma() and exprs() to obtain the expression values, for
>> some probe sets I get different values for A.cel (in CASE I) than for
>> BA.cel (in CASE II), although both files contain the same data. Of
>> course, for B.cel and AB.cel the same phenomenon becomes obvious.
>>
>> How can this be possible? To me, it seems as if GCRMA normalizes the
>> data with respect to the lexicographical order, but as far as I know,
>> GCRMA treats each array independently.
>>
>> This effect is not reproducible using call.exprs() with argument
>> algorithm="rma" or "mas5". But is reproducible for other arrays like
>> HGU95A, for example.
>>
>> Please, can anybody try to explain me, if this effect is a bug or a
>> feature (and why)?
>>
>> Best wishes,
>> Markus
>>
>>
>>
>> #################################################################
>>
>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>> other attached packages:
>> [1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0 [4]
>> simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0 [7] affy_1.28.0
>> Biobase_2.10.0
>> loaded via a namespace (and not attached):
>> [1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2 [4] RSQLite_0.9-2
>> affyio_1.18.0 annotate_1.28.0 [7] preprocessCore_1.12.0 splines_2.12.0
>> survival_2.35-8 [10] tools_2.12.0 xtable_1.5-6
>>
>> Also tried on other machine with similar packages
>>
>> sessionInfo()
>> R version 2.11.1 Patched (2010-09-16 r52943)
>> Platform: i686-pc-linux-gnu (32-bit)
>>
>> locale:
>> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
>> [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8
>> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1
>> [5] Biobase_2.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2
>> [4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17
>> [7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1
>> [10] survival_2.35-8 tools_2.11.1 xtable_1.5-6
>>
>>
>> #################################################################
>>
>> Code example
>>
>> setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel
>> CEL <- ReadAffy()
>> bag.1 <- bg.adjust.gcrma(CEL)
>> ebag.1 <- exprs(bag.1)
>>
>> setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel
>> CEL <- ReadAffy()
>> bag.2 <- bg.adjust.gcrma(CEL)
>> ebag.2 <- exprs(bag.2)
>>
>> par(mfrow=c(2,1))
>> # differences should be zero
>> plot(ebag.1[,1]-ebag.2[,2])
>> plot(ebag.1[,2]-ebag.2[,1])
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
------------------------------------
Zhijin (Jean) Wu
Assistant Professor of Biostatistics
Brown University, Box G-S121
Providence, RI  02912

Tel: 401 863 1230
Fax: 401 863 9182
http://www.stat.brown.edu/zwu