[BioC] Limma: background correction. Use or ignore?

Wed Apr 12 13:50:01 CEST 2006

Quoting Henrik Bengtsson <hb at maths.lth.se>:

> A leading question: What do you mean by "MA plots look better"?  They
> can look better in many ways, depending on what you are trying to
> answer.  To simplify things very much, we have two cases of questions:
>
> 1) Find differentially expressed genes, that is, we are trying to test
> the null hypothesis H0: mu=0, against H1: mu != 0, where mu is the
> unknown log-ratio of the gene (in two samples).
>
> 2) Estimate the unknown log-ratio of the gene (in two samples), i.e.
> estimate mu.  This may for instance be of interest in copy number
> analysis.
>
> In Case 1, it does not matter much whether the *absolute* values of our
> mu estimates are biased or not - we are still trying to identify those
> away from zero.  In other words, if we rescale the estimates we will,
> in theory, still be able to identify differentially expressed genes.
> This is what the variance stabilizing (VS) methods (Huber and Rocke &
> Durbin) are making use of.
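
To make the Case 1 argument concrete, here is a minimal sketch of why a uniform rescaling does not change a test of H0: mu=0. It uses an ordinary one-sample t statistic, which is an assumption made for illustration and not limma's moderated statistic. The replicate values and the scale factor are made up.

```python
import math
import statistics

def t_statistic(m_values):
    """Ordinary one-sample t statistic for H0: mu = 0."""
    n = len(m_values)
    mean = statistics.mean(m_values)
    sd = statistics.stdev(m_values)  # sample standard deviation
    return mean / (sd / math.sqrt(n))

# Hypothetical replicate log-ratios (M) for one gene; values are made up.
m_replicates = [0.9, 1.2, 0.7, 1.1]

# Rescale every estimate by the same positive factor, e.g. a uniform
# compression of the log-ratios toward zero.
scale = 0.5
m_rescaled = [scale * m for m in m_replicates]

t_original = t_statistic(m_replicates)
t_rescaled = t_statistic(m_rescaled)

# The mean and the standard deviation shrink by the same factor,
# so the two statistics agree up to rounding.
print(t_original, t_rescaled)
```

The invariance only holds when the same factor applies to all replicates. A bias that changes with intensity would not cancel this way, which is presumably why the quoted text says "in theory".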

This is my case. I am looking for genes that are differentially 
expressed. In fact, a lot of the time I am looking for genes that are 
NOT expressed (or have minimal expression) in one or the other of the 
samples, and for these cases the exact values of M are irrelevant (I 
just care that they're high, in absolute terms).

When I said the MA plots looked better, I was referring to the general 
distribution of the genes, and the way that known genes were located in 
the plot. Multiple spots for a given gene often clustered better.

> Hopefully not being too self-oriented, I would like to refer to
> Bengtsson  & Hössjer, Methodological study of affine transformations
> of gene expression data with proposed robust non-parametric
> multi-dimensional normalization method BMCBioinfo, 2006, for more
> details.  I also have quite a few talks on the topic at
> http://www.maths.lth.se/bioinformatics/.  The VS papers address this
> too, but much less explicitly.

Thanks for that. I will take a look!

> Another example is scanner bias.  We found that both Axon and Agilent
> scanners introduce a substantial offset in signals.  See Bengtsson et
> al, Calibration and assessment of channel-specific biases in
> microarray data with extended dynamical range, BMCBioinfo, 2004.  The
> offset in both scanners was/is about 20 units on the range [0, 65535].
> It does not sound like much, but 20 is definitely enough to bias your
> log-ratios.  We have seen similar effects in Affymetrix scanners.
> Afterwards, we have identified some models of the same brands that
> do not have such a strong offset.  Thus, when we choose a scanner we
> introduce bias.  I'll reply in another message how to estimate and
> correct for this. It is easy.
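
As a rough illustration of how an offset of 20 units can bias log-ratios, the sketch below adds the same offset to both channels and computes the observed log2 ratio at several intensities. The true ratio of 2 and the chosen intensities are assumptions made for the example. It shows only the bias, not the estimation or correction method promised in the quoted text.

```python
import math

OFFSET = 20.0      # scanner offset quoted above, on the [0, 65535] scale
TRUE_RATIO = 2.0   # assumed true red/green ratio, i.e. true M = 1

def observed_log_ratio(green_signal, offset=OFFSET, ratio=TRUE_RATIO):
    """log2(red/green) when both channels carry the same additive offset."""
    red = ratio * green_signal + offset
    green = green_signal + offset
    return math.log2(red / green)

# Hypothetical green-channel signal levels, from weak to strong spots.
for g in (20, 100, 1000, 10000):
    print(g, round(observed_log_ratio(g), 3))
```

Under these assumptions the observed M stays below the true value of 1 at every intensity and is pulled furthest toward zero for the weakest spots, so the bias depends on A and shows up as curvature in an MA plot.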

When looking at my data, I have observed that while the foreground 
signals were pretty much comparable between slides scanned with an Axon 
scanner, or an ArraywoRx one (the latter using white light rather than 
lasers), the background was very different between the two.

As it turns out, the Axon ones looked cleaner (using imageplot). In 
this case, I get the best stats if I do not correct for background 
(instead of subtracting it). When the slides were scanned with the 
ArraywoRx scanner (higher background, and highly variable between 
channels), I get better results if I subtract the background.
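
The trade-off behind this choice can be sketched with a small simulation. It does not reproduce the slides described here. The signal levels, background level and noise are all assumed values, and the labels "none" and "subtract" merely echo the corresponding options of limma's backgroundCorrect.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_RED, TRUE_GREEN = 200.0, 100.0  # assumed true signals, true M = 1
BACKGROUND = 100.0                   # assumed local background level
NOISE_SD = 10.0                      # assumed noise on every reading

def noisy(value):
    """One simulated reading: the given value plus Gaussian noise."""
    return value + random.gauss(0.0, NOISE_SD)

m_uncorrected, m_subtracted = [], []
for _ in range(2000):
    # Foreground readings contain signal plus background.
    rf, gf = noisy(TRUE_RED + BACKGROUND), noisy(TRUE_GREEN + BACKGROUND)
    # Background readings are separate, equally noisy estimates.
    rb, gb = noisy(BACKGROUND), noisy(BACKGROUND)
    m_uncorrected.append(math.log2(rf / gf))
    r, g = rf - rb, gf - gb
    if r > 0 and g > 0:  # non-positive values have no log-ratio
        m_subtracted.append(math.log2(r / g))

# Print the mean and standard deviation of M for each treatment.
for label, m in (("none", m_uncorrected), ("subtract", m_subtracted)):
    print(label, round(statistics.mean(m), 2), round(statistics.stdev(m), 2))
```

With these assumed numbers, leaving the background in compresses M toward zero but keeps it tight, while subtracting recovers M close to 1 at the cost of a larger spread. Which effect dominates the statistics would depend on how large and how variable the background is relative to the signal.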

The list of genes is not very different. But the stats are.

Jose

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK