[BioC] maanova background correction

Thu Jun 26 11:06:01 MEST 2003

Dear David,

At 02:16 AM 26/06/2003, Dave Waddell wrote:
>This seems to me to be a very important point that has not been
>adequately covered. In the paper discussed below (and available here
>http://www.stat.berkeley.edu/users/terry/zarray/TechReport/584.pdf) the
>following paragraph discusses the problem:

>"The motivation behind background adjustment is the belief that a spot's
>measured intensity includes a contribution not specifically due to the
>hybridization of the target to the probe, but to something else, for
>example, non-specific hybridization and other chemicals on the glass. We
>would like to measure this contribution and subtract it in order to
>obtain a more accurate quantization of hybridization. The glass slides
>are treated chemically so that the spotted cDNA fragments will bind to
>them. After the cDNA spots are printed, the slides are treated again so
>that target DNA does not bind to them. Nevertheless, some binding of the
>target to the slide may occur. Furthermore, there may be some
>fluorescence away from the spots due to the slide's surface treatment
>and the glass. It seems likely that the fluorescence from regions of the
>slide not occupied by DNA is different from that from regions occupied
>by DNA. It follows that measuring the intensity in some region near a
>spot and subtracting it may not be the best way to correct for this
>extra contribution. It would be interesting to compare morphological and
>local background estimates to estimates based on negative controls (i.e.
>spotted DNA sequences which should have no hybridization signal)."
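The quoted paragraph suggests comparing local and morphological background estimates with estimates based on negative controls. Below is a minimal sketch of the negative-control idea. It assumes a single array-wide background, taken as the median intensity of the negative-control spots. The helper names and intensities are hypothetical, and this is not the procedure used by Yang et al.

```python
from statistics import median

def background_from_negative_controls(intensities, is_negative_control):
    """Hypothetical helper: one array-wide background level, estimated as
    the median foreground intensity of the negative-control spots."""
    controls = [x for x, neg in zip(intensities, is_negative_control) if neg]
    if not controls:
        raise ValueError("no negative-control spots on this array")
    return median(controls)

def subtract_background(intensities, background):
    """Subtract the same background value from every spot."""
    return [x - background for x in intensities]

# Hypothetical foreground intensities; the last three spots are negative controls.
fg = [1200.0, 850.0, 430.0, 95.0, 110.0, 102.0]
neg = [False, False, False, True, True, True]

bg = background_from_negative_controls(fg, neg)   # 102.0
corrected = subtract_background(fg, bg)           # [1098.0, 748.0, 328.0, -7.0, 8.0, 0.0]
```

After correction the negative controls themselves scatter around zero. That is what an unbiased background estimate should produce, unlike the "black holes" described later in this thread.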
>
>This paper's main concern is with the determination of background but
>James' concern (and mine too) is that background may not be related in
>any way (or at least in any way we can measure) to regions occupied by
>DNA. Background subtraction is clearly not the thing to do; otherwise it
>would be impossible to get "black holes", since nothing could be less
>than background.

The points that you make are valid concerns. With some background 
estimators you might well be better off ignoring the background if you only 
want to test for differential expression and are not going to use the 
measured log-ratios as meaningful estimates of the true fold change.
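The point about log-ratios can be illustrated with a simple additive model, assumed here only for illustration: each channel measures true signal plus the same background. If that background is ignored, the log-ratio is pulled toward zero. The raw value then understates the true fold change, even though it may still rank genes usefully for a differential-expression test. All numbers are hypothetical.

```python
import math

def log2_ratio(red, green):
    """M value for one spot: log2 of the red/green intensity ratio."""
    return math.log2(red / green)

# Hypothetical background-free signals for one spot: a true 4-fold change.
true_red, true_green = 400.0, 100.0
# Hypothetical additive background, assumed equal in both channels.
background = 100.0

true_m = log2_ratio(true_red, true_green)                           # 2.0
raw_m = log2_ratio(true_red + background, true_green + background)  # log2(500/200), about 1.32
```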

You should be aware though that the background estimates generated by 
different image analysis programs are not the same. In particular the 
background estimation method recommended by Yang et al. gives systematically
lower values than the one that you are likely using. SPOT morph background
estimation does not produce the "black holes" phenomenon that you mention,
even on arrays specifically designed with negative controls to generate
that phenomenon.
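A one-dimensional example shows why a morphological estimate tends to run lower than a local one. This sketch assumes the morph estimate is a grayscale morphological opening, meaning erosion followed by dilation, with a window wider than a spot. It applies that opening to a single hypothetical scan line. The real estimator works on the 2-D image, so treat this only as an illustration. An opening never exceeds the original signal at any pixel, and it removes peaks narrower than the window.

```python
def erode(signal, half_width):
    """Grayscale erosion: running minimum over a window of 2*half_width + 1 pixels."""
    n = len(signal)
    return [min(signal[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]

def dilate(signal, half_width):
    """Grayscale dilation: running maximum over the same window."""
    n = len(signal)
    return [max(signal[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]

def opening(signal, half_width):
    """Erosion then dilation; removes peaks narrower than the window."""
    return dilate(erode(signal, half_width), half_width)

# Hypothetical scan line: background around 100 with a 3-pixel-wide spot.
line = [100, 102, 98, 101, 900, 950, 920, 99, 103, 100, 97]
bg = opening(line, half_width=2)  # window of 5 pixels, wider than the spot
# bg == [98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 97]
```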

The best way to do background estimation is an active research area and I 
am working on it with a student at the moment. It is too early to make 
research reports available but I am certainly convinced that some form of 
background correction is a good idea.

Hope this helps
Gordon

>Dave.
>
>-----Original Message-----
>From: bioconductor-bounces at stat.math.ethz.ch
>[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Rafael A. Irizarry
>Sent: Wednesday, June 25, 2003 8:35 AM
>To: James MacDonald
>Cc: bioconductor at stat.math.ethz.ch
>Subject: Re: [BioC] maanova background correction
>
>On Wed, 25 Jun 2003, James MacDonald wrote:
>
> > This is one of many options, and maybe it is a good idea. My main worry
> > about background is that you are assuming that the non-specific binding
> > of cDNA to the area just outside a spot is equal to the non-specific
> > binding within the spot.
>
>You are also assuming that whatever statistical procedure
>was used to capture the intensity around the spot is appropriate;
>different imaging packages give different results.
>You are also assuming that subtracting is the correct thing to do.
>
>I believe this paper has a discussion on some of this:
>
>Y. H. Yang, M. J. Buckley, S. Dudoit, and T. P. Speed (2002). Comparison
>of methods for image analysis on cDNA microarray data. Journal of
>Computational and Graphical Statistics, Vol. 11, No. 1, pp. 108-136.
>
> > All of the spotted arrays we use in our core have negative controls
> > (salmon sperm cDNA, cot-1, A. thaliana, etc.). On the odd occasion that a
> > slide has a huge smear of background fluorescence going across the
> > slide, it is invariably true that the negative controls are 'black
> > holes' in the middle of the background. This implies to me that the
> > non-specific binding within a spot of cDNA is quite different from that
> > on the remainder of the slide.
> >
> > Because of this observation, I am reluctant to assume that the current
> > method of estimating background gives an unbiased estimate.
> >
> > It might be interesting to do a background estimate like the one used
> > in rma, where the background is estimated from those spots with no
> > apparent binding. However, this would require a relatively large array.
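The RMA background adjustment is usually described as a convolution model: the observed intensity O is a normal background B ~ N(mu, sigma^2) plus an exponential signal S ~ Exp(alpha), and the corrected value is E[S | O]. Below is a minimal sketch of that adjustment. It assumes mu, sigma and alpha are already known; RMA estimates them from the low-intensity part of the data, and that step is not shown. The conditional distribution used here is truncated only at zero, so details may differ from the implementation in the affy package. The parameter values are hypothetical. The corrected values are always positive, which is relevant to the negative-value problem raised at the start of this thread.

```python
import math

def _phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _Phi(x):
    """Standard normal CDF, via erfc for better accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def normexp_adjust(observed, mu, sigma, alpha):
    """E[S | O = observed] for O = S + B, S ~ Exp(alpha), B ~ N(mu, sigma^2).
    Given O, S is N(a, sigma^2) truncated to S >= 0, with
    a = O - mu - alpha * sigma^2. Very small observed values can make
    _Phi underflow to zero; this sketch does not guard against that."""
    a = observed - mu - alpha * sigma ** 2
    return a + sigma * _phi(a / sigma) / _Phi(a / sigma)

# Hypothetical parameters: background mean 100, sd 20, mean signal 1/alpha = 500.
mu, sigma, alpha = 100.0, 20.0, 1.0 / 500.0
adjusted = [normexp_adjust(o, mu, sigma, alpha) for o in [60.0, 100.0, 150.0, 2000.0]]
```

Observations below mu are shrunk toward small positive values rather than becoming negative, while bright spots are shifted down by roughly mu + alpha * sigma^2.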
> >
> > Jim
> >
> >
> >
> > James W. MacDonald
> > UMCCC Microarray Core Facility
> > 1500 E. Medical Center Drive
> > 7410 CCGC
> > Ann Arbor MI 48109
> > 734-647-5623
> >
> > >>> <kfbargad at lg.ehu.es> 06/25/03 05:04AM >>>
> > I agree that subtracting background can add variability to your
> > dataset, but I think that if you don't subtract it you risk having
> > spots whose signal value is their real signal plus a high background
> > signal. What do you think about prefiltering for those spots with an
> > abnormally high background value and then doing your analysis? Could
> > this be an option?
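The prefiltering idea can be made concrete with a robust outlier rule. This sketch flags spots whose background exceeds the array median by more than k median absolute deviations (MAD). The rule, the cut-off k = 3, and the background values are all hypothetical choices, not a recommendation from this thread.

```python
from statistics import median

def high_background_mask(backgrounds, k=3.0):
    """True for spots whose background exceeds median + k * MAD.
    k = 3 is an arbitrary, hypothetical cut-off. If most backgrounds are
    identical the MAD is 0 and any higher value gets flagged."""
    med = median(backgrounds)
    mad = median(abs(b - med) for b in backgrounds)
    return [b > med + k * mad for b in backgrounds]

# Hypothetical local background values for seven spots.
bg = [110.0, 95.0, 102.0, 98.0, 640.0, 105.0, 101.0]
flags = high_background_mask(bg)                      # only the 640 spot is flagged
kept = [i for i, f in enumerate(flags) if not f]      # indices of spots kept for analysis
```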
> >
> > David
> >
> > > I think you have come across a relatively contentious issue, and I
> > > doubt you will be able to get a consensus about background subtraction.
> > > Additionally, each software/scanner uses a different method of
> > > estimating background, so the usefulness of the background is largely
> > > dependent on how it was estimated.
> > >
> > > Personally, I look at background subtraction the same way I look at the
> > > MM probes on an Affy chip. I am sure there is a reasonable way to use
> > > these data, but I am not too sure that simply subtracting background
> > > from foreground is a good idea. For instance, background is usually
> > > estimated from portions of the slide that are blocked with something
> > > other than cDNA. Anybody that has ever looked at a slide with negative
> > > control cDNA spots can tell you that the intensity of the negative
> > > control is almost always much smaller than background. In my opinion,
> > > this indicates that the estimated background almost always
> > > overestimates true background.
> > >
> > > In addition, variability is additive (the variances of independent
> > > errors add), so if you subtract background from foreground, you are
> > > adding the variability of your background estimate to your new
> > > foreground estimate. Considering the inherent variability of microarray
> > > data, this cannot be considered a good thing.
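The variance argument can be checked with a small simulation. It assumes independent, normally distributed errors in foreground and background; in real images they may be correlated, which would change the result. The means and standard deviations are hypothetical. For independent errors, Var(F - B) = Var(F) + Var(B), so the corrected values are noisier than the raw foreground.

```python
import random
from statistics import pvariance

rng = random.Random(2003)  # fixed seed so the run is reproducible
n = 100_000

# Hypothetical independent errors: sd 30 for foreground, sd 20 for background.
fg = [1000.0 + rng.gauss(0.0, 30.0) for _ in range(n)]
bg = [200.0 + rng.gauss(0.0, 20.0) for _ in range(n)]
corrected = [f - b for f, b in zip(fg, bg)]

var_fg = pvariance(fg)           # close to 30**2 = 900
var_bg = pvariance(bg)           # close to 20**2 = 400
var_corr = pvariance(corrected)  # close to 900 + 400 = 1300
```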
> > >
> > > On the other hand, if you don't subtract background (or some ad hoc
> > > estimate thereof), your data will be (possibly) upwardly biased.
> > >
> > > So here is what I do: I simply use the raw signal and accept that the
> > > data may be biased. This is certainly not the ideal situation, but I
> > > think it is a reasonable trade-off of bias for (hopefully) better
> > > precision.
> > >
> > > HTH,
> > >
> > > Jim
> > >
> > >
> > >
> > > James W. MacDonald
> > > UMCCC Microarray Core Facility
> > > 1500 E. Medical Center Drive
> > > 7410 CCGC
> > > Ann Arbor MI 48109
> > > 734-647-5623
> > >
> > > >>> "Brendan M. Heavey" <bmheavey at buffalo.edu> 06/24/03 02:56PM >>>
> > > Hello-
> > >
> > > I am using MAANOVA to analyze cDNA chips.  Does anybody know how to
> > > deal with background spot intensity?
> > >
> > > Right now, I have about 4000 genes on an array, each spotted 3 times.
> > >
> > > I can input the raw signal strength for each of the 12,000 spots and
> > > run analysis on those.
> > >
> > > I would like to subtract background intensity from each of the spots,
> > > but this leads to negative values in some spots (that haven't
> > > hybridized). Maanova does not seem to like negative or missing values,
> > > which means I have to eliminate all 3 spots for each gene that produces
> > > a single negative... which reduces my dataset to a pitiful number of
> > > genes.
> > >
> > > I've considered:
> > > 1) Replacing the missing/negative value with a number very close to zero
> > > 2) Replacing the missing/negative value with the average of the other
> > >    two spots
> > > 3) Forgetting about background intensity completely and just using raw
> > >    signal strength
> > >
> > > ...but none of them seem like the right thing to do
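Brendan's three options can be written out for a single gene spotted three times. In this sketch, the helper names, the floor value of 1.0 and the intensities are all hypothetical. Option 2 is generalised to "mean of the positive replicates", which matches "the average of the other two spots" when exactly one spot is negative.

```python
from statistics import mean

def subtract(fg, bg):
    """Plain background subtraction, spot by spot."""
    return [f - b for f, b in zip(fg, bg)]

def floor_nonpositive(values, floor=1.0):
    """Option 1: replace non-positive values by a small positive floor (hypothetical value)."""
    return [v if v > 0 else floor for v in values]

def impute_from_replicates(values):
    """Option 2: replace non-positive replicates by the mean of the positive ones.
    Returns None when no replicate is positive, so the gene would still be dropped."""
    good = [v for v in values if v > 0]
    if not good:
        return None
    m = mean(good)
    return [v if v > 0 else m for v in values]

# One hypothetical gene spotted three times.
fg = [180.0, 210.0, 140.0]
bg = [120.0, 130.0, 150.0]

corrected = subtract(fg, bg)              # [60.0, 80.0, -10.0]
opt1 = floor_nonpositive(corrected)       # [60.0, 80.0, 1.0]
opt2 = impute_from_replicates(corrected)  # [60.0, 80.0, 70.0]
opt3 = fg                                 # Option 3: raw signal, background ignored
```

Option 1 can create extreme values after taking logs: log2(1.0) = 0, while the other replicates sit near log2(70). Option 2 hides the fact that one replicate carried no usable signal.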
> > >
> > > any ideas?
> > >
> > > thanks in advance
> > >
> > > Brendan Heavey
> > > Analyst Programmer
> > > Center for Research in Cardiovascular Medicine
> > > University at Buffalo
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at stat.math.ethz.ch
> > > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> > >
> >
>