[BioC] ComBat: 2 adjustment variables & continuous adjustment variables
Magda Price
magdaprice at gmail.com
Wed Mar 19 23:04:47 CET 2014
Johnson, William Evan <wej at ...> writes:
>
> Hey Magda,
>
> The two-step method is still a reasonable approach. It has worked well for
> me in multiple situations. I do have a beta version of ComBat that will
> handle two batch variables at the same time. It works well in theory, but I
> have yet to test it thoroughly across multiple datasets. I'm willing to
> share the code if you want to test it on your data (let me know).
>
> ComBat in the sva package can handle numeric covariates, but it does not
> deal with continuous batch variables. Adjusting the mean of a continuous
> batch variable would be straightforward (assuming a linear effect), but the
> variance adjustment would be very tricky.
>
> Ultimately, since the two-step approach seems to have worked, I think your
> best option is to just move forward with those results.
>
> Thanks!
>
> Evan
>
> On Feb 19, 2014, at 4:00 AM, <bioconductor-request at ...>
> <bioconductor-request at ...> wrote:
>
> > Message: 23
> > Date: Tue, 18 Feb 2014 16:45:12 -0800
> > From: Magda Price <magdaprice at ...>
> > To: "bioconductor at ..." <bioconductor at ...>
> > Subject: [BioC] ComBat: 2 adjustment variables & continuous adjustment
> > variables
> > Message-ID:
> > <CADkR4V=ydd1abJXFhtd+Xwq8MZMP_=urHVDtPXOTurPQjzB7Tg at ...>
> > Content-Type: text/plain
> >
> > Hi!
> >
> > I'm writing with a few questions about applying ComBat (sva package) to a
> > set of ~50 samples run on the Illumina Infinium HumanMethylation450
> > BeadChip array (~450,000 DNA methylation data points).
> >
> > There is a large amount of variation in my data due to both the batch the
> > samples were run in (3 different batches) and the position they were
> > located in on the chip: specifically the row (6 different rows), but not
> > the column. The chips are set up in a 6 row * 2 column format like this:
> >
> >
> > sample 01 sample 02
> > sample 03 sample 04
> > sample 05 sample 06
> > sample 07 sample 08
> > sample 09 sample 10
> > sample 11 sample 12
> >
> >
> > I read Dr. Evan Johnson's suggestions to someone else with this
> > "2-batch-effect-variable" problem in the ComBat google group (
> > https://groups.google.com/forum/#!topic/combat-user-forum/PcTxNlaUmAI).
> > He had 2 good suggestions:
> >
> > 1. Combine the two batch variables into one, if 3-4 reps are left in
> > each batch
> > 2. Use ComBat twice, adjusting for the first batch using the second
> > batch as a covariate, and then adjust for the second batch.
> >
> > I cannot go with the first suggestion because combining the 2 batch
> > variables would create 18 batch categories (3 batches * 6 rows), and I
> > would not have enough replicates per batch category.
> >
> > So I tried the second option - applying ComBat twice. I first corrected
> > for row and then took the row-corrected data and applied ComBat again,
> > correcting for batch. It seems to have worked & the correlation of my
> > technical replicates improves. I am seeking advice on two points:
> >
> > 1. The google group post is now a few years old; is it still thought
> > that the step-wise correction is a valid approach?
> > 2. Row would be better treated as a continuous adjustment variable than
> > a factor. In the version of sva that I am using (3.0.2) I believe that
> > only factor adjustment variables are supported. I have seen mention in
> > a few forums that there might be an update to ComBat to adjust for a
> > numeric batch variable; is one available?
> >
> > Thank you in advance for your help!
> >
> > Magda Price,
> > University of British Columbia
>
Hi Evan,
Thanks for your response & so sorry for my delay; I wasn't notified by
e-mail that you had responded.
Since I wrote the first note, a few things have changed:
1. I have discovered a third batch variable (chip, in addition to plate &
   row)!
2. Based on forum feedback, I figured I was okay to stick with all factor
   adjustment variables.
An additional question that has come up in reference to your suggestion:
> > Use ComBat twice, adjusting for the first batch using the second
> > batch as a covariate, and then adjust for the second batch.
I can't include the second batch variable as a covariate; I get a
singularity error because the batches are confounded with each other. For
example, all samples on chip A were run in batch 1. Do you still think it is
a valid approach if I can't use subsequent batch variables as covariates?
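
To illustrate the confounding described above, here is a minimal sketch in plain Python (not the sva code, and not necessarily how ComBat builds its model internally). It assumes the singularity arises because a design matrix holding an intercept, indicator columns for the batch and indicator columns for the covariate is not of full column rank. The sample layouts and the helper names (dummy_columns, matrix_rank, design_is_singular) are hypothetical and exist only for this example.

```python
from fractions import Fraction


def dummy_columns(labels):
    """Indicator columns for a factor, dropping the first level as reference."""
    levels = sorted(set(labels))
    return [[1 if lab == lev else 0 for lab in labels] for lev in levels[1:]]


def matrix_rank(columns):
    """Rank of the matrix with the given columns, via exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in zip(*columns)]
    rank = 0
    for col in range(len(columns)):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # this column depends linearly on the earlier ones
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def design_is_singular(batch, covariate):
    """True if intercept + batch indicators + covariate indicators are rank deficient."""
    columns = [[1] * len(batch)] + dummy_columns(batch) + dummy_columns(covariate)
    return matrix_rank(columns) < len(columns)


# Hypothetical layout: 8 samples, 2 batches.
batch = [1, 1, 1, 1, 2, 2, 2, 2]
# Chips nested in batches: chips A and B only in batch 1, C and D only in batch 2.
chip_nested = ["A", "A", "B", "B", "C", "C", "D", "D"]
# Rows crossed with batches: every row appears in every batch.
row_crossed = [1, 2, 1, 2, 1, 2, 1, 2]

print(design_is_singular(batch, chip_nested))  # nested chips
print(design_is_singular(batch, row_crossed))  # crossed rows
```

In the nested layout the batch indicator is a sum of chip indicators, so the chip columns add no independent information once batch is in the model. In the crossed layout the columns stay independent, so a variable like row could still be used as a covariate.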
Thank you for the offer & advice!
Magda