[BioC] ComBat: 3 adjustment variables & continuous adjustment variables
James W. MacDonald
jmacdon at uw.edu
Wed Mar 19 14:58:03 CET 2014
Hi Magda,
I'm not sure you need to do things sequentially like that. From what I
can tell, you should just be able to do
mod <- model.matrix(~tissue, des)
bat <- ComBat(data, des[,c("plate","row","chip")], mod)
And go from there.
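For illustration only, here is a minimal base-R sketch of the inputs those two lines assume. The toy `des` below is made up (12 samples, invented tissue/plate/row/chip values), not the real design. It only shows `des` as a data frame with one row per sample, the batch columns stored as factors, and what `model.matrix(~tissue, des)` returns. The ComBat call itself is left out because it needs the real data matrix.

```r
# Hypothetical sample sheet: one row per sample (values invented for illustration).
des <- data.frame(
  tissue = rep(c("blood", "placenta", "cord"), times = 4),
  plate  = rep(c(1, 2), each = 6),
  row    = rep(1:6, times = 2),
  chip   = rep(c("A", "B", "C", "D"), each = 3),
  stringsAsFactors = FALSE
)

# Store the batch variables as factors, so numeric codes such as plate 1/2
# or row 1..6 are treated as labels rather than as quantities.
batch_cols <- c("plate", "row", "chip")
des[batch_cols] <- lapply(des[batch_cols], factor)

# Model matrix for the biology to preserve (tissue only, as in the call above).
mod <- model.matrix(~tissue, des)

dim(mod)                          # samples x coefficients
colnames(mod)                     # intercept plus one column per non-reference tissue
sapply(des[batch_cols], nlevels)  # number of levels per batch variable
```

Checking `nlevels` against the number of samples is a quick way to see whether there is enough replication within each batch.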
Best,
Jim
On 3/18/2014 6:04 PM, Magda Price wrote:
> Hi Jim,
>
> Re numCovs - what you've stated was how I interpreted the use as well,
> which is why I didn't think it would be helpful.
>
> As usual with these types of human disease datasets, the study design
> is not ideal, and more complicated than I initially let on! The 180
> samples are a combination of 3 phenotype groups (1 control + 2
> diseased) and 5 different tissues. Other samples, unrelated to this
> project, were also run on these chips, which is why I'm working with
> fewer samples than the total that were run (which was 288).
>
> Here's a simplified version of what my ComBat code looks like:
>
> #1 - correct for plate effect
> mod.1<- model.matrix(~tissue+group+row+chip, data=des)
> bat.1<- ComBat(data, des$plate, mod.1)
>
> #2 - correct for row effect
> mod.2 <- model.matrix(~tissue+group+chip, data=des)
> bat.2 <- ComBat(dat=bat.1, batch=des$row, mod=mod.2)
>
> #3 - correct for chip
> mod.3 <- model.matrix(~tissue+group, data=des)
> bat.3 <- ComBat(dat=bat.2, batch=des$chip, mod=mod.3)
>
> We know from some pilot studies that the effect size (i.e.
> differential methylation between disease vs control samples in a given
> tissue) is small, so I am concerned about being thorough in the batch
> correction. I'm new to batch correction and you've correctly
> understood my concern about the row effect; so it sounds to me that
> how I have modeled the effect in the code above (i.e. each batch
> variable as a factor) was correct. Any corrections/suggestions for
> what I've done above?
>
> Thanks!
>
>
> On Tue, Mar 18, 2014 at 2:27 PM, James W. MacDonald <jmacdon at uw.edu>
> wrote:
>
> Hi Magda,
>
> The numCovs argument won't work because that is simply used to
> specify columns in the model matrix (of non-batch things you want
> to fit in your linear model) that are continuous covariates rather
> than fixed effects. It has nothing to do with correcting for the
> batch effect.
>
> And I think you might be thinking about batch effects in the wrong
> way. If you fit a 'row' effect, then what you are saying is that
> on average, the measures you get from one row differ from the
> measures you get from another row. So as an example, row 1 might
> tend to have higher values because those arrays don't get washed
> as well, whereas rows 3 and 4 might be dimmer because they get
> washed more. You then want to estimate how much brighter on
> average, the row1 chips are (and how much dimmer the row 3 and 4
> chips are), and adjust the observed data to account for this.
>
> But you do the estimation of these averages using factors, rather
> than continuous measures (because a chip either is or is not in
> row 1).
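A small base-R sketch of that distinction, using made-up row labels (nothing here comes from the actual data): coding row as a factor gives one indicator column per row, while coding it as a number gives a single slope column that assumes a linear trend across rows.

```r
# Hypothetical chip positions for 12 samples (rows 1..6, two samples each).
row_pos <- rep(1:6, each = 2)

# Row as a factor: an intercept plus one 0/1 indicator per non-reference row,
# so each row gets its own average shift.
X_factor <- model.matrix(~factor(row_pos))

# Row as a number: an intercept plus one slope column, which forces the
# shift to change by the same amount from each row to the next.
X_numeric <- model.matrix(~row_pos)

ncol(X_factor)   # 6 columns: intercept + 5 indicators
ncol(X_numeric)  # 2 columns: intercept + slope
```

The factor coding can still pick up a step-wise trend across rows; it just does not assume one.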
>
> You might just be over-thinking this. I don't see how 3 plates of
> 24 chips gets you to 180 samples, but regardless it seems like you
> have enough replication to estimate the batch effects, and still
> have enough degrees of freedom left over for your comparisons,
> unless you have some huge number of phenotypic combinations that
> you are trying to compare (do you?).
>
> Best,
>
> Jim
>
>
>
>
> On Tuesday, March 18, 2014 2:13:11 PM, Magda Price wrote:
>
> Hi Jim,
>
> I have several different "batch" variables - one for example is the
> chip that each sample was run on (there are 24 of these) and I think
> chip batch should definitely be treated as a factor. Another "batch"
> variable I would like to adjust for is the position the sample was run
> on the chip (there are 6 different rows). If I use row as a factor,
> then the effect of being in row 1 vs 2 is treated the same as the
> effect of 1 vs 6, but the bias I see changes step-wise from row 1, 2,
> 3, 4, 5, 6 thus I thought that treating row as a numeric or integer
> variable would better model the "batch" effect. In other words row
> batches have meaning relative to each other whereas chip batches do not.
>
> I guess this would be another reason why using the numCovs option
> (continuous not integer) might not work in my case?!
>
> Hope that explains things a bit better! Happy to provide any more info
> & I really appreciate the input.
>
> Magda
>
>
> On Tue, Mar 18, 2014 at 10:51 AM, James W. MacDonald
> <jmacdon at uw.edu> wrote:
>
> Hi Magda,
>
> I'm curious. How can one specify a batch using a continuous
> variable? In other words, isn't a particular sample in a
> batch or not?
>
> Best,
>
> Jim
>
>
>
> On 3/18/2014 1:44 PM, Magda Price wrote:
>
> Hi Steve,
>
> Thanks for your advice. I do know that I'm using an old version of R
> (one of the packages I'm using requires it) however, the options you
> mention from sva are in fact available in the older version as well,
> but it wasn't clear to me how to use them.
>
> I've copied the usage and argument information for the ComBat function
> below, maybe you can help clarify:
>
> ComBat(dat, batch, mod, numCovs=NULL, par.prior=TRUE, prior.plots=FALSE)
>
> dat          Genomic measure matrix (dimensions probe x sample) - for
>              example, expression matrix
> batch        Batch covariate (multiple batches allowed)
> mod          Model matrix for outcome of interest and other covariates
>              besides batch
> numCovs      (Optional) Vector containing the column numbers of the
>              continuous covariates in the model matrix, or NULL if no
>              continuous covariates are used
> par.prior    (Optional) TRUE indicates parametric adjustments will be
>              used, FALSE indicates non-parametric adjustments will be used
> prior.plots  (Optional) TRUE give prior plots with black as a kernel
>              estimate of the empirical batch effect density and red as
>              the parametric estimate
>
>
> The model matrix is supposed to contain the outcome of interest and
> other covariates *besides batch*, but batch is what I need to be a
> continuous variable. numCovs seems to allow me to specify *covariates*
> that should be continuous, but not *adjustment variables*. What am I
> missing?
>
>
> Thanks again!
>
>
>
> On Tue, Mar 18, 2014 at 9:48 AM, Steve Lianoglou
> <lianoglou.steve at gene.com> wrote:
>
>
> Hi Magda,
>
> You are using a version of R (2.14) that is horribly out of date, and
> as a result your Bioconductor packages are frozen to versions that are
> quite old.
>
> Please update to the latest version of R (3.0.3) and reinstall your
> Bioconductor packages using biocLite to ensure that you are running
> the latest version of them.
>
> The package you are using (sva v3.0.2) is now at version 3.8.0.
>
> One question you asked:
>
> - Row would be better treated as a continuous adjustment variable than a
> factor. In the version of sva that I am using (3.0.2) I believe that only
> factor adjustment variables are supported. I have seen mention in a few
> forums that there might be an update to ComBat to adjust for a numeric
> batch variable, is one available?
>
> Is readily answered by reading through the vignette for the current
> version of the package:
>
> http://bioconductor.org/packages/release/bioc/vignettes/sva/inst/doc/sva.pdf
>
> Specifically in Section 7 (Applying the ComBat function to adjust for
> known batches), where it states:
>
> """
> By default, all adjustment variables will be treated as factor
> variables by the ComBat function. If you would like to include
> continuous adjustment variables, also create a vector containing the
> column numbers of the continuous covariates in the model matrix. This
> vector must then be input into ComBat via the numCovs option.
> """
>
> HTH,
>
> -steve
>
> --
> Steve Lianoglou
> Computational Biologist
> Genentech
>
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
>
>
> --
> E. Magda Price
> PhD Candidate, Robinson Lab
> University of British Columbia
>
> CFRI Room 2071
> 950 West 28th Ave.
> Vancouver BC., V5Z 4H4
> (604)-875-3015
>
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099