[BioC] ComBat: 3 adjustment variables & continuous adjustment variables
Magda Price [guest]
guest at bioconductor.org
Tue Mar 18 17:15:46 CET 2014
Hi!
I'm writing with a few questions about applying ComBat (sva package) to a set of ~180 samples run on the Illumina Infinium HumanMethylation450 BeadChip array (~450,000 DNA methylation data points).
There is a large amount of variation in my data due to the plate the samples were run on (3 different plates), the chip they were run on (24 different chips), and their position on the chip, specifically the row (6 different rows). The chips are laid out in a 6-row by 2-column format like this:
sample 01 sample 02
sample 03 sample 04
sample 05 sample 06
sample 07 sample 08
sample 09 sample 10
sample 11 sample 12
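The layout above can be written as a small mapping from position number to row and column. This is only a minimal Python sketch: `position_to_row_col` is a hypothetical helper name, and it assumes positions are numbered 1-12 exactly as in the diagram.

```python
def position_to_row_col(position):
    """Map a chip position (1-12, numbered as in the diagram) to (row, column).

    Assumes the 6-row by 2-column layout shown above, filled left to right,
    then top to bottom. Rows run 1-6 and columns 1-2.
    """
    if not 1 <= position <= 12:
        raise ValueError("position must be between 1 and 12")
    row = (position - 1) // 2 + 1   # two samples per row
    col = (position - 1) % 2 + 1    # odd positions left, even positions right
    return row, col


# Example: sample 07 sits in row 4, column 1 of the diagram.
print(position_to_row_col(7))
```

Row is the value I would want to adjust for; column is shown only for completeness.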
I read Dr. Evan Johnson's suggestions to someone else with this "multiple-batch-effect-variable" problem in the ComBat Google group (https://groups.google.com/forum/#!topic/combat-user-forum/PcTxNlaUmAI). He had 2 suggestions:
- Combine the two batch variables into one, if 3-4 reps are left in each batch
- Use ComBat multiple times, adjusting for the first batch using the other batch variables as covariates, and then adjust for the second batch, and so on
I cannot go with the first suggestion because combining the batch variables would create too many categories and I would not have enough replicates per batch category.
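To put rough numbers on this, here is a back-of-the-envelope Python sketch. It assumes each chip was run on a single plate (so plate adds no categories beyond chip), that the sample count is exactly 180 rather than approximately 180, and that samples are spread evenly, none of which holds exactly.

```python
# Counts taken from the description above; n_samples is approximate (~180).
n_samples = 180
n_chips = 24
n_rows = 6

# Assumption: chips are nested within plates, so combining plate, chip and row
# gives chip x row categories rather than plate x chip x row.
combined_categories = n_chips * n_rows
mean_per_category = n_samples / combined_categories

# For comparison: average replicates per batch if chip alone is the batch.
mean_per_chip = n_samples / n_chips

print(combined_categories)   # number of combined chip-by-row categories
print(mean_per_category)     # average samples per combined category
print(mean_per_chip)         # average samples per chip
```

Under these assumptions the combined variable has 144 categories with about 1.25 samples each, well short of the 3-4 replicates per batch suggested above.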
I am seeking advice on the following points:
- The Google group post is now a few years old. Is step-wise correction still considered a valid approach?
- That post asked about adjusting for 2 batch variables, not 3. Is there more cause for concern if I apply ComBat 3 times?
- Row would be better treated as a continuous adjustment variable than as a factor. In the version of sva that I am using (3.0.2), I believe that only factor adjustment variables are supported. I have seen mention in a few forums that there might be an update to ComBat to adjust for a numeric batch variable. Is one available?
Thank you in advance for your help!
Magda Price, UBC
-- output of sessionInfo():
R version 2.14.0 (2011-10-31)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] sva_3.0.2 mgcv_1.7-22 corpcor_1.6.4 wateRmelon_1.2.2
[5] IlluminaHumanMethylation450k.db_1.4.6 org.Hs.eg.db_2.6.4 RSQLite_0.11.2 DBI_0.2-5
[9] AnnotationDbi_1.16.19 matrixStats_0.6.2 ROC_1.30.0 limma_3.10.3
[13] RColorBrewer_1.0-5 gplots_2.11.0 MASS_7.3-16 KernSmooth_2.23-6
[17] caTools_1.14 gdata_2.12.0 gtools_2.7.1 compare_0.2-3
[21] lattice_0.20-10 lumi_2.6.0 nleqslv_2.0 methylumi_2.0.13
[25] Biobase_2.14.0
loaded via a namespace (and not attached):
[1] affy_1.32.1 affyio_1.22.0 annotate_1.32.3 BiocInstaller_1.2.1 bitops_1.0-5 hdrcde_2.15 IRanges_1.12.6 Matrix_1.0-5
[9] nlme_3.1-108 preprocessCore_1.16.0 R.methodsS3_1.4.2 tools_2.14.0 xtable_1.7-1 zlibbioc_1.0.1
--
Sent via the guest posting facility at bioconductor.org.