[BioC] Voom and removing batch effect

Heather Estrella hestrella at regulusrx.com
Tue Jan 21 20:42:29 CET 2014


Hi,

I have a miRNA-Seq dataset with clear batch effect that I'm trying to correct for in order to run differential expression using voom/limma. Based on the Limma User Guide, this can be done by adding batch as a confounding factor to the design matrix used by voom and limma fit. However, when I run an MDS plot on batch using the voom results, it's still showing a clear batch effect. What am I missing? What is the best way to correct for and test for batch effect after correction?

In the design matrix, the "dz_cat" group to compare all the other groups does not appear in the design matrix. My understanding from reading other posting by Gordom Smyth is that it's because it is used as the comparison group to all the other groups. I'm not sure why it is all 1's though.

FYI: I am able to correct for batch effect using the cpm-log2 values and ComBat. However, for differential expression using voom/limma I need to use raw counts for the normalization and differential comparison.

Here is my code:

>    library(edgeR)
>    library(limma)
>    y = DGEList(counts=datamat,genes=rownames(datamat))
>    y = calcNormFactors(y)
>    design <- model.matrix(~batch+dz_cat, data = samplemat)
>    png(file=paste("../results/MV_plot_voom_",name,".png",sep=""),pointsize=11,bg="white")
>       v <- voom(counts=y, design,plot=TRUE)
>    dev.off()
>    colrs = rainbow(length(unique(samplemat$batch)))
>    mycol=colrs[samplemat$batch]
>    o = which(is.na(mycol))
>   png(height=600,width=600,file=paste("../results/MDS_plot_voom_",name,".png",sep=""),pointsize=11,bg="white")
>         plotMDS(v,top=50,labels=substring(samplemat$batch,1,1),col=mycol,gene.selection="common")
>    dev.off()

>   design
   (Intercept) batch2nd batch3rd dz_catF dz_catL dz_catO
1             1        0        1       0       0       0
3             1        1        0       0       0       0
5             1        1        0       0       0       0
7             1        1        0       0       0       0
8             1        0        1       0       0       0
9             1        0        1       0       0       0
11           1        1        0       0       0       0
12           1        0        1       0       0       0
14           1        1        0       0       0       0
16           1        1        0       0       0       0
17           1        0        1       0       0       0
18           1        0        1       0       0       0
20           1        1        0       0       0       0
21           1        0        1       0       0       0
23           1        1        0       0       0       0
24           1        0        1       0       0       0
26           1        1        0       0       1       0
27           1        0        1       0       1       0
28           1        0        1       0       1       0
29           1        0        1       0       1       0
31           1        1        0       0       1       0
33           1        1        0       0       1       0
35           1        1        0       0       0       1
36           1        0        1       0       0       1
38           1        1        0       0       0       1
39           1        0        1       0       0       1
41           1        1        0       0       0       1
42           1        0        1       0       0       1
44           1        1        0       0       0       1
45           1        0        1       0       0       1
attr(,"assign")
[1] 0 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$batch
[1] "contr.treatment"

attr(,"contrasts")$dz_cat
[1] "contr.treatment"

>    sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sva_3.8.0          mgcv_1.7-27        nlme_3.1-113       corpcor_1.6.6
[5] edgeR_3.4.2        matrixStats_0.8.14 limma_3.18.9       gplots_2.12.1

loaded via a namespace (and not attached):
[1] bitops_1.0-6       caTools_1.16       gdata_2.13.2       grid_3.0.2
[5] gtools_3.2.1       KernSmooth_2.23-10 lattice_0.20-24    Matrix_1.1-1.1
[9] R.methodsS3_1.6.1

Many thanks,
Heather

-------------- next part --------------
A non-text attachment was scrubbed...
Name: MDS_plot_voom_UCSDOlefsky_voom.png
Type: image/png
Size: 5908 bytes
Desc: MDS_plot_voom_UCSDOlefsky_voom.png
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140121/ffd71c9f/attachment.png>


More information about the Bioconductor mailing list