[BioC] limma design and contrast matrix for paired experiment
Gordon K Smyth
smyth at wehi.EDU.AU
Sun Jan 13 01:00:31 CET 2013
Dear David,
No, your design matrix is incorrect because it ignores the pairing by
sample in your experimental design.
You can treat Sample as either a fixed or a random factor. The fixed
approach is closely analogous to a classical paired t-test. For an
experiment like yours, the fixed approach is explained in Section 3.5 of
the edgeR User's Guide:
http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
You can follow the same construction of the design matrix even though you
are using limma.
The random approach is a bit more aggressive. For an experiment like
yours, the random approach is explained in Section 8.7 of the limma User's
Guide:
http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf
I would probably recommend the first approach for your data. However the
second approach is necessary if you want to test for differences between
the two cell lines.
Best wishes
Gordon
> Date: Fri, 11 Jan 2013 18:31:23 +0100
> From: David Westergaard <david at harsk.dk>
> To: bioconductor at r-project.org
> Subject: [BioC] limma design and contrast matrix for paired
> experiment
>
> Hello,
>
> I am analysing microarray data performed on two cell cultures, in
> which the gene expression were measured before (C) and after treatment
> (T), so that the targets look like this:
>
> Cell-line Treatment Sample
> Cell 1 C 1
> Cell 1 T 1
> Cell 2 C 2
> Cell 2 T 2
> Cell 1 C 3
> Cell 1 T 3
> Cell 2 C 4
> Cell 2 T 4
> Cell 1 C 5
> Cell 1 T 5
> Cell 2 C 6
> Cell 2 T 6
>
> All experiments were performed on single-channel Agilent arrays, with
> 4 samples pr. slide. I am interested in determining the differentially
> expressed genes between Cell1 before and after treatment, as well as
> Cell2 before and after treatment. This is the preliminary code:
>
> # Load and normalize data
> RG <- read.maimages(targets$FileName,source="agilent.median",green.only=TRUE)
> # Assume there is a col called FileName in the targets section
> RG <- backgroundCorrect(RG, method="normexp", offset=16)
> RGNorm <- normalizeBetweenArrays(RG, method="quantile")
> RGNorm.ave <- avereps(RGNorm, ID=RGNorm$genes$ProbeName)
>
> # Create design
> Pairing <- paste(rep(c('C1-','C1-','C2-','C2-'),3),c(1,1,2,2,1,1,2,2,1,1,2,2),rep(c('C','T'),6),sep='')
> pair <- factor(Pairing,levels=unique(Pairing))
> design <- model.matrix( ~ 0 + pair )
> colnames(design) <- c('C1.C','C1.T','C2.C','C2.T')
>
> # Fit data
> fit <- lmFit(RGNorm.ave, design=design)
>
> cont.matrix <- makeContrasts(C1 = C1.T-C1.C, C2=C2.T-C2.C, levels=design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
>
>
> For the experiment, is the design/contrast matrix a proper choice to
> answer the questions of 'Which genes are differentially expressed in
> Cell 1' and 'Which genes are differentially expressed in Cell 2'?
> Further, should I do any technical correction, such as
> duplicateCorrelation or similar? The reason I am asking is that even
> at p<=0.01 I am getting a very high number of differentially expressed
> probes (4500ish for Cell 1, and 7500ish for Cell 2, respectively), and
> I want to make sure this is biological significance, and not some
> technical aspect I have missed.
>
> Thanks in advance.
>
> Best,
> David
>
>
>> sessionInfo()
> R version 2.14.1 (2011-12-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] statmod_1.4.16 limma_3.10.3
>
> loaded via a namespace (and not attached):
> [1] tcltk_2.14.1 tools_2.14.1
>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list