[BioC] limma design and contrast matrix for paired experiment

Sun Jan 13 01:00:31 CET 2013

Dear David,

No, your design matrix is incorrect because it ignores the pairing by 
sample in your experimental design.

You can treat Sample as either a fixed or a random factor.  The fixed 
approach is closely analogous to a classical paired t-test.  For an 
experiment like yours, the fixed approach is explained in Section 3.5 of 
the edgeR User's Guide:

http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

You can follow the same construction of the design matrix even though you 
are using limma.
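
As a minimal sketch of that construction, assuming the targets table from your post, the design could be built as below. The variable names Cell, Treatment, Sample and Subject are my own labels rather than anything from your code, and the commented-out limma calls assume your RGNorm.ave object.

```r
# Targets as listed in the original post (12 arrays, 6 paired samples)
Cell      <- factor(rep(c("C1", "C1", "C2", "C2"), 3))
Treatment <- factor(rep(c("C", "T"), 6), levels = c("C", "T"))
Sample    <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)

# Re-number the samples within each cell line:
# samples 1,3,5 (Cell 1) -> 1,2,3 and samples 2,4,6 (Cell 2) -> 1,2,3
Subject <- factor((Sample + 1) %/% 2)

# Sample effects nested within cell line, plus a separate
# treatment effect for each cell line
design <- model.matrix(~ Cell + Cell:Subject + Cell:Treatment)
colnames(design)

# With limma loaded and the normalized data available, the two
# treatment effects could then be tested directly as coefficients:
# fit <- lmFit(RGNorm.ave, design)
# fit <- eBayes(fit)
# topTable(fit, coef = "CellC1:TreatmentT")   # T vs C within Cell 1
# topTable(fit, coef = "CellC2:TreatmentT")   # T vs C within Cell 2
```

The last two columns of this design are the paired treatment effects for Cell 1 and Cell 2, so no contrast matrix is needed for those two questions.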

The random approach is a bit more aggressive.  For an experiment like 
yours, the random approach is explained in Section 8.7 of the limma User's 
Guide:

http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf
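
A rough sketch of the random approach, under the same assumptions about your targets, might look like the following. The matrix y is simulated noise standing in for your normalized data (RGNorm.ave in your code), and Group and the CellLine contrast are names I have made up for illustration.

```r
library(limma)   # duplicateCorrelation() also needs the statmod package

# Targets as listed in the original post
Cell      <- factor(rep(c("C1", "C1", "C2", "C2"), 3))
Treatment <- factor(rep(c("C", "T"), 6), levels = c("C", "T"))
Sample    <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)

# One group per cell line / treatment combination
Group  <- factor(paste(Cell, Treatment, sep = "."))
design <- model.matrix(~ 0 + Group)
colnames(design) <- levels(Group)   # C1.C, C1.T, C2.C, C2.T

# Stand-in data only: replace y with the real normalized expression object
set.seed(1)
y <- matrix(rnorm(200 * 12), nrow = 200, ncol = 12)

# Estimate the correlation between arrays from the same sample,
# then treat Sample as a random (blocking) factor in the fit
corfit <- duplicateCorrelation(y, design, block = Sample)
fit <- lmFit(y, design, block = Sample,
             correlation = corfit$consensus.correlation)

cont.matrix <- makeContrasts(
  C1       = C1.T - C1.C,   # treatment effect in Cell 1
  C2       = C2.T - C2.C,   # treatment effect in Cell 2
  CellLine = C2.C - C1.C,   # untreated Cell 2 vs untreated Cell 1
  levels = design
)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
topTable(fit2, coef = "C1")
```

The group-means design is the same shape as the one in your post; the difference is that the pairing enters through the block and correlation arguments rather than being ignored.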

I would probably recommend the first approach for your data.  However, the 
second approach is necessary if you want to test for differences between 
the two cell lines.

Best wishes
Gordon

> Date: Fri, 11 Jan 2013 18:31:23 +0100
> From: David Westergaard <david at harsk.dk>
> To: bioconductor at r-project.org
> Subject: [BioC]  limma design and contrast matrix for paired
> 	experiment
>
> Hello,
>
> I am analysing microarray data from two cell cultures, in
> which gene expression was measured before (C) and after treatment
> (T), so that the targets look like this:
>
> Cell-line	Treatment	Sample
> Cell 1	C	1
> Cell 1	T	1
> Cell 2	C	2
> Cell 2	T	2
> Cell 1	C	3
> Cell 1	T	3
> Cell 2	C	4
> Cell 2	T	4
> Cell 1	C	5
> Cell 1	T	5
> Cell 2	C	6
> Cell 2	T	6
>
> All experiments were performed on single-channel Agilent arrays, with
> 4 samples per slide. I am interested in determining the differentially
> expressed genes between Cell1 before and after treatment, as well as
> Cell2 before and after treatment. This is the preliminary code:
>
> # Load and normalize data
> RG <- read.maimages(targets$FileName,source="agilent.median",green.only=TRUE)
> # Assume there is a col called FileName in the targets section
> RG <- backgroundCorrect(RG, method="normexp", offset=16)
> RGNorm <- normalizeBetweenArrays(RG, method="quantile")
> RGNorm.ave <- avereps(RGNorm, ID=RGNorm$genes$ProbeName)
>
> # Create design
> Pairing <- paste(rep(c('C1-','C1-','C2-','C2-'),3),c(1,1,2,2,1,1,2,2,1,1,2,2),rep(c('C','T'),6),sep='')
> pair <- factor(Pairing,levels=unique(Pairing))
> design <- model.matrix( ~ 0 + pair )
> colnames(design) <- c('C1.C','C1.T','C2.C','C2.T')
>
> # Fit data
> fit <- lmFit(RGNorm.ave, design=design)
>
> cont.matrix <- makeContrasts(C1 = C1.T-C1.C, C2=C2.T-C2.C, levels=design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
>
>
> For the experiment, is the design/contrast matrix a proper choice to
> answer the questions of 'Which genes are differentially expressed in
> Cell 1' and  'Which genes are differentially expressed in Cell 2'?
> Further, should I do any technical correction, such as
> duplicateCorrelation or similar? The reason I am asking is that even
> at p<=0.01 I am getting a very high number of differentially expressed
> probes (4500ish for Cell 1, and 7500ish for Cell 2, respectively), and
> I want to make sure this is biological significance, and not some
> technical aspect I have missed.
>
> Thanks in advance.
>
> Best,
> David
>
>
>> sessionInfo()
> R version 2.14.1 (2011-12-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] statmod_1.4.16 limma_3.10.3
>
> loaded via a namespace (and not attached):
> [1] tcltk_2.14.1 tools_2.14.1
>
