[BioC] Paired tests using limma and beadarray
James W. MacDonald
jmacdon at med.umich.edu
Mon Apr 18 16:26:07 CEST 2011
Hi Moritz,
On 4/17/2011 7:12 AM, Moritz Kebschull wrote:
> Dear list,
>
> I am looking at Illumina beadarrays of 18 patients sampled at baseline ("A")
> and after intervention ("B"), i.e. a total of 36 arrays. I think in this
> case, a paired assessment should be done. I am interested in diff. exp. at
> timepoint B over A.
>
> Thus far, I have read the data using the beadarray package, and tried to
> follow the limma manual's instructions for paired tests. I am, however,
> unsure about the contrasts, and my results look weird to me.
>
> I have done:
>
> # read and normalize data using beadarray
> library(beadarray)
> dataFile = "data.txt"
> qcFile = "control.txt"
> sampleSheet = "SampleSheet.csv"
> BSData = readBeadSummaryData(dataFile = dataFile, dec=",", qcFile = NULL,
> skip = 0, qc.skip = 0)
> BSData.quantile = normaliseIllumina(BSData, method = "quantile", transform =
> "log2")
> BSData.genes = BSData.quantile[which(fData(BSData)$Status == "Gene"), ]
> expressed = apply(Detection(BSData.genes)< 0.05, 1, any)
> BSData.filt = BSData.genes[expressed,]
>
> # set up paired tests using limma - what's wrong here?
> library(limma)
> patients = c(17, 2, 18, 5, 3, 8, 6, 14, 4, 13, 7, 15, 9, 10, 11, 12, 17, 18,
> 2, 5, 3, 8, 6, 14, 1, 12, 4, 13, 7, 15, 9, 16, 10, 1, 11, 16)
> patients = factor(patients)
> treatment = c(rep("B",7), "A", "B", "A", "B", "A", "B", rep("A",10), "B",
> "A", "B", "A", "B", "A", "B", "A", "A", rep("B", 4) )
> treatment = factor(treatment, levels = c("A", "B"))
> design = model.matrix(~patients+treatment)
> fit = lmFit(exprs(BSData.filt), design)
> ebFit=eBayes(fit)
>
> # left out some annotation steps
>
> # not sure what contrast is for what - does #19 stand for B-A??
Not the way you have things set up. You are setting patient 1 to be the
baseline with your call to model.matrix(), so the treatmentB parameter
is equal to B-A-patient1A.
It's pretty easy to figure this out. Look at the first row of your
design matrix. This by definition is computing the average expression of
patient 17 who has been given treatment B. You have coefficients for
patient1, patient17, and treatmentB.
Now we know that the patient1 coefficient is the mean expression for
patient1A (patient 1 treated with A), and the patient17 coefficient is
the mean expression for patient17A, and the row itself is supposed to
compute the mean expression for patient17B. Note that we know this
because all patients were treated with either A or B. Since the last
column is treatmentB, all other columns have to incorporate treatmentA.
So we have
patient17B = patient1A + patient17A + treatmentB
and if we solve for treatmentB, we get
treatmentB = patient17B - patient17A - patient1
Which clearly isn't what you want.
You want
design <- model.matrix(~ 0+patient+treatment)
And I leave it up to you to convince yourself that this is correct. ;-D
Best,
Jim
> topTable(ebFit, coef=19)
>
>
> The primary issue is what contrast to choose - for normal, unpaired tests I
> would have set up a contrast matrix for "B-A". When I look at the different
> contrasts, I am missing patient #1.
>
>> ebFit$contrasts
> (Intercept) patients2 patients3 patients4 patients5 patients6
> (Intercept) 1 0 0 0 0 0
> patients2 0 1 0 0 0 0
> patients3 0 0 1 0 0 0
> patients4 0 0 0 1 0 0
> patients5 0 0 0 0 1 0
> patients6 0 0 0 0 0 1
> patients7 0 0 0 0 0 0
> patients8 0 0 0 0 0 0
> patients9 0 0 0 0 0 0
> patients10 0 0 0 0 0 0
> patients11 0 0 0 0 0 0
> patients12 0 0 0 0 0 0
> patients13 0 0 0 0 0 0
> patients14 0 0 0 0 0 0
> patients15 0 0 0 0 0 0
> patients16 0 0 0 0 0 0
> patients17 0 0 0 0 0 0
> patients18 0 0 0 0 0 0
> treatmentB 0 0 0 0 0 0
> patients7 patients8 patients9 patients10 patients11 patients12
> (Intercept) 0 0 0 0 0 0
> patients2 0 0 0 0 0 0
> patients3 0 0 0 0 0 0
> patients4 0 0 0 0 0 0
> patients5 0 0 0 0 0 0
> patients6 0 0 0 0 0 0
> patients7 1 0 0 0 0 0
> patients8 0 1 0 0 0 0
> patients9 0 0 1 0 0 0
> patients10 0 0 0 1 0 0
> patients11 0 0 0 0 1 0
> patients12 0 0 0 0 0 1
> patients13 0 0 0 0 0 0
> patients14 0 0 0 0 0 0
> patients15 0 0 0 0 0 0
> patients16 0 0 0 0 0 0
> patients17 0 0 0 0 0 0
> patients18 0 0 0 0 0 0
> treatmentB 0 0 0 0 0 0
> patients13 patients14 patients15 patients16 patients17
> patients18
> (Intercept) 0 0 0 0 0
> 0
> patients2 0 0 0 0 0
> 0
> patients3 0 0 0 0 0
> 0
> patients4 0 0 0 0 0
> 0
> patients5 0 0 0 0 0
> 0
> patients6 0 0 0 0 0
> 0
> patients7 0 0 0 0 0
> 0
> patients8 0 0 0 0 0
> 0
> patients9 0 0 0 0 0
> 0
> patients10 0 0 0 0 0
> 0
> patients11 0 0 0 0 0
> 0
> patients12 0 0 0 0 0
> 0
> patients13 1 0 0 0 0
> 0
> patients14 0 1 0 0 0
> 0
> patients15 0 0 1 0 0
> 0
> patients16 0 0 0 1 0
> 0
> patients17 0 0 0 0 1
> 0
> patients18 0 0 0 0 0
> 1
> treatmentB 0 0 0 0 0
> 0
> treatmentB
> (Intercept) 0
> patients2 0
> patients3 0
> patients4 0
> patients5 0
> patients6 0
> patients7 0
> patients8 0
> patients9 0
> patients10 0
> patients11 0
> patients12 0
> patients13 0
> patients14 0
> patients15 0
> patients16 0
> patients17 0
> patients18 0
> treatmentB 1
>
> Is contrast #19 the one I am looking for? Where is patient #1? The thing is
> that when looking at pt #2 - #18 seperately, I do find a couple of genes
> regulated pretty consistently, and these ones do not show up at all in the
> overall comparison (contrast 19). In fact, there's pretty much no difference
> using that contrast, which I find weird...
>
> Perhaps anyone has an idea what's wrong? Many thanks!
>
> Moritz (Bonn, Germany)
>
>
> sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] illuminaHumanv4.db_1.8.0 org.Hs.eg.db_2.4.6 RSQLite_0.9-4
>
> [4] DBI_0.2-5 AnnotationDbi_1.12.0 limma_3.6.9
>
> [7] beadarray_2.0.6 Biobase_2.10.0
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
More information about the Bioconductor
mailing list