[BioC] Random effects and variance components
James W. MacDonald
jmacdon at med.umich.edu
Mon Jun 1 15:49:22 CEST 2009
Hi Paolo,
I don't think you can fit the model you describe using limma, and I
really don't think you can get the variance components. If you want to
fit more sophisticated mixed models, you will likely need to use the
lme4 package, and the lmer() function in particular.
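
As a rough sketch of what such a model might look like for a single reporter, here is a fit on simulated data with sex as a fixed effect and line and sex-by-line as random effects. The data frame `pd`, the column `expr`, and all effect sizes are made up for illustration and are not taken from your experiment; only the layout (15 lines, 2 sexes, 4 replicates) follows your description.

```r
library(lme4)

set.seed(1)
# Hypothetical layout: 15 lines x 2 sexes x 4 biological replicates = 120 arrays
pd <- expand.grid(rep = 1:4, sex = c("F", "M"), line = factor(1:15))

# Simulate one reporter: fixed sex effect, random line and sex:line effects
line_eff <- rnorm(15, sd = 1)
int_eff  <- rnorm(30, sd = 0.5)
pd$expr <- 8 + 0.7 * (pd$sex == "M") +
  line_eff[as.integer(pd$line)] +
  int_eff[as.integer(interaction(pd$sex, pd$line))] +
  rnorm(nrow(pd), sd = 0.4)

# sex is fixed; line and the sex-by-line interaction are random intercepts
fit <- lmer(expr ~ sex + (1 | line) + (1 | line:sex), data = pd)

# Variance components: one row per random term, plus the residual
vc <- as.data.frame(VarCorr(fit))
vc[, c("grp", "vcov")]

# Share of the total variance attributed to line
line_share <- vc$vcov[as.character(vc$grp) == "line"] / sum(vc$vcov)
line_share
```

How the line share relates to heritability depends on the breeding design, so treat it as a starting point rather than a finished estimate.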
But note that this route will require much more work on your part, both
in understanding how lme4 and lmer() work and in writing the code to fit
individual models to each reporter and extract the results.
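
To give an idea of the bookkeeping involved, the following is a minimal sketch of a per-reporter loop. The matrix `E` is a simulated stand-in for exprs(eset), and `fit_one` is a hypothetical helper name; a simpler model with only a random line effect is used to keep the example short.

```r
library(lme4)

set.seed(2)
# Hypothetical sample annotation: 15 lines x 2 sexes x 4 replicates
pd <- expand.grid(rep = 1:4, sex = c("F", "M"), line = factor(1:15))

# Stand-in for exprs(eset): 5 simulated reporters (rows) x 120 arrays (columns)
n_rep <- 5
E <- t(sapply(seq_len(n_rep), function(i) {
  8 + rnorm(15, sd = 1)[as.integer(pd$line)] + rnorm(nrow(pd), sd = 0.5)
}))
rownames(E) <- paste0("probe", seq_len(n_rep))

# Fit one mixed model per reporter and return its variance components;
# a fit that throws an error yields NAs instead of stopping the loop
fit_one <- function(y) {
  out <- c(line = NA_real_, Residual = NA_real_)
  fit <- tryCatch(lmer(y ~ sex + (1 | line), data = cbind(pd, y = y)),
                  error = function(e) NULL)
  if (!is.null(fit)) {
    vc <- as.data.frame(VarCorr(fit))
    out[as.character(vc$grp)] <- vc$vcov
  }
  out
}

# One row per reporter, one column per variance component
res <- t(apply(E, 1, fit_one))
res
```

With tens of thousands of reporters this loop will be slow, and some fits may be singular or fail to converge, so the results need checking before use.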
Best,
Jim
Paolo Innocenti wrote:
> Dear Gordon and list,
>
> thanks for the previous help, it was indeed helpful. Nonetheless, even
> after some strolling (and independent trial and error), I am still
> stuck on this issue:
>
> we came to the conclusion that the simplest good model for our affy
> experiment is the following:
>
> design <- model.matrix(~ sex*line, data=pData(data))
>
> sex: M/F,
> line: 15 levels (different clones)
> 8 biological replicates for each line (4 for each sex)
>
> The issue here is that the "line" factor should be treated as random.
> First, because each line is a haplotype randomly picked from a large
> outbred population. Second, because we would like to estimate, from the
> variance components, the heritability of the transcript (the variance
> explained by the "line" term can be taken as an approximation of the
> genetic variance).
>
> Gordon Smyth wrote:
> "Finally, you can add the biolrep as a random effect using the
> duplicateCorrelation() function with block argument, as explained in the
> limma User's Guide, but I am not convinced yet that this is absolutely
> necessary for your experiment."
>
> I am not sure I understand what you mean here, but the random effect I
> want to include is "line", not the biological replicate (that is
> numbered 1:4 in each line/sex, but it says nothing about relationships
> between each 1, each 2, etc...)
>
> So, the two questions are:
>
> 1) How to treat "line" as random. I appreciate that this issue is
> explained in chapter 8.2 of the limma user's guide, but I still don't
> get how to fit the interaction and how to include my random effect in my
> contrast, or in my toptable. For instance:
>
> design <- model.matrix(~ sex*line, data=pData(data))
> randomline <- duplicateCorrelation(eset, design,
> block=rep(1:15,each=8))
> fit <- lmFit(eset, design, block=rep(1:15, each=8),
> cor=randomline$consensus)
>
> I get this error:
>
> Error in chol.default(V) :
> the leading minor of order 2 is not positive definite
>
> Probably because the block is exactly the same vector as the line,
> already included in the design.
> On the other hand, if I fit:
>
> design <- model.matrix(~ sex, data=pData(data))
> randomline <- duplicateCorrelation(eset, design,
> block=rep(1:15,each=8))
> fit <- lmFit(eset, design, block=rep(1:15, each=8),
> cor=randomline$consensus)
>
> There is no line:sex interaction, and the only effect I can obtain is
> the effect of sex.
> How can I get the effect of the line, and of the sex:line interaction?
>
> 2) Where can I find the variance components of my random effect?
>
> Thanks in advance for any help,
> paolo
>
>>> sessionInfo()
>> R version 2.9.0 (2009-04-17) x86_64-unknown-linux-gnu
>> locale:
>> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>
>>
>> attached base packages:
>> [1] tcltk stats graphics grDevices utils datasets methods
>> [8] base
>> other attached packages:
>> [1] statmod_1.4.0 qvalue_1.18.0 limma_2.18.1 affy_1.22.0 Biobase_2.4.1
>> [6] maanova_1.14.0
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.12.0 preprocessCore_1.6.0 tools_2.9.0
>
> Gordon K Smyth wrote:
>> Dear Paolo,
>>
>> As Naomi Altman has already told you, analysing an experiment such as
>> this is straightforward with limma. I guess the problem you are
>> having is that you are trying to use the limma User's Guide's
>> suggestion of forming a composite factor out of the individual factors
>> (called the group means parametrization), and you don't know how to
>> define contrasts for interactions from this factor. This does become
>> a little more involved for experiments with more factors. Can I
>> suggest that you instead make use of the factorial formulae in R when
>> you make up the design matrix, then you can probably dispense with the
>> contrast step altogether.
>>
>> You could for example use
>>
>> targets <- read.delim("targets.txt")
>> design <- model.matrix(~Batch+Sex*(Phen/Line), data=targets)
>>
>> This will produce a design matrix with the following columns.
>>
>> > colnames(design)
>> [1] "(Intercept)" "Batch" "SexM"
>> [4] "PhenH" "PhenL" "PhenA:Line"
>> [7] "PhenH:Line" "PhenL:Line" "SexM:PhenH"
>> [10] "SexM:PhenL" "SexM:PhenA:Line" "SexM:PhenH:Line"
>> [13] "SexM:PhenL:Line"
>>
>> To find genes significant for the sex x line interaction, you can
>> simply use
>>
>> fit <- lmFit(eset, design)
>> fit <- eBayes(fit)
>> topTable(fit, coef=9:13)
>>
>> On the other hand,
>>
>> topTable(fit, coef=9:10)
>>
>> is the sex x phen interaction.
>>
>> Finally, you can add the biolrep as a random effect using the
>> duplicateCorrelation() function with block argument, as explained in
>> the limma User's Guide, but I am not convinced yet that this is
>> absolutely necessary for your experiment.
>>
>> Can I also suggest that you stroll over to the mathematics department
>> at Uppsala and talk to someone interested in bioinformatics and
>> microarray analysis, say Professor Tom Britton, and see if you can get
>> ongoing help with statistics and design issues.
>>
>> Best wishes
>> Gordon
>>
>>
>>> Date: Mon, 04 May 2009 14:09:14 +0200
>>> From: Paolo Innocenti <paolo.innocenti at ebc.uu.se>
>>> Subject: Re: [BioC] Yet another nested design in limma
>>> Cc: AAA - Bioconductor <bioconductor at stat.math.ethz.ch>
>>>
>>> Hi all,
>>>
>>> since I received a few emails in my mailbox from people interested in a
>>> solution for this design (or a design similar to this one), but there is
>>> apparently no (easy) solution in limma, I was wondering if anyone could
>>> suggest a package for differential expression analysis that allows
>>> dealing with:
>>>
>>> nested designs,
>>> random effects,
>>> multiple factorial designs with more than 2 levels.
>>>
>>> I identified siggenes, maanova, factDesign that could fit my needs, but
>>> I would like to have a comment by someone with more experience before
>>> diving into a new package.
>>>
>>> Best,
>>> paolo
>>>
>>>
>>>
>>> Paolo Innocenti wrote:
>>>> Hi Naomi and list,
>>>>
>>>> some time ago I asked a question on how to model an experiment in
>>>> limma. I think I need some additional help with it as the experiment
>>>> grew in complexity. I also added a factor "batch" because the arrays
>>>> were run in separate batches, and I think it would be good to control
>>>> for it.
>>>> The dataframe with phenotypic information ("dummy") looks like this:
>>>>
>>>> >> Phen Line Sex Batch BiolRep
>>>> >> File1 H 1 M 1 1
>>>> >> File2 H 1 M 1 2
>>>> >> File3 H 1 M 2 3
>>>> >> File4 H 1 M 2 4
>>>> >> File5 H 1 F 1 1
>>>> >> File6 H 1 F 1 2
>>>> >> File7 H 1 F 2 3
>>>> >> File8 H 1 F 2 4
>>>> >> File9 H 2 M 1 1
>>>> >> File10 H 2 M 1 2
>>>> >> File11 H 2 M 2 3
>>>> >> File12 H 2 M 2 4
>>>> >> File13 H 2 F 1 1
>>>> >> File14 H 2 F 1 2
>>>> >> File15 H 2 F 2 3
>>>> >> File16 H 2 F 2 4
>>>> >> File17 L 3 M 1 1
>>>> >> File18 L 3 M 1 2
>>>> >> File19 L 3 M 2 3
>>>> >> File20 L 3 M 2 4
>>>> >> File21 L 3 F 1 1
>>>> >> File22 L 3 F 1 2
>>>> >> File23 L 3 F 2 3
>>>> >> File24 L 3 F 2 4
>>>> >> File25 L 4 M 1 1
>>>> >> File26 L 4 M 1 2
>>>> >> File27 L 4 M 2 3
>>>> >> File28 L 4 M 2 4
>>>> >> File29 L 4 F 1 1
>>>> >> File30 L 4 F 1 2
>>>> >> File31 L 4 F 2 3
>>>> >> File32 L 4 F 2 4
>>>> >> File33 A 5 M 1 1
>>>> >> File34 A 5 M 1 2
>>>> >> File35 A 5 M 2 3
>>>> >> File36 A 5 M 2 4
>>>> >> File37 A 5 F 1 1
>>>> >> File38 A 5 F 1 2
>>>> >> File39 A 5 F 2 3
>>>> >> File40 A 5 F 2 4
>>>> >> File41 A 6 M 1 1
>>>> >> File42 A 6 M 1 2
>>>> >> File43 A 6 M 2 3
>>>> >> File44 A 6 M 2 4
>>>> >> File45 A 6 F 1 1
>>>> >> File46 A 6 F 1 2
>>>> >> File47 A 6 F 2 3
>>>> >> File48 A 6 F 2 4
>>>>
>>>> In total I have
>>>> Factor "Phen", with 3 levels
>>>> Factor "Line", nested in Phen, 6 levels
>>>> Factor "Sex", 2 levels
>>>> Factor "Batch", 2 levels
>>>>
>>>> I am interested in:
>>>>
>>>> 1) Effect of sex (M vs F)
>>>> 2) Interaction between "Sex" and "Line" (or "Sex" and "Phen")
>>>>
>>>> Now, I can't really come up with a design matrix (not to mention the
>>>> contrast matrix).
>>>>
>>>> Naomi Altman wrote:
>>>>> You can design this in limma quite readily. Nesting really just means
>>>>> that only a subset of the possible contrasts are of interest. Just
>>>>> create the appropriate contrast matrix and you are all set.
>>>>
>>>> I am not really sure what you mean here. Should I treat all the
>>>> factors as in a factorial design?
>>>> I might do something like this:
>>>>
>>>> phen <- factor(dummy$Phen)
>>>> line <- factor(dummy$Line)
>>>> sex <- factor(dummy$Sex)
>>>> batch <- factor(dummy$Batch)
>>>> fact <- factor(paste(sex,phen,line,sep="."))
>>>> design <- model.matrix(~ 0 + fact + batch)
>>>> colnames(design) <- c(levels(fact), "batch2")
>>>> fit <- lmFit(dummy.eset,design)
>>>> contrast <- makeContrasts(
>>>> sex= (F.H.1 + F.H.2 + F.L.3 + F.L.4 + F.A.5 + F.A.6) - (M.H.1 +
>>>> M.H.2 + M.L.3 + M.L.4 + M.A.5 + M.A.6),
>>>> levels=design)
>>>> fit2 <- contrasts.fit(fit,contrast)
>>>> fit2 <- eBayes(fit2)
>>>>
>>>> In this way I can correctly (I presume) obtain the effect of sex, but
>>>> how can I get the interaction term between sex and line?
>>>> I presume there is an "easy" way, but I can't see it...
>>>>
>>>> Thanks,
>>>> paolo
>>>>
>>>>
>>>>>
>>>>> --Naomi
>>>>>
>>>>> At 12:08 PM 2/16/2009, Paolo Innocenti wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I have an experimental design for an Affy experiment that looks like
>>>>>> this:
>>>>>>
>>>>>> Phen Line Sex Biol.Rep.
>>>>>> File1 H 1 M 1
>>>>>> File2 H 1 M 2
>>>>>> File3 H 1 F 1
>>>>>> File4 H 1 F 2
>>>>>> File5 H 2 M 1
>>>>>> File6 H 2 M 2
>>>>>> File7 H 2 F 1
>>>>>> File8 H 2 F 2
>>>>>> File9 L 3 M 1
>>>>>> File10 L 3 M 2
>>>>>> File11 L 3 F 1
>>>>>> File12 L 3 F 2
>>>>>> File13 L 4 M 1
>>>>>> File14 L 4 M 2
>>>>>> File15 L 4 F 1
>>>>>> File16 L 4 F 2
>>>>>>
>>>>>>
>>>>>> This appears to be a slightly more complicated situation than the one
>>>>>> proposed in the section 8.7 of the limma users guide (p.45) or by
>>>>>> Jenny on this post:
>>>>>>
>>>>>> https://stat.ethz.ch/pipermail/bioconductor/2006-February/011965.html
>>>>>>
>>>>>> In particular, I am interested in
>>>>>> - Effect of "sex" (M vs F)
>>>>>> - Interaction between "sex" and "phenotype" ("line" nested)
>>>>>> - Effect of "phenotype" in males
>>>>>> - Effect of "phenotype" in females
>>>>>>
>>>>>> Line should be nested in phenotype, because they are random "strains"
>>>>>> that happened to end up in phenotype H or L.
>>>>>>
>>>>>> Can I design this in limma? Is there a source of information about
>>>>>> how to handle this? In particular, can I design a single model
>>>>>> matrix and then choose the contrasts I am interested in?
>>>>>>
>>>>>> Any help is much appreciated,
>>>>>> paolo
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Paolo Innocenti
>>>>>> Department of Animal Ecology, EBC
>>>>>> Uppsala University
>>>>>> Norbyvägen 18D
>>>>>> 75236 Uppsala, Sweden
>>>>>>
>>>>>
>>>>> Naomi S. Altman 814-865-3791 (voice)
>>>>> Associate Professor
>>>>> Dept. of Statistics 814-863-7114 (fax)
>>>>> Penn State University 814-865-1348 (Statistics)
>>>>> University Park, PA 16802-2111
>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Paolo Innocenti
>>> Department of Animal Ecology, EBC
>>> Uppsala University
>>> Norbyvägen 18D
>>> 75236 Uppsala, Sweden
>>
>
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826