[BioC] Select different linear models in voom
Gordon K Smyth
smyth at wehi.EDU.AU
Fri Mar 7 00:29:19 CET 2014
Dear Francesco,
If you have 400 covariates and 1000 samples it would appear that you can
feasibly use all covariates in a linear model at once. Does voom() work
on your computer with this full model or does R run out of memory?
It it works, then I would suggest running voom and limma on the full model
as usual, then removing covariates one by one from the linear model
(without re-running voom) if they result in no DE genes. In model
selection theory, this is called "backward selection".
Best wishes
Gordon
> Date: Thu, 6 Mar 2014 00:45:55 -0800 (PST)
> From: "Francesco [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, gatto at chalmers.se
> Subject: [BioC] Select different linear models in voom
>
>
> I have recently implemented the approach used in voom to estimate the
> mean and the variance of each log-cpm at the observational level. My
> dataset contains ~1000 samples, that features a discrete amount of
> metadata that may be used as covariates (~400). This allows, in
> principle, for a better construction of the linear model on which both
> the fitted mean and the fitted variance are estimated in voom, by simply
> including more factors.
>
> So far, I have used the AIC weights to test the probability for various
> linear models to be more likely to explain the data than the alternative
> models. Of course, testing all possible combinations of linear models is
> computationally infeasible (in principle, 2^400). However, even if I
> detected most gene are well explained by a simple LM, a non negligible
> fraction of them depend on additional factors.
>
> The point is the what makes the expression profile of a certain gene
> interesting, is when the covariates play an important role in
> determining its mean and variance. Therefore I am reluctant to use the
> simple LM because this would eliminate all the covariates. On the other
> hand, I am reluctant to use to more complicated LM because it clearly
> unnecessarily fits a large amount of genes.
>
> What is the best way to proceed?
>
> Thanks!
>
> -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] sv_SE.UTF-8/sv_SE.UTF-8/sv_SE.UTF-8/C/sv_SE.UTF-8/sv_SE.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] edgeR_3.4.2 limma_3.18.9
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.2
>
> --
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list