[BioC] Limma : Single Channel experiment design matrix

Fri Mar 7 16:40:35 CET 2014

Hi Koran,

On 3/7/2014 3:49 AM, Koran [guest] wrote:
> Dear All,
>
> I have a question about how to analyse a single-channel experiment with several groups.
>
> In a first approach, I followed the limma User's Guide section on several groups (chapter 9.3) and used a contrast
> matrix to compare two of the groups.
>
> I also tried a second approach: I took a subset of the expression set containing only the two groups of samples I need to compare, and then followed the two-group approach (chapter 9.2).
>
> The fold change stays the same, but the p-value of the moderated t-test differs:
>
> For the "chapter 9.3" approach, I get this topTable:
>                logFC   AveExpr         t      P.Value    adj.P.Val        B
> NM_013409  4.804450  9.351186  63.46856 5.198462e-32 2.225306e-27 60.42083
> NM_170685  3.327586  7.476924  43.29198 2.292074e-27 4.102931e-23 51.64301
> NM_021995  3.598441  8.731876  42.94068 2.875416e-27 4.102931e-23 51.44328
> NM_000014  2.686684 11.968353  38.61755 5.481149e-26 4.817512e-22 48.80565
> NM_001747  2.727227  8.834094  38.33543 6.716748e-26 4.817512e-22 48.62109
>
> For the "chapter 9.2" approach, I get this topTable:
>                logFC   AveExpr         t      P.Value    adj.P.Val        B
> NM_013409  4.804450 10.238329  70.14768 7.077519e-15 2.709195e-10 23.07593
> NM_015464  3.868533  9.850459  66.20398 1.265772e-14 2.709195e-10 22.72371
> NM_000119 -3.322662 11.608264 -61.31983 2.733108e-14 3.899871e-10 22.22951
> BC025320   2.908061  7.112412  56.61705 6.089619e-14 6.516958e-10 21.68233
> NM_000014  2.686684 11.682645  53.85715 1.005598e-13 8.609327e-10 21.32326
> NM_170685  3.327586  7.826983  51.22412 1.662803e-13 1.086579e-09 20.95091
>
>
> As expected, logFC is the same and AveExpr obviously differs, but the p-values are different.
> So I was wondering why, and which approach is best to choose, since one gives results with more statistical power?

The difference between the two models comes primarily from the
estimate of intra-group variability, which is used to construct the
denominator of your t-statistic. This is a pooled estimate based on
all samples in the model. All else equal, using more samples to
estimate the variance gives more residual degrees of freedom, so the
estimate is more precise (though not necessarily smaller; your
t-statistics are in fact somewhat smaller in the first fit). The
p-value is then computed from a t-distribution with more degrees of
freedom, which has lighter tails, so a t-statistic of similar size
gets a much smaller p-value.

As a general rule, I would think fitting the first model (all groups,
then a contrast for the comparison of interest) is the preferred way
to go.
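
To make the comparison concrete, here is a minimal sketch of the two
fits on simulated data. The group names A to D, the matrix y and the
contrast name BvsA are made up for illustration and are not your data,
so treat it as an outline under those assumptions rather than a
definitive analysis.

```r
# Sketch only: simulated data with hypothetical groups A, B, C, D.
library(limma)

set.seed(1)
n_genes <- 200
group <- factor(rep(c("A", "B", "C", "D"), each = 4))

# Log-expression matrix: genes in rows, 16 samples in columns
y <- matrix(rnorm(n_genes * length(group)), nrow = n_genes,
            dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
# Make gene1 differ between B and A
y[1, group == "B"] <- y[1, group == "B"] + 3

# Approach 1 (several groups): fit all groups, then extract B - A
design_all <- model.matrix(~ 0 + group)
colnames(design_all) <- levels(group)
fit_all <- lmFit(y, design_all)
cont <- makeContrasts(BvsA = B - A, levels = design_all)
fit_all <- eBayes(contrasts.fit(fit_all, cont))

# Approach 2 (two groups): keep only the A and B samples
keep <- group %in% c("A", "B")
group_sub <- droplevels(group[keep])
design_sub <- model.matrix(~ group_sub)  # second coefficient is B - A
fit_sub <- eBayes(lmFit(y[, keep], design_sub))

# Same logFC in both fits, but different residual degrees of freedom
same_logfc <- all.equal(unname(fit_all$coefficients[, "BvsA"]),
                        unname(fit_sub$coefficients[, 2]))
df_all <- unique(fit_all$df.residual)  # 16 samples - 4 groups = 12
df_sub <- unique(fit_sub$df.residual)  # 8 samples - 2 groups = 6

topTable(fit_all, coef = "BvsA", number = 5)
topTable(fit_sub, coef = 2, number = 5)
```

The logFC column should agree between the two tables, while the t,
P.Value and B columns differ because the first fit estimates the
variance from all 16 samples and the second from only 8.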

Best,

Jim

>
> Thank you for your kind answers.
>
> Koran
>
>   -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
>
> other attached packages:
>   [1] RColorBrewer_1.0-5         R.basic_0.53.0             R.utils_1.29.8             R.oo_1.18.0                R.methodsS3_1.6.1
>   [6] plotrix_3.5-3              multicore_0.1-7            pvclust_1.2-2              arrayQualityMetrics_3.18.0 impute_1.36.0
> [11] marray_1.40.0              limma_3.18.13              fortunes_1.5-2             snowfall_1.84-6            snow_0.3-13
>
> loaded via a namespace (and not attached):
>   [1] affy_1.40.0           affyio_1.30.0         affyPLM_1.38.0        annotate_1.40.1       AnnotationDbi_1.24.0  beadarray_2.12.0
>   [7] BeadDataPackR_1.14.0  Biobase_2.22.0        BiocGenerics_0.8.0    BiocInstaller_1.12.0  Biostrings_2.30.1     Cairo_1.5-5
> [13] cluster_1.14.4        colorspace_1.2-4      DBI_0.2-7             Formula_1.1-1         gcrma_2.34.0          genefilter_1.44.0
> [19] grid_3.0.2            Hmisc_3.14-2          hwriter_1.3           IRanges_1.20.6        KernSmooth_2.23-10    lattice_0.20-27
> [25] latticeExtra_0.6-26   parallel_3.0.2        plyr_1.8.1            preprocessCore_1.24.0 Rcpp_0.11.0           reshape2_1.2.2
> [31] RSQLite_0.11.4        setRNG_2011.11-2      splines_3.0.2         stats4_3.0.2          stringr_0.6.2         survival_2.37-7
> [37] SVGAnnotation_0.93-1  tools_3.0.2           vsn_3.30.0            XML_3.95-0.2          xtable_1.7-1          XVector_0.2.0
> [43] zlibbioc_1.8.0
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099