[BioC] Right skewed histogram of p-values

Mon Sep 27 18:43:22 CEST 2010

Hi

you've already got a completely satisfying explanation. A right skew of 
the p value histogram is in fact a typical sign of a covariate that you 
have not controlled for.

A quick example to demonstrate. Let's simulate 1000 realizations, each 
consisting of four draws from the same normal distribution (mean 20, 
standard deviation 4):

y <- cbind(
   rnorm( 1000, 20, 4 ),
   rnorm( 1000, 20, 4 ),
   rnorm( 1000, 20, 4 ),
   rnorm( 1000, 20, 4 ) )

The first two columns are meant to be the control samples, the third and 
fourth the treatment samples. They all have the same mean, i.e., the 
treatment has no effect.

Doing a t test on each realization gives us nicely uniform p values:

library(genefilter)
hist( rowttests( y, factor( c( "C", "C", "T", "T" ) ) )$p.value )
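
If you would rather check the uniformity numerically than by eye, here is 
a minimal sketch in base R. It uses an equal-variance t.test per row as a 
stand-in for rowttests, so genefilter is not needed. The seed and the 
names y0, grp and p0 are my own choices for illustration.

```r
set.seed( 1 )   # arbitrary seed, only so the numbers are reproducible

# 1000 realizations of four draws with identical mean, as above
y0  <- matrix( rnorm( 4000, 20, 4 ), ncol = 4 )
grp <- factor( c( "C", "C", "T", "T" ) )

# equal-variance two-sample t test on each row
p0 <- apply( y0, 1, function( r )
   t.test( r[ grp == "C" ], r[ grp == "T" ], var.equal = TRUE )$p.value )

# under the null, about 5% of the p values should fall below 0.05
mean( p0 < 0.05 )

# Kolmogorov-Smirnov test against the uniform distribution on [0, 1]
ks.test( p0, "punif" )
```

A fraction close to 0.05 and an unremarkable Kolmogorov-Smirnov result 
are what you would expect if the p values are uniform.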

Now, assume that one of the two control samples and one of the two 
treatment samples have an elevated mean (30 instead of 20):

y <- cbind(
   rnorm( 1000, 20, 4 ),
   rnorm( 1000, 30, 4 ),
   rnorm( 1000, 20, 4 ),
   rnorm( 1000, 30, 4 ) )

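In this case, you get right-skewed p values, i.e., they pile up towards 
1, because the t test is not informed of the extra effect present in one 
sample of each of the two groups. The unexplained shift inflates the 
within-group variance estimate, so the t statistics come out too small: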

hist( rowttests( y, factor( c( "C", "C", "T", "T" ) ) )$p.value )
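
To see that telling the model about the extra effect removes the skew, 
here is a rough sketch in base R that fits a linear model to each row, 
once without and once with the covariate. The factor "batch" is a 
hypothetical label I made up for the uncontrolled effect, and the seed 
and variable names are likewise only for illustration. Note that the 
adjusted model leaves just one residual degree of freedom, so this shows 
the principle rather than a design you would want in practice.

```r
set.seed( 1 )   # arbitrary seed, only so the numbers are reproducible
n <- 1000

# same setup as above: samples 2 and 4 have an elevated mean
y1 <- cbind(
   rnorm( n, 20, 4 ),
   rnorm( n, 30, 4 ),
   rnorm( n, 20, 4 ),
   rnorm( n, 30, 4 ) )

treat <- factor( c( "C", "C", "T", "T" ) )
batch <- factor( c( "a", "b", "a", "b" ) )   # hypothetical covariate label

# p value of the treatment coefficient, covariate ignored
p_naive <- apply( y1, 1, function( r )
   summary( lm( r ~ treat ) )$coefficients[ "treatT", "Pr(>|t|)" ] )

# p value of the treatment coefficient, covariate included in the model
p_adj <- apply( y1, 1, function( r )
   summary( lm( r ~ batch + treat ) )$coefficients[ "treatT", "Pr(>|t|)" ] )

mean( p_naive < 0.05 )   # far below 0.05: the naive test is too conservative
mean( p_adj < 0.05 )     # should be near 0.05 again

hist( p_naive )
hist( p_adj )
```

The first histogram should show the same pile-up towards 1 as before, 
the second should look flat again.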

   Simon