[R] Rank-based p-value on large dataset
Deepayan Sarkar
deepayan at stat.wisc.edu
Thu Mar 3 23:32:38 CET 2005
On Thursday 03 March 2005 16:22, Sean Davis wrote:
> I have a fairly simple problem--I have about 80,000 values (call them
> y) that I am using as an empirical distribution and I want to find
> the p-value (never mind the multiple testing issues here, for the
> time being) of 130,000 points (call them x) from the empirical
> distribution. I typically do that (for one-sided test) something like
>
> loop over i in x
> p.val[i] = sum(y>x[i])/length(y)
>
> and repeat for all i. However, length(x) is large here as is
> length(y), so this process takes quite a long time. Any suggestions?
The obvious thing to do would be
p.val = 1 - ecdf(y)(x)
wouldn't it? On a 1.1 GHz Athlon, I get
> x <- rnorm(130000)
> y <- rnorm(80000)
> system.time(p.val <- 1 - ecdf(y)(x))
[1] 1.03 0.03 1.06 0.00 0.00
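As a rough sanity check (a sketch on smaller simulated data, not the real values), the one-liner can be compared against the loop from the question. ecdf(y)(x) gives the fraction of y less than or equal to each x, so one minus that is the fraction of y strictly greater than x, which is what sum(y > x[i])/length(y) computes. The names p.loop, p.ecdf and p.sort, the sample sizes and the seed are illustrative choices only; the sort/findInterval variant is just another way to get the same counts.

```r
set.seed(1)          # arbitrary seed, only so the check is reproducible
y <- rnorm(2000)     # stands in for the empirical (reference) distribution
x <- rnorm(500)      # stands in for the points to be scored

# Loop from the question: fraction of y strictly greater than each x[i]
p.loop <- vapply(x, function(xi) sum(y > xi) / length(y), numeric(1))

# Vectorized version: ecdf(y)(x) is the fraction of y <= x
p.ecdf <- 1 - ecdf(y)(x)

# Same counts via sorting: findInterval() returns how many sorted y are <= x
p.sort <- 1 - findInterval(x, sort(y)) / length(y)

# All three agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(p.loop, p.ecdf)),
          isTRUE(all.equal(p.loop, p.sort)))
```

The loop does length(x) * length(y) comparisons, while the ecdf and findInterval versions sort y once and then look up each x, which is why they stay fast at the sizes in the question.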
-Deepayan