[R] Random numbers negatively correlated?
Thomas Lumley
tlumley at u.washington.edu
Tue Jun 27 20:07:52 CEST 2006
On Tue, 27 Jun 2006, Christian Hennig wrote:
> Dear list,
>
> I did simulations in which I generated 10000
> independent Bernoulli(0.5)-sequences of length 100. I estimated
> p for each sequence and I also estimated the conditional probability that
> a one is followed by another one (which should be p as well).
> However, the second probability is significantly smaller than 0.5 (namely
> about 0.494, see below) and of course smaller than the direct estimates of
> p as well, indicating negative correlation between the random numbers.
>
> See below the code and the results.
> Did I do something wrong or are the numbers in
> fact negatively correlated? (A type I error is quite unlikely with a
> p-value below 2.2e-16.)
I think you did something wrong, and that there is a problem with
overlapping blocks of two.
If you do
x<-matrix(rbinom(1e6,1,p),ncol=2)
tt<-table(x[,1],x[,2])
you get much better looking results. In this case you have 500,000
independent pairs of numbers that can be 01, 10, 11, 00. A test for
independence seems fine.
> tt
0 1
0 125246 124814
1 125140 124800
> fisher.test(tt)
Fisher's Exact Test for Count Data
data: tt
p-value = 0.8987
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.9896745 1.0119211
sample estimates:
odds ratio
1.000735
In your case the deficit in est11 is suspiciously close to 0.5/n. Changing
n to 1000 and using the same seed I get
> mean(est11)-0.5
[1] -0.0005534743
10 times smaller, and still close to 0.5/n.
Now, consider what happens in a case where we can see all the
possibilities, n=3
x 1/0 1/1
000 - -
001 - -
010 1 0
011 0 1
100 1 0
101 1 0
110 1 1
111 2 0
So that if each of these three triplets has the same probability your
est11 would be 2/8 rather than 4/8, and est11 is not an unbiased estimate
of the long-run conditional probability. The bias is of order 1/n, so you
need n to be of larger order than sqrt(simruns).
-thomas
More information about the R-help
mailing list