[R] Why are lagged correlations typically negative?
Bliese, Paul D LTC USAMH
paul.bliese at us.army.mil
Thu Aug 24 17:06:01 CEST 2006
Recently, I was working with some lagged designs where a vector of
observations at one time was used to predict a vector of observations at
another time using a lag 1 design. In the work, I noticed a lot of
negative correlations, so I ran a simple simulation with 2 matched
points. The crude simulation example below shows that the correlation
can be -1 or +1, but interestingly if you do this basic simulation
thousands of times, you get negative correlations 66 to 67% of the time.
If you simulate three matched observations instead of two, you get
negative correlations about 74% of the time, and as you simulate 4 or
more observations the proportion of negative correlations asymptotically
approaches an even 50% split between negative and positive correlations
(though even with 100 observations one still gets 54% negative correlations).
Generating T1 and T2 as two related but distinct series (rather than as
one series correlated 1 with itself, as in the crude simulation)
attenuates the effect. A more advanced simulation is provided below for
those interested.
Can anyone explain why this occurs in a way a non-mathematician is
likely to understand?
# Crude simulation
> (T1<-rnorm(3))
[1] -0.1594703 -1.3340677 0.2924988
> (T2<-c(T1[2:3],NA))
[1] -1.3340677 0.2924988 NA
> cor(T1,T2, use="complete")
[1] -1
> (T1<-rnorm(3))
[1] -0.84258593 -0.49161602 0.03805543
> (T2<-c(T1[2:3],NA))
[1] -0.49161602 0.03805543 NA
> cor(T1,T2, use="complete")
[1] 1
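The 66 to 67% figure for two matched points can be checked without random numbers. With only two matched points the correlation is always -1 or +1, and its sign depends only on the ordering of the three underlying values. The sketch below assumes independent continuous draws, so that all six orderings are equally likely; the values 1, 2, 3 are arbitrary stand-ins.

```r
# Enumerate the 6 possible orderings of three distinct values and compute
# the lag-1 correlation for each (1, 2, 3 are arbitrary stand-in values).
orderings <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
                   c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))
r <- apply(orderings, 1, function(v) cor(v[1:2], v[2:3]))
cbind(orderings, r)   # one row per ordering, correlation in the last column
mean(r < 0)           # proportion of orderings with a negative correlation
```

Only the two monotone orderings (steadily increasing or steadily decreasing) give +1, so 4 of the 6 orderings give -1, which matches the roughly two-thirds seen in the simulation.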
# More advanced simulation example
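> lags
function(nobs,nreps){  #body reconstructed from the surviving fragments; an assumption, not the original code
  nran<-nobs+1  #need to generate 1 more random number than there are matched observations
  OUT<-matrix(NA,nrow=nreps,ncol=2)
  for(i in 1:nreps){
    V<-rnorm(nran)
    r<-cor(V[-nran],V[-1])  #lag 1: pair each value with the one that follows it
    OUT[i,]<-c(ifelse(r<=0,1,0),r)
  }
  return(OUT)  #column 1 is 1 if the corr is negative or 0, 0 if positive; column 2 is the corr
}
> LAGS.2<-lags(2,10000)  #Number of observations matched = 2
> colMeans(LAGS.2)  #column means: proportion negative, mean correlation
[1]  0.6682 -0.3364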
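A compact way to rerun the experiment over several numbers of matched observations is sketched below. `prop_negative` and its arguments are hypothetical names, not part of the original code, and the sketch assumes the same setup as the crude simulation: one series of independent normal draws correlated with itself at lag 1.

```r
# Hypothetical helper (not from the original post): proportion of negative
# lag-1 correlations when a series of nobs + 1 independent normal draws is
# correlated with itself shifted by one position.
prop_negative <- function(nobs, nreps = 10000) {
  r <- replicate(nreps, {
    v <- rnorm(nobs + 1)
    cor(v[-(nobs + 1)], v[-1])   # first nobs values vs. last nobs values
  })
  mean(r < 0)
}

set.seed(1)  # arbitrary seed, only so that reruns give the same numbers
sapply(c(2, 3, 4, 10, 100), prop_negative)
```

Each element of the result is the share of negative correlations for the corresponding number of matched observations, so the values can be compared directly with the percentages quoted above.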