[R] Why are lagged correlations typically negative?

Thu Aug 24 18:02:02 CEST 2006

The covariance has the same sign as the
correlation, so let's calculate the sample covariance
of the vector T1 = (X,Y) with T2 = (Y,Z), where we ignore
the third component in each case because use="complete"
drops the pair that contains the NA.

	cov(T1, T2) = XY + YZ - 2 * (X+Y)/2 * (Y+Z)/2

(the sum of products minus n times the product of the
means, with n = 2 and the divisor n - 1 = 1).

X, Y and Z are random variables, so we take the
expectation to get the overall average over many
runs.  Expectation is linear, and the random
variables are uncorrelated with mean zero, so:

	EXY + EYZ - 2 * E[(X+Y)/2 * (Y+Z)/2]
	= EXY + EYZ - EXY/2 - EXZ/2 - EYY/2 - EYZ/2
	= -EYY/2
	< 0

where the third line follows because every term
in the second line except the EYY term is zero.
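
The sign of that expectation can also be checked numerically. The
following is only a minimal sketch and not part of the original
exchange: the seed, nreps and the variable names are arbitrary
choices made for illustration. With two matched points the
correlation is always -1 or +1, so the share of negative covariances
is also the share of negative correlations.

```r
# Sketch: simulate the two-pair case behind the expectation argument.
set.seed(1)                      # arbitrary seed, for reproducibility only
nreps <- 20000                   # number of simulated triples (assumed value)

covs <- replicate(nreps, {
  x <- rnorm(3)                  # X, Y, Z: independent standard normals
  cov(x[1:2], x[2:3])            # sample covariance of (X, Y) with (Y, Z)
})

mean_cov <- mean(covs)           # average covariance over all runs
prop_neg <- mean(covs < 0)       # share of runs with a negative covariance

print(c(mean_cov = mean_cov, prop_neg = prop_neg))
```

The average covariance should come out negative, and the share of
negative values should be close to the 66 to 67% reported in the
question.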

On 8/24/06, Bliese, Paul D LTC USAMH <paul.bliese at us.army.mil> wrote:
> Recently, I was working with some lagged designs where a vector of
> observations at one time was used to predict a vector of observations at
> another time using a lag 1 design.  In the work, I noticed a lot of
> negative correlations, so I ran a simple simulation with 2 matched
> points.  The crude simulation example below shows that the correlation
> can be -1 or +1, but interestingly if you do this basic simulation
> thousands of times, you get negative correlations 66 to 67% of the time.
> If you simulate three matched observations instead of two you get
> negative correlations about 74% of the time and then as you simulate 4
> and more observations the number of negative correlations asymptotically
> approaches an equal 50% for negative versus positive correlations
> (though even with 100 observations one still has 54% negative correlations).
> Creating T1 and T2 so they are related (and not correlated 1 as in the
> crude simulation) attenuates the effect.  A more advanced simulation is
> provided below for those interested.
>
> Can anyone explain why this occurs in a way a non-mathematician is
> likely to understand?
>
> Thanks,
>
> Paul
>
> #############
> # Crude simulation
> #############
> > (T1<-rnorm(3))
> [1] -0.1594703 -1.3340677  0.2924988
> > (T2<-c(T1[2:3],NA))
> [1] -1.3340677  0.2924988         NA
> > cor(T1,T2, use="complete")
> [1] -1
>
> > (T1<-rnorm(3))
> [1] -0.84258593 -0.49161602  0.03805543
> > (T2<-c(T1[2:3],NA))
> [1] -0.49161602  0.03805543          NA
> > cor(T1,T2, use="complete")
> [1] 1
>
> ###########
> # More advanced simulation example
> ###########
> > lags
> function(nobs,nreps,rho=1){
> OUT<-data.frame(NEG=rep(NA,nreps),COR=rep(NA,nreps))
> nran<-nobs+1  #need to generate 1 more random number than there are observations
>  for(i in 1:nreps){
>      V1<-rnorm(nran)
>      V2<-sqrt(1-rho^2)*rnorm(nran)+rho*V1
>      #print(cor(V1,V2))
>      V1<-V1[1:(nran-1)]  #parenthesized: 1:nran-1 parses as (1:nran)-1
>      V2<-V2[2:nran]
>      OUT[i,1]<-ifelse(cor(V1,V2)<=0,1,0)
>      OUT[i,2]<-cor(V1,V2)
>  }
> return(OUT) #NEG is 1 if the correlation is negative or zero, 0 if positive
> }
> > LAGS.2<-lags(2,10000)  #Number of observations matched = 2
> > colMeans(LAGS.2)  #mean() on a data frame no longer works in current R
>    NEG     COR
>  0.6682 -0.3364
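
The dependence on the number of matched observations described above
can be explored with a shorter, vectorised sketch. This is not the
poster's function: prop_negative, nobs_grid, the seed and the default
nreps are hypothetical names and values chosen for illustration, and
it covers only the rho=1 case, where each series is paired with its
own lag.

```r
# Hypothetical helper (not from the original post): share of negative
# lag-1 correlations for `nobs` matched pairs of independent normals.
prop_negative <- function(nobs, nreps = 2000) {
  mean(replicate(nreps, {
    x <- rnorm(nobs + 1)               # one more draw than matched pairs
    cor(x[-(nobs + 1)], x[-1]) < 0     # pair x[i] with x[i + 1]
  }))
}

set.seed(2)                            # arbitrary seed
nobs_grid <- c(2, 3, 4, 10, 100)       # numbers of matched pairs to try
props <- sapply(nobs_grid, prop_negative)
names(props) <- nobs_grid
print(round(props, 3))
```

Each entry of props estimates the proportion of negative correlations
for the corresponding number of matched pairs, so the values can be
compared directly with the percentages quoted in the question.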