[R] Problem when creating matrix of values based on covariance matrix
Bert Gunter
gunter.berton at gene.com
Sat Aug 11 16:27:28 CEST 2012
Sampling error? Do you realize how large a sample size you would
need to precisely estimate an 8000 x 8000 covariance matrix? Probably
exceeds the number of stars in our galaxy...
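For a sense of scale: an 8368 x 8368 covariance matrix has 8368 * 8369 / 2 = 35,015,896 distinct entries, and the standard error of each estimated entry shrinks only like 1/sqrt(n). The sketch below is a minimal illustration. It assumes a hypothetical 50 x 50 identity covariance rather than the poster's matrix, and shows the worst entry-wise error of cov() falling as the sample size grows.

```r
# Sketch: worst entry-wise error of the sample covariance for several n.
# Assumption: a hypothetical p x p identity covariance, not the poster's data.
set.seed(42)
p <- 50
Sigma <- diag(p)

max_abs_error <- function(n) {
  X <- matrix(rnorm(n * p), nrow = n)  # n independent draws from N(0, I_p)
  max(abs(cov(X) - Sigma))             # largest error over all p x p entries
}

sizes  <- c(100, 1000, 10000)
errors <- sapply(sizes, max_abs_error)
print(data.frame(n = sizes, max_abs_error = errors))
```

The typical error per entry falls by only about sqrt(10), roughly 3.2, for each tenfold increase in n.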
Numerical issues may also play a role, but I am too ignorant on this
aspect to offer advice.
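If numerical trouble is suspected, a few basic checks are cheap to run before simulating. The function below is a hypothetical helper (check_covariance is not from any package), sketched under the assumption that the input is a dense numeric matrix. It reports symmetry, the smallest eigenvalue, the 2-norm condition number (largest over smallest eigenvalue, meaningful only for a positive definite matrix) and whether chol() succeeds.

```r
# Hypothetical helper (not from any package): basic numerical checks on a
# covariance matrix S before using it for simulation.
check_covariance <- function(S) {
  # Eigenvalues only; symmetric = TRUE uses the routine for symmetric matrices
  ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
  # chol() throws an error if S is not (numerically) positive definite
  chol_ok <- !inherits(try(chol(S), silent = TRUE), "try-error")
  list(
    symmetric      = isSymmetric(S),
    min_eigenvalue = min(ev),
    # 2-norm condition number; only meaningful when min(ev) > 0
    condition      = max(ev) / min(ev),
    chol_succeeds  = chol_ok
  )
}

# Small stand-in for CEUcovar; substitute the real matrix here
S <- matrix(c(0.19, 0.01,
              0.01, 0.19), nrow = 2)
str(check_covariance(S))
```

A very large condition number suggests that small rounding errors in the matrix can have a large effect on its Cholesky factor.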
Finally, this is really not an R question, so you would probably do
better to post on a stats site like stats.stackexchange.com rather
than here.
-- Bert
On Sat, Aug 11, 2012 at 7:17 AM, Boel Brynedal <brynedal at gmail.com> wrote:
> Hi,
>
> I want to simulate a data set with a covariance structure similar to that of my
> observed data, and have calculated a covariance matrix (dimensions
> 8368 x 8368). So far I've tried two approaches to simulating data:
> rmvnorm from the mvtnorm package, and the Cholesky
> decomposition (http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/).
> The problem is that the covariance structure of my simulated
> data is very different from the supplied covariance matrix.
> Let's just look at some of the values:
>
>> cov8[1:4,1:4] # covariance of simulated data
>             X1          X2         X3         X4
> X1 34515296.00    99956.69   369538.1  1749086.6
> X2    99956.69 34515296.00  2145289.9  -624961.1
> X3   369538.08  2145289.93 34515296.0  -163716.5
> X4  1749086.62  -624961.09  -163716.5 34515296.0
>> CEUcovar[1:4,1:4]
>              [,1]         [,2]          [,3]         [,4]
> [1,] 0.1873402987  0.001837229  0.0009009272  0.010324521
> [2,] 0.0018372286  0.188665853  0.0124216535 -0.001755035
> [3,] 0.0009009272  0.012421654  0.1867835412 -0.000142395
> [4,] 0.0103245214 -0.001755035 -0.0001423950  0.192883488
>
> So the observed covariances span a far narrower range than those
> of the simulated data.
>
> None of the eigenvalues of the observed covariance matrix are
> negative, and it appears to be a positive definite matrix. Here is
> what I did to create the simulated data:
>
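> Chol <- chol(CEUcovar)                     # upper-triangular R, t(R) %*% R == CEUcovar
> Z <- matrix(rnorm(20351 * 8368), 8368)     # 8368 x 20351 matrix of iid N(0, 1) draws
> X <- t(Chol) %*% Z                         # each column now has covariance CEUcovar
> sample8 <- data.frame(as.matrix(t(X)))     # 20351 observations x 8368 variables
>> dim(sample8)
> [1] 20351 8368
> cov8 <- cov(sample8, method = 'spearman')  # covariance of the ranks of each column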
>
> [Earlier I also tried sample8 <- rmvnorm(1000,
> mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen"), with equally
> 'bad' results: much larger covariance values in the simulated data.]
>
> Any ideas WHY the simulated data have such a different covariance?
> Any experience with similar issues? I would be happy to supply the
> covariance matrix if anyone wants to give it a try.
> Any suggestions? Anything apparent that I left out or neglected?
>
> Any advice would be highly appreciated.
> Best,
> Bo
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
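One detail in the quoted code is worth checking: cov(..., method = 'spearman') computes the covariance of the ranks of each column, not of the simulated values. For n continuous observations without ties, the ranks are a permutation of 1, ..., n, whose sample variance is n(n + 1)/12. With n = 20351 that is 20351 * 20352 / 12 = 34,515,296, which matches the diagonal of cov8 exactly. The sketch below uses a small hypothetical 3 x 3 matrix in place of CEUcovar and the same Cholesky construction, then compares the default (Pearson) cov() with the Spearman version.

```r
# Minimal sketch: a small hypothetical covariance matrix standing in for
# CEUcovar (the real 8368 x 8368 matrix is not available here).
set.seed(1)
Sigma <- matrix(c(0.19, 0.01, 0.00,
                  0.01, 0.19, 0.01,
                  0.00, 0.01, 0.19), nrow = 3)
n <- 20351

# Cholesky approach as in the quoted message: chol() returns upper-triangular
# R with t(R) %*% R == Sigma, so t(R) %*% Z has covariance Sigma.
R <- chol(Sigma)
Z <- matrix(rnorm(n * ncol(Sigma)), nrow = ncol(Sigma))
X <- t(t(R) %*% Z)          # n rows, one column per variable

cov_pearson  <- cov(X)                       # default method: estimates Sigma
cov_spearman <- cov(X, method = "spearman")  # covariance of the ranks

print(round(cov_pearson, 3))
print(diag(cov_spearman))   # each equals n * (n + 1) / 12 when there are no ties
```

If the goal is to compare against the supplied covariance matrix, the default Pearson cov() is the like-for-like estimate; the Spearman version is on the scale of the ranks.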
--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm