[R] R help

Bert Gunter gunter.berton at gene.com
Sun Feb 19 21:55:59 CET 2012


Folks:

Perhaps I am naive, ignorant, or foolish, but this whole discussion
seems rather ridiculous. What possible relation to reality could a
multivariate normal of the size requested have? Even for simulation,
it seems like nonsense.

Cheers,
Bert

On Sun, Feb 19, 2012 at 11:35 AM, Petr Savicky <savicky at cs.cas.cz> wrote:
> On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
>> Dear all,
>>   I need to generate numbers from multivariate normal with large dimensions
>> (5,000,000).
>
> Hi.
>
> I am replying to your first email, since the other did not arrive
> to my folder, possibly filtered out by a spam filter. I see them
> at the web interface.
>
> 1. Error: cannot allocate vector of size 381.5 Mb
>
> The error message makes sense. The matrix requires m*n*8/2^20 MB,
> which is in your case
>
>  m <- 100000
>  n <- 500
>  m*n*8/2^20
>
>  [1] 381.4697
>
> May be, you already have other large objects in the memory. Try to
> minimize the number and size of objects, which you need simultaneously
> in an R session.
>
> 2. Generating a multivariate normal distribution.
>
> As Peter Dalgaard pointed out, a speed up is possible only
> for special types of the covariance matrix Sigma. A general
> way is to find a matrix A such that A A^t = Sigma. Then,
> the vector A X, where X is from N(0,I) and I is an identity
> matrix of an appropriate dimension, has covariance Sigma.
> This is also the way, how mvtnorm package works.
>
> A speed up is possible, if computing the product A X does not
> require to have matrix A explicitly represented in memory.
>
> The matrix A need not be a square matrix. In particular, the
> previous case may be understood as using the matrix A, which
> for a small m is as follows.
>
>  m <- 5
>  rho <- 0.5
>  A <- cbind(sqrt(rho), sqrt(1 - rho)*diag(m))
>  A
>
>
>            [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
>  [1,] 0.7071068 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000
>  [2,] 0.7071068 0.0000000 0.7071068 0.0000000 0.0000000 0.0000000
>  [3,] 0.7071068 0.0000000 0.0000000 0.7071068 0.0000000 0.0000000
>  [4,] 0.7071068 0.0000000 0.0000000 0.0000000 0.7071068 0.0000000
>  [5,] 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000 0.7071068
>
>  A %*% t(A)
>
>       [,1] [,2] [,3] [,4] [,5]
>  [1,]  1.0  0.5  0.5  0.5  0.5
>  [2,]  0.5  1.0  0.5  0.5  0.5
>  [3,]  0.5  0.5  1.0  0.5  0.5
>  [4,]  0.5  0.5  0.5  1.0  0.5
>  [5,]  0.5  0.5  0.5  0.5  1.0
>
> This construction is conceptually possible also for a large m because
> the structure of A allows to compute A X by simpler operations with
> the vector X than an explicit matrix product. Namely, the expression
>
>  rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))
>
> or, more clearly,
>
>  sqrt(rho) * rnorm(1) + sqrt(1 - rho) * rnorm(m)
>
> is equivalent to the required A X, where X consists of rnorm(1)
> and rnorm(m) together.
>
> If you have a specific Sigma, describe it and we can discuss,
> whether an appropriate A can be found.
>
> Hope this helps.
>
> Petr Savicky.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



More information about the R-help mailing list