[R] sparse matrix, rnorm, malloc
roger koenker
roger at ysidro.econ.uiuc.edu
Sun Jun 11 01:13:31 CEST 2006
As an example of how one might do this sort of thing in SparseM,
ignoring the rounding aspect:
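require(SparseM)
require(msm) # for rtnorm
sm <- function(dim, rnd, q) {
    # rnd is unused here, since rounding is ignored
    # number of nonzeros: binomial with success probability P(|Z| < q)
    n <- rbinom(1, dim * dim, 2 * pnorm(q) - 1)
    # random row and column indices for the nonzero entries
    ia <- sample(dim, n, replace = TRUE)
    ja <- sample(dim, n, replace = TRUE)
    # nonzero values: standard normals truncated to (-q, q)
    ra <- rtnorm(n, lower = -q, upper = q)
    # build in coordinate (triplet) form, then convert to compressed sparse row
    A <- new("matrix.coo", ia = as.integer(ia), ja = as.integer(ja),
             ra = ra, dimension = as.integer(c(dim, dim)))
    A <- as.matrix.csr(A) # value of the assignment is returned invisibly
}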
For dim = 5000 and q = .03 (about 2.4 percent nonzeros, which exceeds
Gavin's suggested 1 percent density), this takes about 30 seconds on my
iMac, and according to Rprof about 95 percent of that (total) time is
spent generating the truncated normals.
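Since the truncated normals dominate the run time, one possible alternative to rtnorm is to invert the normal CDF directly. The following is only a sketch under stated assumptions: rtnorm_inv is a hypothetical helper name (it is not part of msm or SparseM), it handles only the standard normal case, and it has not been timed against rtnorm here.

```r
# Hypothetical helper (not from msm or SparseM): draw n standard normals
# truncated to (lower, upper) by inverting the normal CDF.
rtnorm_inv <- function(n, lower, upper) {
    # uniform draws restricted to the CDF values of the truncation bounds
    u <- runif(n, min = pnorm(lower), max = pnorm(upper))
    # map the uniforms back through the normal quantile function
    qnorm(u)
}

set.seed(1)
q <- 0.03
x <- rtnorm_inv(1e5, lower = -q, upper = q)
range(x) # all draws should lie within (-q, q), up to rounding error
```

Under that assumption, the line computing ra in sm() could be swapped for rtnorm_inv(n, lower = -q, upper = q), which would also remove the dependence on msm.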
Word of warning: pushing this too much further gets tedious since the
number of random numbers grows like dim^2. For example, dim = 20,000
and q = .02 takes 432 seconds with again 93% of the total time spent in
rnorm and rtnorm...
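To make the dim^2 growth concrete, the arithmetic can be checked in base R. This is an illustrative sketch only; expected_nnz and dense_bytes are hypothetical helper names, and the 8 bytes per entry assumes a dense matrix of doubles.

```r
# Expected number of nonzeros for the generator above: dim^2 * P(|Z| < q)
expected_nnz <- function(dim, q) dim^2 * (2 * pnorm(q) - 1)

# Storage for a dense double matrix: 8 bytes per entry
dense_bytes <- function(dim) dim^2 * 8

expected_nnz(5000, 0.03)        # roughly 6e5 nonzeros
expected_nnz(20000, 0.02)       # roughly 6.4e6 nonzeros
dense_bytes(20000) / 1024       # 3125000 Kb, the size in the quoted error
30000^2                         # 9e8, an element count rather than bytes
50000^2 > .Machine$integer.max  # TRUE: dim^2 no longer fits in an integer
```

The last line is consistent with the "NAs introduced by coercion" warning in the quoted message: at 50,000 rows and columns, dims * dims exceeds the largest value R can hold as an integer length.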
url: www.econ.uiuc.edu/~roger Roger Koenker
email rkoenker at uiuc.edu Department of Economics
vox: 217-333-4558 University of Illinois
fax: 217-244-6678 Champaign, IL 61820
On Jun 10, 2006, at 12:53 PM, g l wrote:
> Hi,
>
> I'm sorry for any cross-posting. I've reviewed the archives and could
> not find an exact answer to my question below.
>
> I'm trying to generate very large sparse matrices (< 1% non-zero
> entries per row). I have a sparse matrix function below which works
> well until the row/col count exceeds 10,000. This is being run on a
> machine with 32G memory:
>
> sparse_matrix <- function(dims,rnd,p) {
> ptm <- proc.time()
> x <- round(rnorm(dims*dims),rnd)
> x[((abs(x) - p) < 0)] <- 0
> y <- matrix(x,nrow=dims,ncol=dims)
> proc.time() - ptm
> }
>
> When trying to generate the matrix around 20,000 rows/cols on a
> machine with 32G of memory, the error message I receive is:
>
> R(335) malloc: *** vm_allocate(size=3200004096) failed (error code=3)
> R(335) malloc: *** error: can't allocate region
> R(335) malloc: *** set a breakpoint in szone_error to debug
> R(335) malloc: *** vm_allocate(size=3200004096) failed (error code=3)
> R(335) malloc: *** error: can't allocate region
> R(335) malloc: *** set a breakpoint in szone_error to debug
> Error: cannot allocate vector of size 3125000 Kb
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Last error line is obvious. Question: on machine w/32G memory, why
> can't it allocate a vector of size 3125000 Kb?
>
> When trying to generate the matrix around 30,000 rows/cols, the error
> message I receive is:
>
> Error in rnorm(dims * dims) : cannot allocate vector of length
> 900000000
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Last error line is obvious. Question: is this 900000000 bytes?
> kilobytes? This error seems to be specific now to rnorm, but it
> doesn't indicate the length metric (b/Kb/Mb) as it did for 20,000
> rows/cols. Even if this is Mb, why can't this be allocated on a machine
> with 32G free memory?
>
> When trying to generate the matrix with over 50,000 rows/cols, the
> error message I receive is:
>
> Error in rnorm(n, mean, sd) : invalid arguments
> In addition: Warning message:
> NAs introduced by coercion
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Same.
>
> Why would it generate different errors in each case? Code fixes? Any
> simple ways to generate sparse matrices which would avoid above
> problems?
>
> Thanks in advance,
>
> Gavin
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html