[R] expand.grid overflows?
Adrian Dusa
dusa.adrian at gmail.com
Sun Nov 18 14:31:22 CET 2007
On Friday 16 November 2007, francogrex wrote:
> >cbn<-as.matrix(expand.grid( rep( list(0:1), 50)))
>
> Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
> invalid 'times' value
> In addition: Warning message:
> In rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
> NAs introduced by coercion
>
> But I'm only interested in cbn matrix rows where:
> cbn<- cbn[rowSums(cbn)==5,]
>
> Is there a way to evaluate it row by row and only store where the sum is
> equal to 5, maybe it reduces cost of computation?
What you want is impossible: a matrix with all possible binary combinations of
50 columns is a matrix with 2^50x50 elements, which is:
> 2^50*50
[1] 5.6295e+16
By comparison, a full matrix with 20 columns takes about 160MB, and one with
21 columns approx. 330MB of RAM (see ?object.size); the size more than doubles
with every added column. There is simply no way you will ever create the full
matrix with 50 columns.
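The arithmetic behind those figures can be sketched in a few lines of base R.
This is only an illustration: grid_bytes() is a hypothetical helper, not part
of any package, and it assumes 8 bytes per cell (double storage; an integer
matrix would need 4). The last line shows what is probably the reason for the
error above: the number of rows alone no longer fits into an R integer.

```r
# Hypothetical helper: bytes needed for a dense matrix holding all 2^n
# binary combinations of n columns (8 bytes per cell assumes doubles).
grid_bytes <- function(n, bytes_per_cell = 8) {
  2^n * n * bytes_per_cell
}

grid_bytes(20) / 2^20        # 160 (MiB)
grid_bytes(21) / 2^20        # 336 (MiB)
grid_bytes(50) / 2^50        # 400 (PiB)

# The row count for 50 columns exceeds the largest R integer
2^50 > .Machine$integer.max  # TRUE
```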
There is a function in package QCA called createMatrix() that creates a
numerical matrix faster than expand.grid():
library(QCA)
cbn <- createMatrix(rep(2, 20))
# then what you want is
cbn <- cbn[rowSums(cbn) == 5, ]
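For readers without QCA installed, the same filtering step can be tried in
base R on a small case. This is just a sketch with 10 columns (my choice, so
that the full matrix stays tiny); the number of rows kept should equal
choose(10, 5).

```r
n <- 10  # small enough to enumerate all 2^10 rows
grid <- as.matrix(expand.grid(rep(list(0:1), n)))

# keep only the rows with exactly five 1s
sel <- grid[rowSums(grid) == 5, , drop = FALSE]

dim(sel)                      # 252 10
nrow(sel) == choose(n, 5)     # TRUE
```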
For more than 20 variables it _is_ possible to get what you want by
sacrificing speed for low memory consumption, this way:
library(QCA)
nofcolumns <- 25
cbn.rownos <- seq(2^nofcolumns) # generate the row numbers
eq5 <- sapply(cbn.rownos, function(x) {
    return(sum(getRow(rep(2, nofcolumns), x)) == 5)
})
# this will be _very_ slow, as it checks for each row number (in its binary
# equivalent, see ?getRow) whether its sum is equal to 5
# then what you want is:
cbn <- getRow(rep(2, nofcolumns), cbn.rownos[eq5])
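A different route, not the one used above, is to skip the scan over all row
numbers and build only the rows that have exactly five 1s, by enumerating the
positions of the 1s with combn() from base R. The following is a minimal
sketch; rows_with_k_ones() is a hypothetical helper name, and the rows come
out in combn() order rather than in the order of the full matrix.

```r
# Hypothetical helper: all 0/1 rows of length n containing exactly k ones.
rows_with_k_ones <- function(n, k) {
  idx <- combn(n, k)   # k x choose(n, k) matrix: positions of the ones
  m <- matrix(0L, nrow = ncol(idx), ncol = n)
  # one (row, column) index pair per 1 to be set
  m[cbind(rep(seq_len(ncol(idx)), each = k), as.vector(idx))] <- 1L
  m
}

small <- rows_with_k_ones(10, 5)
dim(small)                 # 252 10
all(rowSums(small) == 5)   # TRUE
```

For 50 columns this would give choose(50, 5) = 2118760 rows, i.e. roughly
400MB as an integer matrix, which is large but nowhere near 2^50 rows.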
I hope it helps,
Adrian
--
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd
050025 Bucharest sector 5
Romania
Tel./Fax: +40 21 3126618 \
+40 21 3120210 / int.101