[R] Reading mcmc/coda into a big.matrix efficiently
Guy W Cole
gwc2124 at columbia.edu
Mon Jan 2 02:37:57 CET 2012
I'm trying to read CODA/mcmc files (see the coda package), as
generated by jags/WinBUGS/OpenBUGS, into a big.matrix. I can't load
the whole mcmc object produced by read.coda() into memory since I'm
using a laptop for this analysis (currently I'm unfunded).
Right now I'm doing it by creating the filebacked.big.matrix, reading
a chunk of data at a time from the chain file using read.table() with
"skip" and "nrows" set, and storing it into the big.matrix. While
this is memory efficient, the processing overhead seems to be related to
the size of the skip value, so the time required for each chunk grows
with the number of variables already read.
Any tips on how to do this faster / more efficiently? I'm using a
unix system, so a solution that uses grep/sed would be fine.
Here's some sample code of how I do it now:
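library(bigmemory)  # provides filebacked.big.matrix()

# Index file: one row per variable, with its first and last row in the chain file
index = read.table("Big.CODAindex.txt", col.names = c("var","start","end"))
n = index[1,3] - index[1,2] + 1   # samples per variable
k = dim(index)[1]                 # number of variables
X = filebacked.big.matrix(nrow = n, ncol = k, backingfile = "Big.CODA.backing")
for(i in 1:k) {
  # Column 2 of the chain file holds the sampled values
  X[,i] = read.table("Big.CODAchain1.txt", skip = (i-1)*n, nrows = n)[,2]
  print(i)
  print(Sys.time())
}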
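
For comparison, here is a minimal sketch of one possible alternative to the loop above: open the chain file once and let scan() read n lines at a time from the open connection, so each chunk continues where the previous one stopped and no skip is needed. The helper name fill_from_chain() is hypothetical, and the demo uses a tiny temporary file and an ordinary matrix as a stand-in for the filebacked.big.matrix. It assumes every variable has the same number of rows, as in the code above. Whether it is actually faster on a large chain file is untested here.

```r
# Sketch only: fill_from_chain() is a hypothetical helper, not part of coda or bigmemory.
# It fills column i of X with the values of the i-th variable in the chain file.
fill_from_chain <- function(chain_file, X, n, k) {
  con <- file(chain_file, open = "r")   # open once, read sequentially
  on.exit(close(con))
  for (i in seq_len(k)) {
    # scan() on an open connection resumes at the current position,
    # so each call consumes the next n lines of (iteration, value) pairs.
    chunk <- scan(con, what = list(iter = 0, value = 0),
                  nlines = n, quiet = TRUE)
    X[, i] <- chunk$value
  }
  X
}

# Tiny demo: 2 variables, 3 samples each, written to a temporary file.
chain_file <- tempfile()
writeLines(c("1 0.1", "2 0.2", "3 0.3",
             "1 1.1", "2 1.2", "3 1.3"), chain_file)
n <- 3
k <- 2
X <- matrix(NA_real_, nrow = n, ncol = k)  # stand-in for the big.matrix
X <- fill_from_chain(chain_file, X, n, k)
print(X)
```

With the real data, X would be the filebacked.big.matrix and n and k would come from the index file as before.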
Also, here are the first few rows of the index and chain files, so you
can see the formatting. The index file tells you each variable's name
and the range of rows in the chain file containing the variable's
values. The chain file contains the iteration number the value was
taken from, and the value itself.
CODAindex.txt
egu[1] 1 10000
egu[2] 10001 20000
egt[1] 20001 30000
egt[2] 30001 40000
ept[1] 40001 50000
ept[2] 50001 60000
...
CODAchain1.txt
10001 -0.289963
10011 -0.310657
10021 -0.290596
10031 -0.286273
10041 -0.319877
10051 -0.299019
....
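
To make the index layout concrete, here is a small sketch that parses the six index rows shown above from inline text and derives the number of samples per variable and the chain-file rows belonging to a given variable. The helper rows_for() is a hypothetical name used only for illustration.

```r
# The six index rows from the sample above, parsed from inline text.
index <- read.table(text = "
egu[1] 1 10000
egu[2] 10001 20000
egt[1] 20001 30000
egt[2] 30001 40000
ept[1] 40001 50000
ept[2] 50001 60000
", col.names = c("var", "start", "end"), stringsAsFactors = FALSE)

# Number of chain-file rows (samples) recorded for each variable.
index$n <- index$end - index$start + 1

# The chunked reads rely on every variable having the same length.
stopifnot(length(unique(index$n)) == 1)

# Hypothetical helper: chain-file row numbers holding the i-th variable.
rows_for <- function(i) seq(index$start[i], index$end[i])

print(index)
print(range(rows_for(3)))
```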
Thanks in advance for any tips!
--Guy W. Cole
R version 2.14.0 (2011-10-31) x86_64-apple-darwin9.8.0