[R] big panel: filehash, bigmemory or other
Eric Fail
e at it.dk
Mon Feb 22 23:13:00 CET 2010
Dear R-list
I am about to start a new project on a rather big panel consisting of
approximately 8 million observations across 30 waves of data and about
15 variables. A similar data set I already have is approximately 7
gigabytes in size.
Until now I have done my data management in SAS and Stata, mostly
identifying spells, counting events in intervals, and the like, but I
would like to do the data management, and fit my models, in R.
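For concreteness, the kind of task I mean is roughly this (toy data,
invented column names):

# count events per person within a given time interval
d <- data.frame(id    = c(1, 1, 1, 2, 2),
                time  = c(1, 3, 7, 2, 9),
                event = c(0, 1, 1, 1, 0))
aggregate(event ~ id, data = d[d$time >= 1 & d$time <= 5, ], FUN = sum)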
However, R cannot handle data of this size in the usual in-memory way;
it is simply too big.
So I thought of trying filehash, bigmemory, or some similar package I
have not heard of (yet). The documentation for 'bigmemory' says the
package is capable of "basic manipulation" of "manageable subsets of
the data", but what does that actually mean in practice?
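My tentative reading of that phrase is something like the sketch
below; the file names and the "wave" column are invented, and as far
as I understand a big.matrix can only hold numeric data:

library(bigmemory)

# read the flat file into a file-backed big.matrix, so the data stay
# on disk; the descriptor file lets later sessions re-attach to it
x <- read.big.matrix("panel.csv", header = TRUE, type = "double",
                     backingfile = "panel.bin",
                     descriptorfile = "panel.desc")

# ordinary indexing copies only the requested rows into RAM as a
# regular matrix, e.g. one wave at a time
rows  <- mwhich(x, "wave", 1, "eq")
wave1 <- x[rows, ]
summary(wave1)

Is that the intended workflow, or am I missing something?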
Since learning to do this in R is a rather time-consuming process, and
I know SAS can handle the data management and has PROC MIXED for the
modelling, I wanted to ask on the list before I set out on this
odyssey.
Does anyone out there have practical experience with data sets
(panels) of that size, and perhaps with fitting a model through
filehash or bigmemory, presumably using lmer (from the lme4 package)
or something similar, that they would be willing to share?
Thanks in advance,
Eric