[R] Large Stata file Import in R
Xavier
xfim.ll at gmail.com
Tue Jul 7 11:40:25 CEST 2009
Thomas Lumley wrote on Tue, 30 Jun 2009:
> On Tue, 30 Jun 2009, Xavier wrote:
>
>> saurav pathak wrote on Mon, 29 Jun 2009:
>>
>>> Hi
>>>
>>> I am using Stata 10 and I need to import a data set in stata 10 to R, I
>>> have
>>> saved the dataset in lower versions of Stata as well by using saveold
>>> command in Stata.
>>>
>>> My RAM is 4gb and the stata file is 600MB, I am getting an error message
>>> which says :
>>>
>>> "Error: cannot allocate vector of size 3.4 Mb
>>> In addition: There were 50 or more warnings (use warnings() to see the
>>> first
>>> 50)"
>>>
>>> Thus far I have already tried the following
>>
>> Maybe it does not address the R problem that you are asking about, but
>> you can try to "compress" the Stata file before saving it. The size of
>> the file may then decrease.
>>
>
> This can't possibly help. The problem is that *R* is running out of
> memory, and storing the data elements in less space *on disk* won't help
> with the space used in memory. Stata's -compress- option just chooses
> smaller data types, eg, byte instead of integer.
I have done a small test, and it seems that it can help.
I have a big dataset in Stata to which I apply the "compress" command,
getting a smaller file. These are the sizes in Stata:
-----8<---------------
# original data size in stata
Contains data from G:\tmp\example-big.dta
obs: 52,547 Written by R.
vars: 54
size: 21,807,005 (96.4% of memory free)
# data size once "compress" has been used
Contains data from example-small.dta
obs: 52,547 Written by R.
vars: 54 3 Jul 2009 15:27
size: 17,918,527 (97.1% of memory free)
-----8<---------------
And when loaded into R:
-----8<---------------
> library(foreign)
> big <- read.dta("example-big.dta")
> small <- read.dta("example-small.dta")
> object.size(big)
20819600 bytes
> object.size(small)
19558520 bytes
-----8<---------------
The difference once the objects are in R's memory (about 6%) is smaller than
the difference reported by Stata (about 18%), but it still seems a good idea
to compress data in Stata before loading it into R if memory is a problem.
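
One possible explanation, as a rough sketch under assumptions: as far as I
know, read.dta() returns Stata byte, int and long variables as R integers
(4 bytes per value) and float and double variables as R doubles (8 bytes per
value). If so, "compress" saves memory in R only for the columns it moves from
a floating-point type to an integer type. The helper bytes_by_type() and the
synthetic column below are made up for illustration; they are not part of
'foreign' and not the data from this thread.

```r
# Hypothetical helper (not part of 'foreign'): total bytes per storage type.
bytes_by_type <- function(df) {
  types <- vapply(df, typeof, character(1))
  bytes <- vapply(df, function(col) as.numeric(object.size(col)), numeric(1))
  tapply(bytes, types, sum)
}

n <- 52547  # same number of observations as in the example above

# Synthetic whole-number column, held once as double and once as integer.
codes_int    <- rep(c(0L, 1L, 5L), length.out = n)
uncompressed <- data.frame(code = as.numeric(codes_int))  # like a Stata double
compressed   <- data.frame(code = codes_int)              # like a Stata byte/int/long

bytes_by_type(uncompressed)  # bytes held in double columns
bytes_by_type(compressed)    # bytes held in integer columns
```

Running bytes_by_type(big) and bytes_by_type(small) on the two real data
frames should show whether the saving comes from columns that changed from
double to integer.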
--
- Xavier -