[R] Handling large dataset & dataframe
Liaw, Andy
andy_liaw at merck.com
Mon Apr 24 21:07:22 CEST 2006
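Instead of reading the entire data set in at once, you read a chunk at a time,
compute X'X and X'y on that chunk, and accumulate (i.e., add) them across
chunks. Once every chunk has been processed, the accumulated X'X and X'y are
all you need to solve for the least-squares coefficients.
There are examples in "S Programming", taken from independent replies by the
two authors to a post on S-news, if I remember correctly.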
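
A minimal sketch of that accumulation in R follows. It is not the code from
"S Programming"; the function name chunked_ls, its arguments, and the file
layout (a comma-separated file with a header row, all columns numeric,
syntactically valid column names, the response in one named column) are
assumptions made for illustration only.

```r
# Hypothetical helper: least-squares fit from a CSV file read in chunks.
# Assumes a header row and purely numeric columns (dummies coded 0/1).
chunked_ls <- function(path, response, chunk_rows = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # The first line holds the column names; scan() strips any quotes.
  col_names <- scan(text = readLines(con, n = 1L), what = "",
                    sep = ",", quiet = TRUE)
  predictors <- setdiff(col_names, response)

  xtx <- NULL  # running sum of X'X
  xty <- NULL  # running sum of X'y

  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break  # end of file

    chunk <- read.csv(text = lines, header = FALSE,
                      col.names = col_names, check.names = FALSE)
    y <- chunk[[response]]
    X <- cbind(Intercept = 1, as.matrix(chunk[predictors]))

    if (is.null(xtx)) {
      xtx <- crossprod(X)     # X'X for the first chunk
      xty <- crossprod(X, y)  # X'y for the first chunk
    } else {
      xtx <- xtx + crossprod(X)
      xty <- xty + crossprod(X, y)
    }
  }

  # Solve the normal equations (X'X) b = X'y.
  drop(solve(xtx, xty))
}
```

Only one chunk plus the cross products are in memory at any time; with 266
columns, X'X is a 266 x 266 matrix, roughly 0.5 MB in double precision. Note
that solve() will fail if X'X is singular, for instance when a full set of
dummy columns is included alongside the intercept.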
Andy
From: Sachin J
>
> Gabor:
>
> Can you elaborate more?
>
> Thanx
> Sachin
>
> Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> You just need the much smaller cross product matrix X'X and
> vector X'Y so you can build those up as you read the data in
> in chunks.
>
>
> On 4/24/06, Sachin J wrote:
> > Hi,
> >
> > I have a dataset consisting of 350,000 rows and 266 columns. Of the
> > 266 columns, 250 are dummy variable columns. I am trying to read this
> > data set into an R data frame but am unable to because of memory
> > limitations (the object created is too large to handle in R). Is
> > there a way to handle such a large dataset in R?
> >
> > My PC has 1 GB of RAM and 55 GB of hard disk space, and runs Windows XP.
> >
> > Any pointers would be of great help.
> >
> > TIA
> > Sachin
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> >