[R] Merging big data sets
Jeff Newmiller
jdnewmil at dcn.davis.CA.us
Mon Sep 9 17:48:59 CEST 2013
Please don't post in HTML. (Read the Posting Guide.)
Consider using the sqldf package, which can do the whole lookup as a single SQL join instead of a row-by-row loop.
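sqldf would express the lookup as a single LEFT JOIN. The sketch below shows the same one-pass idea without extra packages, using base R's match() on a combined key. This is a swapped-in technique, not sqldf itself. The toy data is hypothetical. The sketch assumes both tables have Start and End columns and that ttoevP holds the OEV.T column, as in the question.

```r
# Hypothetical toy versions of the two tables in the question
dataP  <- data.frame(Start = c(1, 1, 2, 3), End = c(2, 3, 3, 1))
ttoevP <- data.frame(Start = c(1, 2, 3), End = c(3, 3, 1),
                     OEV.T = c(5.5, 7.0, 2.5))

# Combine the two index columns into one character key per row
keyP    <- paste(dataP$Start,  dataP$End,  sep = "_")
keyTtoe <- paste(ttoevP$Start, ttoevP$End, sep = "_")

# match() returns, for every row of dataP, the position of the first
# matching row in ttoevP (NA if none), so the lookup is one vectorized call
dataP$OEV.T <- ttoevP$OEV.T[match(keyP, keyTtoe)]

print(dataP)
```

Unlike merge(), this keeps the row count and row order of dataP. Rows without a partner get NA, and if a key repeats in ttoevP, only its first occurrence is used.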
---------------------------------------------------------------------------
Jeff Newmiller
DCN: <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
Renger van Nieuwkoop <renger at vannieuwkoop.ch> wrote:
>Hi
>I have 6 rather big data sets (between 400,000 and 800,000 rows) of
>transport data (times, distances and travelers between nodes). They all
>share a common index (start and end nodes).
>I want to aggregate these data, but to do that I first have to merge them.
>I tried "merge", but R (3.0.1) crashes (Windows 8 machine, 16 GB RAM).
>Then I tried a join from the data.table package. This gave me a
>message that 2^34 is too big (no idea why it is 2^34, as it is a left
>join).
>Then I wrote a loop that looks up and assigns the values row by row,
>which takes a very, very long time (it is still running).
>
>Here is the code:
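>for (i in seq_len(nrow(dataP))) {
> s <- dataP$Start[i]   # start node of row i
> e <- dataP$End[i]     # end node of row i
> # keyed data.table lookup in ttoevP, assigned into dataP one row at a time
> dataP[J(s, e)]$OEV.T <- ttoevP[J(s, e)]$OEV.T
>}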
>
>dataP has 800,000 rows and ttoevP has about 500,000 rows.
>
>Any hints to speed up this process are welcome.
>
>Renger
>_________________________________________
>Centre of Economic Research (CER-ETH)
>Zürichbergstrasse 18 (ZUE)
>CH - 8032 Zürich
>+41 44 632 02 63
>rengerv at etzh.ch
>blog.modelworks.ch
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
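A common reason a join on (Start, End) returns far more rows than either input is a key that is not unique. Every copy of a duplicated key in one table is paired with every copy in the other. This might explain both the merge crash and the data.table size message, but that is an assumption, because the thread does not establish the cause. Below is a small base R sketch with hypothetical toy data, plus a check to run before joining.

```r
# Hypothetical toy tables in which the key (Start, End) repeats in both
a <- data.frame(Start = c(1, 1, 1), End = c(2, 2, 2), x = 1:3)
b <- data.frame(Start = c(1, 1), End = c(2, 2), OEV.T = c(10, 20))

# Left join: each of the 3 rows of a is paired with both matching rows of b
m <- merge(a, b, by = c("Start", "End"), all.x = TRUE)
print(nrow(m))  # 6 rows from inputs of 3 and 2 rows

# Check keys before joining: anyDuplicated() returns the index of the
# first duplicated row, or 0 if the key is unique
print(anyDuplicated(a[, c("Start", "End")]))
print(anyDuplicated(b[, c("Start", "End")]))
```

With 800,000 and 500,000 rows, even modest key duplication can multiply the result size this way. Aggregating each table to one row per (Start, End) before joining keeps the result no larger than the left table.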