[R] Speeding reading of a large file
Rui Barradas
ruipbarradas at sapo.pt
Thu Dec 6 17:53:17 CET 2012
Hello,
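Because assigning to x[] replaces the contents of x while keeping its attributes (class, names, dimensions), so x stays a data frame. Assigning to plain x would instead replace it with the list that lapply returns.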
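A small sketch of the difference, using a made-up two-column data frame (the data and the name y are only for illustration):

```r
# Toy data frame with character columns (hypothetical data)
x <- data.frame(a = c("1.5", "2"), b = c("3", "oops"),
                stringsAsFactors = FALSE)

# Plain assignment: y is replaced by the list that lapply returns
y <- x
y <- suppressWarnings(lapply(y, as.numeric))
class(y)   # "list"

# Empty-bracket assignment: the columns are replaced in place,
# so the class, names and row names of x are retained
x[] <- suppressWarnings(lapply(x, as.numeric))
class(x)   # "data.frame"
dim(x)
```

Entries that cannot be converted (such as "oops" here) become NA. suppressWarnings() only hides the coercion warning.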
Hope this helps,
Rui Barradas
On 06-12-2012 16:24, Juliet Hannah wrote:
> All,
>
> Can someone describe what the following does?
>
> x[] <- lapply(x, as.numeric)
>
> I see that it is putting the list elements into a data frame. The
> results of lapply are a list, so how does this become
> a data frame?
>
> Thanks,
>
> Juliet
>
>
> On Mon, Dec 3, 2012 at 5:49 PM, Fisher Dennis <fisher at plessthan.com> wrote:
>> Colleagues,
>>
>> This past week, I asked the following question:
>>
>> I have a file that looks like this:
>>
>> TABLE NO. 1
>> PTID TIME AMT FORM PERIOD IPRED CWRES EVID CP PRED RES WRES
>> 2.0010E+03 3.9375E-01 5.0000E+03 2.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 1.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
>> 2.0010E+03 8.9583E-01 5.0000E+03 2.0000E+00 0.0000E+00 3.3389E+00 0.0000E+00 1.0000E+00 0.0000E+00 3.5321E+00 0.0000E+00 0.0000E+00
>> 2.0010E+03 1.4583E+00 5.0000E+03 2.0000E+00 0.0000E+00 5.8164E+00 0.0000E+00 1.0000E+00 0.0000E+00 5.9300E+00 0.0000E+00 0.0000E+00
>> 2.0010E+03 1.9167E+00 5.0000E+03 2.0000E+00 0.0000E+00 8.3633E+00 0.0000E+00 1.0000E+00 0.0000E+00 8.7011E+00 0.0000E+00 0.0000E+00
>> 2.0010E+03 2.4167E+00 5.0000E+03 2.0000E+00 0.0000E+00 1.0092E+01 0.0000E+00 1.0000E+00 0.0000E+00 1.0324E+01 0.0000E+00 0.0000E+00
>> 2.0010E+03 2.9375E+00 5.0000E+03 2.0000E+00 0.0000E+00 1.1490E+01 0.0000E+00 1.0000E+00 0.0000E+00 1.1688E+01 0.0000E+00 0.0000E+00
>> 2.0010E+03 3.4167E+00 5.0000E+03 2.0000E+00 0.0000E+00 1.2940E+01 0.0000E+00 1.0000E+00 0.0000E+00 1.3236E+01 0.0000E+00 0.0000E+00
>> 2.0010E+03 4.4583E+00 5.0000E+03 2.0000E+00 0.0000E+00 1.1267E+01 0.0000E+00 1.0000E+00 0.0000E+00 1.1324E+01 0.0000E+00 0.0000E+00
>>
>> The file is reasonably large (> 10^6 lines) and the two-line header is repeated periodically in the file.
>> I need to read this file in as a data frame. Note that the number of columns, the column headers, and the number of replicates of the headers are not known in advance.
>>
>> I received a number of replies, many of them quite useful. Of these, one beat out all the others in my benchmarking using files ranging from 10^5 to 10^6 lines.
>> That version, provided by Jim Holtman, was:
>> x <- read.table(FILE, as.is = TRUE, skip=1, fill=TRUE, header = TRUE)
>> x[] <- lapply(x, as.numeric)
>> x <- x[!is.na(x[,1]), ]
>>
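To see why the three lines quoted above work, here is a minimal sketch run on a tiny temporary file. The file contents are made up (three columns instead of twelve) and FILE is just a tempfile. It assumes, as in the sample shown, that the header lines have no more fields than the data lines.

```r
# Write a small sample file with a repeated two-line header (made-up values)
FILE <- tempfile(fileext = ".txt")
writeLines(c(
  "TABLE NO. 1",
  "PTID TIME AMT",
  "2.0010E+03 3.9375E-01 5.0000E+03",
  "2.0010E+03 8.9583E-01 5.0000E+03",
  "TABLE NO. 2",
  "PTID TIME AMT",
  "2.0020E+03 1.4583E+00 5.0000E+03"
), FILE)

# Skip the first "TABLE NO." line and read everything else as text;
# the repeated header lines come in as ordinary (character) rows
x <- read.table(FILE, as.is = TRUE, skip = 1, fill = TRUE, header = TRUE)

# Convert every column to numeric; header text turns into NA
x[] <- suppressWarnings(lapply(x, as.numeric))

# Drop the rows that came from the repeated headers
x <- x[!is.na(x[, 1]), ]

unlink(FILE)
```

After these steps x should hold only the three data rows, with all columns numeric.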
>> Other versions involved readLines, followed by edits, followed by cat (or write) to a temp file, then read.table again.
>> The overhead of invoking readLines, write/cat, and read.table was substantially larger than that of the read.table / as.numeric / indexing strategy.
>>
>> Thanks for the input from many folks.
>>
>> Dennis
>>
>> Dennis Fisher MD
>> P < (The "P Less Than" Company)
>> Phone: 1-866-PLessThan (1-866-753-7784)
>> Fax: 1-866-PLessThan (1-866-753-7784)
>> www.PLessThan.com
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.