[R] read file part way through based on start and end date (first column)

Gabor Grothendieck ggrothendieck at gmail.com
Mon Mar 21 05:16:41 CET 2011


On Sun, Mar 20, 2011 at 3:47 PM, algotr8der <algotr8der at gmail.com> wrote:
> Hello folks - I have been trying to figure this out. I have a set of very
> large files that are of this format
>
> , , , ,
> 1/4/1999,9:31:00 AM,blah, blah, blah
> 1/4/1999,9:32:00 AM,blah, blah, blah
> 1/4/1999,9:33:00 AM,blah, blah, blah
>
> I want to write R code that reads only that data between a start and an end
> date (data is presented from oldest at the top of the file to the most
> recent at the bottom of the file). I'm not sure if there is an R function
> that makes this easy.
>
> I know the read.csv function enables you to skip a user specified number of
> rows before the file is read but this doesn't exactly help me as my start and
> end dates can be anywhere in between.
>

Try reading the entire file into R first to be really sure that you
are not just assuming it can't be done.
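
For instance, a rough sketch of the read-everything-then-subset approach.
The sample file contents, the date range and the variable names here are
made up for illustration; the date format is assumed to be month/day/year
as in the lines quoted above.

```r
# Hypothetical sample file in the format shown in the question:
# a blank header row, then date, time and three more fields.
path <- tempfile(fileext = ".csv")
writeLines(c(", , , ,",
             "1/4/1999,9:31:00 AM,a,b,c",
             "1/4/1999,9:32:00 AM,a,b,c",
             "1/5/1999,9:31:00 AM,a,b,c",
             "1/6/1999,9:31:00 AM,a,b,c",
             "1/7/1999,9:31:00 AM,a,b,c"), path)

# Read the whole file, skipping the blank header row.
all_rows <- read.csv(path, header = FALSE, skip = 1,
                     stringsAsFactors = FALSE)

# Convert the first column to Date and keep rows inside the range.
dates <- as.Date(all_rows[[1]], format = "%m/%d/%Y")
in_range <- !is.na(dates) &
  dates >= as.Date("1999-01-05") & dates <= as.Date("1999-01-06")
subset_rows <- all_rows[in_range, ]
```

If this runs in acceptable time and memory on the real files then nothing
more elaborate is needed.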

If it's true that it's too big to read in and subset, then try reading
just the first column of the file (read about the colClasses= argument
in ?read.table), figure out which rows you need from the first
column, and re-read the file, this time using the skip= and nrows=
arguments so that it only reads in the rows you need.
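
A minimal sketch of that two-pass idea follows. The function name
read_date_range, its ncols argument and the sample file are hypothetical,
made up for illustration. It assumes five columns, one blank header line,
month/day/year dates, rows sorted by date (as stated in the question) and
no blank lines in the middle of the data, since those would throw off the
skip= count.

```r
# Hypothetical sample file in the format shown in the question.
path <- tempfile(fileext = ".csv")
writeLines(c(", , , ,",
             "1/4/1999,9:31:00 AM,a,b,c",
             "1/4/1999,9:32:00 AM,a,b,c",
             "1/5/1999,9:31:00 AM,a,b,c",
             "1/6/1999,9:31:00 AM,a,b,c",
             "1/7/1999,9:31:00 AM,a,b,c"), path)

read_date_range <- function(path, start, end, ncols = 5) {
  # Pass 1: read only the first column. A colClasses entry of "NULL"
  # tells read.csv to drop that column.
  first <- read.csv(path, header = FALSE, skip = 1,
                    colClasses = c("character", rep("NULL", ncols - 1)))
  dates <- as.Date(first[[1]], format = "%m/%d/%Y")

  # Row numbers (within the data, after the header line) inside the range.
  keep <- which(dates >= as.Date(start) & dates <= as.Date(end))
  if (length(keep) == 0) return(NULL)

  # Pass 2: the file is sorted by date, so the wanted rows are contiguous.
  # Skip the header line plus the keep[1] - 1 data rows before the range.
  read.csv(path, header = FALSE,
           skip = keep[1],
           nrows = length(keep),
           colClasses = "character")
}

res <- read_date_range(path, "1999-01-05", "1999-01-06")
```

Pass 1 still scans the whole file, but it only stores one column, which
is where the memory saving comes from.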


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


