[R] read multiple large files into one dataframe

Liaw, Andy andy_liaw at merck.com
Wed May 13 17:00:00 CEST 2009


A few points to consider:

- If all the data are numeric, then use matrices instead of data frames.

- With either data frames or matrices, there is no way (that I'm aware
of anyway) in R to stack them without making at least one copy in
memory.

- Since none of the files has a header row, I would concatenate them
into one file outside R (e.g., on *nix, cat * > all.txt) and then read
that in.  You can also try it inside R with something like
read.table(pipe()).  You will want to make use of the colClasses
argument in read.table() to specify the column types, though, to ensure
that read.table() only goes through the input once.

- You're probably better off getting the data into a database (even
something like SQLite) and using an R interface to that database.

- 30MB x 90 = 2.7GB.  Unless you're on a 64-bit machine with lots of
RAM, you're not likely to have much fun with the data even when you
manage to get it into R in one piece.
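
A minimal sketch of the concatenate-then-read route from the third point,
done inside R with file.append() rather than cat so that it does not depend
on the shell.  The temporary directory, the day01.txt-style file names, the
tiny 4-row stand-in files and the assumption that all 25 columns are numeric
are made up for illustration; substitute the real directory and column types.

```r
# Illustration only: build three tiny headerless files standing in for
# the 90 daily files (the names and contents are made up).
demo_dir <- tempfile("days")
dir.create(demo_dir)
for (i in 1:3) {
  write.table(matrix(i, nrow = 4, ncol = 25),
              file = file.path(demo_dir, sprintf("day%02d.txt", i)),
              row.names = FALSE, col.names = FALSE)
}

files <- list.files(demo_dir, pattern = "^day.*\\.txt$", full.names = TRUE)

# Concatenate on disk (the in-R analogue of `cat * > all.txt`).
# This assumes every file ends with a newline.
all_file <- tempfile(fileext = ".txt")
file.create(all_file)
for (f in files) file.append(all_file, f)

# Declare the column types up front; all-numeric is an assumption here.
all_days <- read.table(all_file, header = FALSE,
                       colClasses = rep("numeric", 25))
dim(all_days)

# Rough size in memory if all 25 columns are stored as doubles (8 bytes each):
est_gib <- 90 * 180000 * 25 * 8 / 2^30
```

On the real data it may also help to pass nrows (here roughly 90 * 180000)
to read.table().  The last line is only a back-of-envelope figure, about
3 GiB for the data alone before any copies are made.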

Andy

From: SYKES, Jennifer
> Hello
>
> Apologies if this is a simple question, I have searched the help and
> have not managed to work out a solution.
>
> Does anybody know an efficient method for reading many text files of
> the same format into one table/dataframe?
>
> I have around 90 files that contain continuous data over 3 months but
> that are split into individual days' data and I need the whole 3 months
> in one file for analysis.  Each day's file contains a large amount of
> data (approx 30MB each) and so I need a memory efficient method to
> merge all of the files into the one dataframe object.  From what I have
> read I will probably want to avoid using for loops etc?  All files are
> in the same directory, none have a header row, and each contains around
> 180,000 rows and the same 25 columns/variables.  Any suggested
> packages/routines would be very useful.
>
> Thanks
>
> Jennifer