[R] sequential processing
bogdan romocea
br44114 at gmail.com
Mon Jan 22 22:16:28 CET 2007
One option for processing very large files with R is to split them
into smaller pieces first, using the Unix split utility:
## split a large file into pieces
#--parameters: the folder, file and number of parts
FLD=/home/user/data
F=very_large_file.dat
parts=50
#---split
cd "$FLD" || exit 1                # stop if the folder does not exist
fn=${F%%.*}                        # file name without extension (text before the first dot)
ln=$(wc -l < "$F")                 # number of lines in the file
forsplit=$(( ln / parts + 1 ))     # number of lines in each part
echo "====== $F will be split into parts of $forsplit lines each (at most $parts parts)."
split -l "$forsplit" "$F" "$fn"    # pieces are named ${fn}aa, ${fn}ab, ...
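To see what the script above produces, here is a small sketch run on a throwaway file. The file name sample.dat, the 1003-line size and the temporary directory are made up for illustration; the part size of 21 lines is what the formula gives for 1003 lines and 50 parts.

```shell
# Work in a throwaway directory so nothing real is touched
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical sample: 1003 lines, so each part gets 1003 / 50 + 1 = 21 lines
seq 1 1003 > sample.dat
split -l 21 sample.dat sample

# Count the pieces (sampleaa, sampleab, ...); the glob does not match sample.dat
pieces=$(ls sample?? | wc -l | tr -d ' ')
echo "pieces: $pieces"

# Concatenating the pieces in name order should reproduce the original file
if cat sample?? | cmp -s - sample.dat; then
  roundtrip=ok
else
  roundtrip=failed
fi
echo "round trip: $roundtrip"

# Clean up the throwaway directory
cd / && rm -rf "$work"
```

Because of the "+ 1" in the part size, the number of pieces can come out slightly below the requested number of parts, which is harmless when each piece is processed in turn.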
You could also load the entire file into a DBMS and then pull parts of
it into R, or read specific lines through a pipe, e.g.
readLines(pipe("sed, grep, python... command")).
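As a rough illustration of the kind of command that could go inside such a pipe, the sketch below pulls a range of lines out of a file with sed. The sample file and the line range 5001-5010 are invented for the example; adapt both to the real data.

```shell
# Hypothetical sample file: 100000 numbered lines standing in for a large data file
tmp=$(mktemp)
seq 1 100000 > "$tmp"

# Print only lines 5001-5010; the trailing 'q' makes sed quit at line 5010
# instead of scanning the rest of the file
chunk=$(sed -n '5001,5010p;5010q' "$tmp")

# Show the first few extracted lines
echo "$chunk" | head -n 3

rm -f "$tmp"
```

From R, the same sed command could presumably be passed as the string given to pipe(), e.g. readLines(pipe("sed -n '5001,5010p;5010q' very_large_file.dat")), so that only the requested lines ever reach R.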
Don't try to replicate SAS processing in R. Exact translations of the
SAS DATA step's use of _N_, first., last., retain, etc. into R would be
inefficient, ugly, retrogressive, wrong, rigid, complicated, silly and
so on. For a start, read up on indexing - this seemingly simple and
innocuous R feature is in fact far more powerful than the entire DATA
step with its whole bag of tricks. Then search the list for similar
questions, for example
http://thread.gmane.org/gmane.comp.lang.r.general/44332/focus=44343
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Gerard Smits
> Sent: Sunday, January 21, 2007 2:22 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] sequential processing
>
> Like many others, I am new to R but old to SAS.
>
> Am I correct in understanding that R processes a data frame
> sequentially? This would imply that large input files could be
> read, without the need to load the entire file into memory.
> Related to the manner of reading a frame, I have been looking for the
> equivalent of SAS _n_ (I realize that I can use a variant of which to
> identify an index value) as well as useful SAS features such as
> first., last., retain, etc. Any help with this conversion
> appreciated.
>
> Thanks,
>
> Gerard Smits