[R] Conditional read-in of data

mnstn pavan.namd at gmail.com
Wed Nov 4 16:01:25 CET 2009


Hello Jim and Gabor,
Thanks for your input. The lines:

a <- as.matrix(read.table(pipe("awk -f cut.awk Data.file")))
# cut.awk: {for (i = 1; i <= NF; i = i + 10) print $i, ""}

solved my problem. I know that 40k lines is not a large data set. I have
about 150 files, each with 40k rows, and for each file I wanted to check
how the data behave in each quarter (basically to make sure nothing odd is
going on) without making 150 figures/PDF files. In the future, as my data
grow, I will consider using a relational database.
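A note on the cut.awk program above: its loop runs over fields ($i, up to NF) within each line, so it keeps every 10th column and prints each kept value on its own output line. If the goal is instead to keep every 10th row, as the original question asks, awk's record counter NR can do the filtering before R parses anything. A minimal sketch, assuming a Unix-like system with awk on the PATH and a whitespace-delimited file without a header; read_every_kth is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: keep every k-th row (rows k, 2k, 3k, ...) of a
# whitespace-delimited, headerless file. awk drops the other rows, so R
# only parses the rows that survive. Assumes awk is available on the PATH.
read_every_kth <- function(path, k = 10) {
  # "%%" in sprintf() yields a literal "%", giving e.g. awk 'NR % 10 == 0' 'file'
  cmd <- sprintf("awk 'NR %% %d == 0' %s", as.integer(k), shQuote(path))
  as.matrix(read.table(pipe(cmd)))
}
```

With this, the call from the post would become a <- read_every_kth("Data.file", k = 10). Changing == 0 to == 1 in the awk condition would start from the first row (rows 1, 11, 21, ...) instead.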

Thanks again,
mnstn


Gabor Grothendieck wrote:
> 
> 1. You can pipe your data through a gawk (or other scripting
> language) process, as in:
> http://tolstoy.newcastle.edu.au/R/e5/help/08/09/2129.html
> 
> 2. read.csv.sql in the sqldf package on CRAN will set up a database
> for you, read the file into the database automatically defining the
> layout of the table, extract a portion into R based on an sql
> statement that you provide and then destroy the database all in one
> statement.  It uses the sqlite database which is included in the
> RSQLite R package that it depends on so there is nothing to separately
> install.
> See ?read.csv.sql in the package and also see example 13 on the home page:
> http://sqldf.googlecode.com
> 
> 
> On Wed, Nov 4, 2009 at 12:07 AM, mnstn <pavan.namd at gmail.com> wrote:
>>
>> Hello All,
>> I have a 40k-row data set that is taking a lot of time to read in.
>> Is there a way to skip reading even- or odd-numbered rows, or to read
>> in only rows that are multiples of, say, 10? That way I get the general
>> trend of the data without actually reading the entire thing. The 'skip'
>> option in read.table simply skips the first n rows and reads the rest.
>> I do understand that once the full data set (40k rows) is read in, I can
>> manipulate the data, but the bottleneck here is the first read/scan of
>> the data.
>>
>> I searched the forum using keywords (conditional skip/skip reading
>> rows/skip data/conditional data read, etc.) but couldn't find relevant
>> conversations. I apologize if this has already been discussed, since it
>> does seem hard to imagine that nobody has come across this problem yet.
>>
>> Any suggestions/comments are welcome.
>> Thanks,
>> mnstn
>> --
>> View this message in context:
>> http://old.nabble.com/Conditional-read-in-of-data-tp26191091p26191091.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
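The question as originally posed (read only every 10th row) can also be handled in base R alone, without awk or a database. The sketch below uses a hypothetical helper, read_every_kth_base, and assumes a headerless, whitespace-delimited file. It still reads every line from disk via readLines, but hands only the kept lines to read.table, so any speed-up comes from parsing fewer rows; whether that beats a plain read.table on a given file would need to be measured.

```r
# Hypothetical helper: keep rows k, 2k, 3k, ... of a headerless,
# whitespace-delimited file using only base R. Lines are read in chunks
# with readLines(); only the kept lines are parsed by read.table().
read_every_kth_base <- function(path, k = 10, chunk = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  kept <- character(0)
  offset <- 0L  # number of lines consumed before the current chunk
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break
    # global line numbers of this chunk are offset + 1, ..., offset + length(lines)
    keep <- (offset + seq_along(lines)) %% k == 0
    kept <- c(kept, lines[keep])
    offset <- offset + length(lines)
  }
  # read.table() stops with an error if no lines were kept (file shorter than k)
  as.matrix(read.table(text = kept))
}
```

For many files, something like samples <- lapply(files, read_every_kth_base, k = 10), where files is a character vector of paths, would apply the same thinning to each one.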
