[R] Reading a specific column of a csv file in a loop
Juliet Hannah
juliet.hannah at gmail.com
Tue Nov 15 15:44:17 CET 2011
In the solution below, what is the advantage of using "0L"?
M0 <- read.csv("M1.csv", nrows = 1)[0L, ]
Thanks!
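
For reference, a short sketch of what the "[0L, ]" index does. As far as I can tell, "0L" is simply the integer literal for zero, and indexing with a plain 0 gives the same result, so the choice is mostly stylistic: it avoids a double-to-integer coercion and signals that an index is meant. The purpose of the expression as a whole is to get an empty data frame that keeps the column names and types of the file. The data frame "df" below is made up for illustration and stands in for the one-row read.csv() result.

```r
# Hypothetical stand-in for read.csv("M1.csv", nrows = 1)
df <- data.frame(a = 1.5, b = 2.5, d = 3.5)

empty_int <- df[0L, ]  # row index given as an integer zero
empty_dbl <- df[0, ]   # row index given as a double zero

# Both are zero-row data frames that keep the column names and classes
print(nrow(empty_int))
print(names(empty_int))
print(sapply(empty_int, class))

# Compare the two results, and the types of the two literals
print(all.equal(empty_int, empty_dbl))
print(c(is.integer(0L), is.integer(0)))
```

The zero-row template is what the sqldf call in the quoted reply inserts into, which is why only one row of the file needs to be read to build it.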
2011/11/8 Gabor Grothendieck <ggrothendieck at gmail.com>:
> 2011/11/8 Sergio René Araujo Enciso <araujo.enciso at gmail.com>:
>> Dear all:
>>
>> I have two large files with 2000 columns. For each file I am
>> performing a loop to extract the "i"th element (column) of each file and
>> create a data frame with both "i"th elements in order to perform further
>> analysis. I am not extracting all the "i"th elements but only certain
>> ones, which I specify in a vector called "d".
>>
>> See an example of my code below
>>
>> ### generate an example for the CSV files; the original files contain
>> ### more than 2000 columns, here for the sake of simplicity they have only
>> ### 10 columns
>> M1<-matrix(rnorm(1000), nrow=100, ncol=10,
>> dimnames=list(seq(1:100),letters[1:10]))
>> M2<-matrix(rnorm(1000), nrow=100, ncol=10,
>> dimnames=list(seq(1:100),letters[1:10]))
>> write.table(M1, file="M1.csv", sep=",")
>> write.table(M2, file="M2.csv", sep=",")
>>
>> ### the vector containing the "i" elements to be read
>> d<-c(1,4,7,8)
>> P1<-read.table("M1.csv", header=TRUE, sep=",")
>> P2<-read.table("M2.csv", header=TRUE, sep=",")
>> for (i in d) {
>> M<-data.frame(P1[i],P2[i])
>> rm(list=setdiff(ls(),c("d","P1","P2")))
>> }
>>
>> As the files are quite large, I want to include "read.table" within
>> the loop so that it only reads the "i"th element. I know that there is
>> the option "colClasses", for which I have to create a vector with zeros
>> for all the columns I do not want to load. Nonetheless, I have no idea
>> how to make this vector change in the loop, so that the only element
>> with no zeros is the "i"th element following the vector "d". Any ideas
>> how to do this? Or is there any other approach to load only a
>> specific element?
>>
>
> It's a bit messy if there are row names, so let's generate M1.csv like this:
>
> write.csv(M1, file = "M1.csv", row.names = FALSE)
>
> Then we can do this:
>
> nc <- ncol(read.csv("M1.csv", nrows = 1))
> colClasses <- replace(rep("NULL", nc), d, NA)
> M1.subset <- read.csv("M1.csv", colClasses = colClasses)
>
> Or, using the same M1.csv that we just generated, try this, which uses
> sqldf with the H2 backend:
>
> library(sqldf)
> library(RH2)
>
> M0 <- read.csv("M1.csv", nrows = 1)[0L, ]
> M1.subset.h2 <- sqldf(c("insert into M0 (select * from csvread('M1.csv'))",
> "select a, d, g, h from M0"))
>
> This is referred to as Alternative 3 in FAQ#10 Example 6a on the sqldf
> home page:
> http://sqldf.googlecode.com
> Alternative 1 and Alternative 2 listed there could also be tried.
>
> (Note that although sqldf has a read.csv.sql command we did not use it
> here since that command only works with the sqlite back end and the
> RSQLite driver has a max of 999 columns.)
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
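
The colClasses approach in the quoted reply can be adapted to the loop from the original question, reading a single column per iteration. This is only a sketch under a few assumptions: the files are written without row names (as the reply suggests), they are placed in a temporary directory here so the example is self-contained, and "read_column" and "results" are hypothetical names introduced for illustration.

```r
# Recreate small example files without row names, as in the quoted reply
set.seed(1)
M1 <- matrix(rnorm(1000), nrow = 100, ncol = 10,
             dimnames = list(NULL, letters[1:10]))
M2 <- matrix(rnorm(1000), nrow = 100, ncol = 10,
             dimnames = list(NULL, letters[1:10]))
f1 <- file.path(tempdir(), "M1.csv")
f2 <- file.path(tempdir(), "M2.csv")
write.csv(M1, file = f1, row.names = FALSE)
write.csv(M2, file = f2, row.names = FALSE)

d <- c(1, 4, 7, 8)                    # columns to read
nc <- ncol(read.csv(f1, nrows = 1))   # column count, taken from one row

# Hypothetical helper: read only column i of a CSV file.
# "NULL" in colClasses skips a column; NA lets read.csv guess the type.
read_column <- function(file, i, nc) {
  colClasses <- replace(rep("NULL", nc), i, NA)
  read.csv(file, colClasses = colClasses)
}

results <- list()
for (i in d) {
  # Two-column data frame holding column i of each file
  M <- data.frame(read_column(f1, i, nc), read_column(f2, i, nc))
  # ... further analysis on M would go here ...
  results[[as.character(i)]] <- dim(M)
}
print(results)
```

Each iteration still scans the whole file, so this trades run time for memory. If all the columns in "d" fit in memory at once, the single read.csv() call shown in the reply is likely the faster choice.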