[R] proper use of textConnection
Gabor Grothendieck
ggrothendieck at gmail.com
Sun Oct 12 17:55:32 CEST 2008
Try one of these:
Lines <- readLines("myfile.dat")
Lines <- Lines[!grepl("whatever", Lines)]  # unlike -grep(), safe when nothing matches
DF <- read.table(textConnection(Lines), ...other.args...)
or
# use findstr /v instead of grep -v if you are on Windows
DF <- read.table(pipe("grep -v whatever myfile.dat"), ...other.args...)
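
To make the first variant concrete, here is a minimal sketch on a small in-memory example. The sample lines and the names is.header, Clean and con are made up for illustration, and it assumes the recurring header is identical to the first line of the file. It keeps the first header so that header = TRUE still works, and closes the text connection explicitly.

```r
# Hypothetical sample data standing in for myfile.dat: a header line that
# recurs partway through the file.
Lines <- c("id value",
           "1 10",
           "2 20",
           "id value",
           "3 30",
           "4 40")

# Flag every line equal to the header, then unflag the first occurrence
# so that only the repeats are dropped.
is.header <- Lines == Lines[1]
is.header[1] <- FALSE
Clean <- Lines[!is.header]

# Read from the in-memory lines. textConnection() returns an already-open
# connection, so read.table() does not close it; close it explicitly.
con <- textConnection(Clean)
DF <- read.table(con, header = TRUE)
close(con)

print(DF)
```

In more recent versions of R, read.table(text = Clean, header = TRUE) should give the same result without having to manage the connection yourself.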
On Sun, Oct 12, 2008 at 11:13 AM, Dennis Fisher <fisher at plessthan.com> wrote:
> Colleagues,
>
> Using R 2.7.0 on OS X, I am having trouble understanding the command
> textConnection. My situation is as follows:
> 1. I am trying to read a lengthy file (45000 lines) that has headers
> ~ every 1000 lines. read.table (or its variants) fails because of the
> recurrent headers.
> 2. My present approach is the following:
> a. use readLines to read the file, save as an array
> b. use grep to find the recurrent headers (not including the first
> set)
> c. delete the recurrent headers from the array
> d. write the array to a temp file
> e. read the temp file using read.table
> f. delete the temp file
> 3. My understanding is that textConnection might enable me to replace
> steps d-f with a single step akin to
> read.table(textConnection(array)). This appears to work but it is
> very slow. I executed code on successively larger chunks of the array:
> for (Each in 1000 * 1:45)
> {
> cat("N lines =", Each, "\t", date(), "\n")
> A <- read.table(textConnection(Z[1:Each]), header=TRUE)
> }
> yielding:
> N lines = 1000 Sun Oct 12 07:09:48 2008
> N lines = 2000 Sun Oct 12 07:09:48 2008
> N lines = 3000 Sun Oct 12 07:09:48 2008
> N lines = 4000 Sun Oct 12 07:09:50 2008
> N lines = 5000 Sun Oct 12 07:09:52 2008
> N lines = 6000 Sun Oct 12 07:09:56 2008
> N lines = 7000 Sun Oct 12 07:10:01 2008
> N lines = 8000 Sun Oct 12 07:10:09 2008
> N lines = 9000 Sun Oct 12 07:10:18 2008
> N lines = 10000 Sun Oct 12 07:10:31 2008
> N lines = 11000 Sun Oct 12 07:10:46 2008
> N lines = 12000 Sun Oct 12 07:11:04 2008
> N lines = 13000 Sun Oct 12 07:11:25 2008
> N lines = 14000 Sun Oct 12 07:11:51 2008
> N lines = 15000 Sun Oct 12 07:12:20 2008
> N lines = 16000 Sun Oct 12 07:12:54 2008
> N lines = 17000 Sun Oct 12 07:13:32 2008
> N lines = 18000 Sun Oct 12 07:14:16 2008
> N lines = 19000 Sun Oct 12 07:15:04 2008
> N lines = 20000 Sun Oct 12 07:15:58 2008
> N lines = 21000 Sun Oct 12 07:16:58 2008
> N lines = 22000 Sun Oct 12 07:18:04 2008
> N lines = 23000 Sun Oct 12 07:19:17 2008
> N lines = 24000 Sun Oct 12 07:20:36 2008
> N lines = 25000 Sun Oct 12 07:22:02 2008
> N lines = 26000 Sun Oct 12 07:23:36 2008
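
The timestamps above only resolve to whole seconds. If it helps to pin down how the cost grows with the number of lines, the same experiment can be sketched with system.time(), which reports elapsed time per call. The vector Z below is a synthetic stand-in (the real data are not shown), and the chunk sizes are arbitrary assumptions kept small so the sketch runs quickly; timings will differ by machine and R version.

```r
# Synthetic stand-in for the poster's vector Z: one header plus numeric rows.
n <- 3000
Z <- c("x y", paste(seq_len(n), seq_len(n) * 2))

# Time read.table on successively larger chunks, mirroring the quoted loop.
sizes <- c(1000, 2000, 3000)
elapsed <- sapply(sizes, function(k) {
  con <- textConnection(Z[1:k])
  on.exit(close(con))  # release the connection when the function returns
  system.time(A <- read.table(con, header = TRUE))[["elapsed"]]
})

print(data.frame(lines = sizes, seconds = elapsed))
```
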
>
> Any clever ideas will be greatly appreciated.
>
> Dennis
>
>
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-415-564-2220
> www.PLessThan.com