[R] proper use of textConnection

Gabor Grothendieck ggrothendieck at gmail.com
Sun Oct 12 17:55:32 CEST 2008


Try one of these:


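Lines <- readLines("myfile.dat")
# keep the lines that do NOT match; Lines[-grep(...)] would drop
# every line if nothing matched, because -integer(0) selects nothing
Lines <- Lines[!grepl("whatever", Lines)]
DF <- read.table(textConnection(Lines), ...other.args...)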
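A self-contained sketch of the first approach, under two assumptions not stated in the thread: the first line of the file is the header, and every repeated header is an exact copy of it. The sample data and the name is_hdr are hypothetical. Unlike removing every line that matches a pattern, this keeps the first header so that header = TRUE still has something to read, and the whole vector goes through a single read.table() call.

```r
# Hypothetical sample data: the header repeats every 3 data rows,
# standing in for "headers ~ every 1000 lines" in the real file.
hdr  <- "id value"
rows <- sprintf("%d %.1f", 1:9, (1:9) / 2)
Lines <- c(hdr, rows[1:3], hdr, rows[4:6], hdr, rows[7:9])

# Assumption: line 1 is the header and later headers are exact copies.
is_hdr <- which(Lines == Lines[1])

# Drop every header except the first. The guard matters: when there is
# only one header, is_hdr[-1] is integer(0) and Lines[-integer(0)]
# would return an empty vector.
if (length(is_hdr) > 1) Lines <- Lines[-is_hdr[-1]]

# Read the cleaned character vector in one pass, then close the connection.
con <- textConnection(Lines)
DF <- read.table(con, header = TRUE)
close(con)

str(DF)
```

Current versions of R also let read.table() take the lines directly through its text argument, e.g. read.table(text = Lines, header = TRUE), which avoids opening and closing the connection by hand.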

or

# use findstr /v instead of grep -v if you are on Windows
DF <- read.table(pipe("grep -v whatever myfile.dat"), ...other.args...)
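A sketch of the pipe() variant, assuming a Unix-like system with grep on the PATH (on Windows, findstr /v as noted in the comment above). Because grep -v removes every copy of the header, including the first, this version takes the column names from line 1 with scan() and reads the data with header = FALSE. The file contents and the header text "id value" are made up for illustration.

```r
# Hypothetical input file with the header "id value" repeated inside the data.
f <- tempfile(fileext = ".dat")
writeLines(c("id value", "1 0.5", "2 1.0", "id value", "3 1.5", "4 2.0"), f)

# Column names come from the first line of the file.
cols <- scan(f, what = "", nlines = 1, quiet = TRUE)

# grep -v filters out every header line before R sees the data,
# so read.table() must not expect a header of its own.
cmd <- paste("grep -v", shQuote("^id value$"), shQuote(f))
DF  <- read.table(pipe(cmd), header = FALSE, col.names = cols)

unlink(f)
str(DF)
```

The filtering happens outside R here, which may help when the file is much larger than the example, at the cost of depending on an external tool.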


On Sun, Oct 12, 2008 at 11:13 AM, Dennis Fisher <fisher at plessthan.com> wrote:
> Colleagues,
>
> Using R 2.7.0 on OS X, I am having trouble understanding the function
> textConnection.  My situation is as follows:
> 1.  I am trying to read a lengthy file (45000 lines) that has headers
> ~ every 1000 lines.  read.table (or its variants) fail because of the
> recurrent headers.
> 2.  My present approach is the following:
>        a.  use readLines to read the file, save as an array
>        b.  use grep to find the recurrent headers (not including the first
> set)
>        c.  delete the recurrent headers from the array
>        d.  write the array to a temp file
>        e.  read the temp file using read.table
>        f.   delete the temp file
> 3.  My understanding is that textConnection might enable me to replace
> steps d-f with a single step akin to
> read.table(textConnection(array)).  This appears to work but it is
> very slow.  I executed code on successively larger chunks of the array:
> for (Each in 1000 * 1:45)
>        {
>        cat("N lines =", Each, "\t", date(), "\n")
>        A <- read.table(textConnection(Z[1:Each]), header=T)
>        }
> yielding:
> N lines = 1000    Sun Oct 12 07:09:48 2008
> N lines = 2000    Sun Oct 12 07:09:48 2008
> N lines = 3000    Sun Oct 12 07:09:48 2008
> N lines = 4000    Sun Oct 12 07:09:50 2008
> N lines = 5000    Sun Oct 12 07:09:52 2008
> N lines = 6000    Sun Oct 12 07:09:56 2008
> N lines = 7000    Sun Oct 12 07:10:01 2008
> N lines = 8000    Sun Oct 12 07:10:09 2008
> N lines = 9000    Sun Oct 12 07:10:18 2008
> N lines = 10000   Sun Oct 12 07:10:31 2008
> N lines = 11000   Sun Oct 12 07:10:46 2008
> N lines = 12000   Sun Oct 12 07:11:04 2008
> N lines = 13000   Sun Oct 12 07:11:25 2008
> N lines = 14000   Sun Oct 12 07:11:51 2008
> N lines = 15000   Sun Oct 12 07:12:20 2008
> N lines = 16000   Sun Oct 12 07:12:54 2008
> N lines = 17000   Sun Oct 12 07:13:32 2008
> N lines = 18000   Sun Oct 12 07:14:16 2008
> N lines = 19000   Sun Oct 12 07:15:04 2008
> N lines = 20000   Sun Oct 12 07:15:58 2008
> N lines = 21000   Sun Oct 12 07:16:58 2008
> N lines = 22000   Sun Oct 12 07:18:04 2008
> N lines = 23000   Sun Oct 12 07:19:17 2008
> N lines = 24000   Sun Oct 12 07:20:36 2008
> N lines = 25000   Sun Oct 12 07:22:02 2008
> N lines = 26000   Sun Oct 12 07:23:36 2008
>
> Any clever ideas will be greatly appreciated.
>
> Dennis
>
>
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-415-564-2220
> www.PLessThan.com
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

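For comparison, the temp-file procedure in steps a-f of the quoted question could be written roughly as follows. This is a sketch with hypothetical data and a made-up header "id value", not the original poster's code; it shows that steps d-f only exist to hand the cleaned lines back to read.table().

```r
# Hypothetical input written to a temp file so the sketch is self-contained.
src <- tempfile(fileext = ".dat")
writeLines(c("id value", "1 0.5", "2 1.0", "id value", "3 1.5"), src)

Z <- readLines(src)                   # a. read all lines into a character vector
hdr <- grep("^id value$", Z)[-1]      # b. repeated headers, not the first one
if (length(hdr) > 0) Z <- Z[-hdr]     # c. delete them (Z[-integer(0)] would be empty)
tmp <- tempfile()                     # d. write the vector to a temp file
writeLines(Z, tmp)
A <- read.table(tmp, header = TRUE)   # e. read the temp file
unlink(c(tmp, src))                   # f. delete the temp file (and the sample input)

str(A)
```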

