[R] Parsing

Paolo Sonego paolo.sonego at gmail.com
Wed Jul 9 11:33:28 CEST 2008


Dear R users,

I have a big text file formatted like this:

x      x_string
y      y_string
id1    id1_string
id2    id2_string
z      z_string
w      w_string
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
x      x_string1
y      y_string1
z      z_string1
w      w_string1
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
x      x_string2
y      y_string2
id1    id1_string1
id2    id2_string1
z      z_string2
w      w_string2
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
...
...


I'd like to parse this file, retrieve the x, y, id1, id2, z and w fields,
and save them into a matrix object:

x          y          id1          id2          z          w
x_string   y_string   id1_string   id2_string   z_string   w_string
x_string1  y_string1  NA           NA           z_string1  w_string1
x_string2  y_string2  id1_string1  id2_string1  z_string2  w_string2
...
...

The id1 and id2 fields are not always present within a section (a section
runs from the x line to the last stuff line before the // separator).
I'd like to insert an NA where they are absent (see above), so that all
columns have the same length: length(x) == length(y) == length(id1) == ... .

Without the id1 and id2 fields the task is easily solved by importing the
text file with readLines and retrieving the individual fields with grep:

input <- readLines("file.txt")
x <- grep("^x\\s", input, value = TRUE)
id1 <- grep("^id1\\s", input, value = TRUE)
...

I'd like to accomplish this task entirely in R (no SQL, no Perl
script), if possible without using loops.
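
To make the requirement concrete, here is a minimal sketch of one loop-free way this might be done. It is only an illustration and rests on several assumptions: every section ends with a line containing exactly "//", the field name is the first whitespace-delimited token on its line, no "stuff" line starts with one of the field names, and each field occurs at most once per section. The variable names (fields, record, key, value, result) are made up for the example, and the small inline input stands in for readLines("file.txt").

```r
# Sample input mirroring the layout above (an assumption for illustration);
# with a real file this would be: input <- readLines("file.txt")
input <- c(
  "x      x_string", "y      y_string",
  "id1    id1_string", "id2    id2_string",
  "z      z_string", "w      w_string",
  "stuff  stuff  stuff", "//",
  "x      x_string1", "y      y_string1",
  "z      z_string1", "w      w_string1",
  "stuff  stuff  stuff", "//"
)

fields <- c("x", "y", "id1", "id2", "z", "w")

# Section number of each line: count of "//" lines seen before it, plus one.
is_end <- input == "//"
record <- cumsum(c(0, head(is_end, -1))) + 1

# Split each line into its first token (key) and the remainder (value).
key   <- sub("[[:space:]].*$", "", input)
value <- sub("^[^[:space:]]+[[:space:]]+", "", input)

# Keep only the lines whose key is one of the wanted fields.
keep <- key %in% fields

# Start from an all-NA character matrix, one row per section.
result <- matrix(NA_character_,
                 nrow = max(record[keep]), ncol = length(fields),
                 dimnames = list(NULL, fields))

# Index with a two-column (row, column) matrix to fill every cell at once;
# cells for fields missing from a section are simply left as NA.
result[cbind(record[keep], match(key[keep], fields))] <- value[keep]

print(result)
```

The idea is that cumsum() over the separator lines labels each line with its section, so the fields can be placed by (section, field) position instead of relying on grep results having equal lengths.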

Any suggestions are quite welcome!

Regards,
Paolo


