[R] seemingly simple read.table question

Sun Jun 11 23:57:25 CEST 2006

On 11 June 2006 at 16:24, markleeds at verizon.net wrote:
| I have a file that I thought would be fairly simple to read in using read.table but I am having problems ( as usual ).
| 
| each line of the file is of the form ( just 20 lines or so )
| 
| financials XXX, YYY, ZZZ
| automobiles RTR, ABC, TGH
| 
| so the first field in the line is the industry and the other fields
| ( separated by commas ) in the line are stock identifiers of stocks
| in that industry. note that there is no comma between the industry
| and the first stock identifier in the group which i guess might
| complicate things ?

Yup, because then the comma is no longer the unique separator between _all_
columns.  But if the file really looks the way you typed it here, you should
be fine reading it as whitespace-separated and postprocessing the data
afterwards to remove the trailing commas. See below for a hack-ish solution.
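
To make the separator problem concrete, here is a small sketch. It assumes
the two sample lines from the question are the whole data set, and it uses the
text= argument of read.table (available in current versions of R) so that no
file is needed. The names txt, byComma and byWhite are made up for
illustration.

```r
# The two sample lines as typed in the question (inline, so no file is needed)
txt <- c("financials XXX, YYY, ZZZ",
         "automobiles RTR, ABC, TGH")

# Splitting on commas glues the industry to the first stock identifier:
# the first field of row 1 is "financials XXX", and there are only 3 columns
byComma <- read.table(text = txt, sep = ",", strip.white = TRUE)

# The default whitespace separator yields the wanted 4 columns, but the
# commas stay attached to the values (e.g. "XXX," in the second column)
byWhite <- read.table(text = txt)
```

Neither call gives clean data on its own, which is why a postprocessing step
is needed.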

| my goal is to make the row names the industries and the stock
| identifiers the column data but i don't
| have a header so , i am unclear ( reading the help on
| read.table ) how to tell R that the first field in each line
| should be used as the row name ? Thanks for any help
| or for telling me that this is not possible. This will be my last bother of the day to the help group.

This is a little clumsy: it uses apply() to sweep a regexp substitution
[ hey, you get to use what we taught you earlier :) ] through every column,
stripping the trailing comma from each value.

> rawData <- read.table("/tmp/leeds.txt", row.names=1)
> data <- apply(rawData, 2, function(X) gsub(",$", "", X))
> rownames(data) <- rownames(rawData)
> data
            V2    V3    V4   
financials  "XXX" "YYY" "ZZZ"
automobiles "RTR" "ABC" "TGH"
> 
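
An alternative sketch, not what I ran above: strip the commas before parsing,
so that whitespace is the only separator left and no apply() is needed. It
assumes that no identifier contains a comma or a space and that every line
has the same number of identifiers. The names txt, clean and stocks are
made up for illustration; with a real file, txt would come from something
like readLines() on that file.

```r
# Sample lines from the question; with a real file these would be read in
# with readLines() instead of being typed inline
txt <- c("financials XXX, YYY, ZZZ",
         "automobiles RTR, ABC, TGH")

# Drop every comma first, so whitespace is the only separator left
clean <- gsub(",", "", txt, fixed = TRUE)

# Parse the cleaned lines, using the first field as row names, and convert
# the result to a character matrix like the one shown above
stocks <- as.matrix(read.table(text = clean, row.names = 1))
```

The result has the industries as row names and the columns V2, V3, V4, as in
the output above.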

I'm sure someone named Gabor will soon post something doing the same in half
the lines ...

Dirk

-- 
Hell, there are no rules here - we're trying to accomplish something. 
                                                  -- Thomas A. Edison