[R] placing multiple rows in a single row

Mon Jul 4 21:00:47 CEST 2011

On Jul 4, 2011, at 2:32 PM, Annemarie Verkerk wrote:

> Dear people from the R help list,
>
> I have a question that I can't work out how to start answering,  
> which is why I am writing to the list.
>
> I have data in a format like this (tabs might look weird):
>
> John    A1    1    0    1
> John    A2    1    1    1
> John    A3    1    0    0
> Mary    A1    1    0    1
> Mary    A2    0    0    1
> Mary    A3    1    1    0
> Peter   A1    1    0    0
> Peter   A2    0    0    1
> Peter   A3    1    1    1
> Josh    A1    1    0    0
> Josh    A2
> Josh    A3    0    0    0
>
> I want to convert it into a format where the rows for a single  
> subject are placed side by side on one line, with the different  
> scores still matching up (i.e., it needs to be able to cope with  
> missing data, as for Josh's A2 score).
>
> John    A1  1  0  1    A2  1  1  1    A3  1  0  0
> Mary    A1  1  0  1    A2  0  0  1    A3  1  1  0
> Peter   A1  1  0  0    A2  0  0  1    A3  1  1  1
> Josh    A1  1  0  0    A2             A3  0  0  0
>
> Preferably, the row identification would become the header of the  
> new table, something like this:
>
>          A11  A12  A13  A21  A22  A23  A31  A32  A33
> John       1    0    1    1    1    1    1    0    0
> Mary       1    0    1    0    0    1    1    1    0
> Peter      1    0    0    0    0    1    1    1    1
> Josh       1    0    0                   0    0    0
>
> Probably, this has been addressed before - I just don't know how to  
> search for the answer with the right search terms.
>
> Any help is appreciated, even just a link to a page where this is  
> addressed!

There is a reshape function in the stats package that nobody except  
Phil Spector seems to understand, and then there are the reshape and  
reshape2 packages, which everybody seems to get. (I don't understand  
why the classification variables go on the left-hand side of the  
formula, though. Positionally it makes some sense, but logically it  
does not connect with how I understand the process.)
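
For comparison, the base reshape() call for this layout might look like the sketch below. It is not part of the original session: the data frame is typed in by hand with the default names V1 to V5, and the choice of V1 as idvar and V2 as timevar is an assumption based on the question.

```r
# Hand-entered copy of the example data; the names V1..V5 are assumed defaults.
nam123 <- data.frame(
  V1 = rep(c("John", "Mary", "Peter", "Josh"), each = 3),
  V2 = rep(c("A1", "A2", "A3"), times = 4),
  V3 = c(1, 1, 1,  1, 0, 1,  1, 0, 1,  1, NA, 0),
  V4 = c(0, 1, 0,  0, 0, 1,  0, 0, 1,  0, NA, 0),
  V5 = c(1, 1, 0,  1, 1, 0,  0, 1, 1,  0, NA, 0)
)

# One output row per person (idvar); the A1/A2/A3 label (timevar) is
# spread across columns, so V3..V5 each appear once per label.
wide <- reshape(nam123, idvar = "V1", timevar = "V2", direction = "wide")

names(wide)
```

The columns come out named by measure first and label second (V3.A1, V4.A1, and so on), and Josh's missing A2 scores stay NA.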

require(reshape2)
# entered your data with default names V1 V2 V3 V4 V5
 > nam123
       V1 V2 V3 V4 V5
1   John A1  1  0  1
2   John A2  1  1  1
3   John A3  1  0  0
4   Mary A1  1  0  1
5   Mary A2  0  0  1
6   Mary A3  1  1  0
7  Peter A1  1  0  0
8  Peter A2  0  0  1
9  Peter A3  1  1  1
10  Josh A1  1  0  0
11  Josh A2 NA NA NA
12  Josh A3  0  0  0
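
The post does not show how the data were entered. One way to get a data frame like the one above, sketched here as an assumption rather than as what was actually done, is read.table() with fill = TRUE so that the short "Josh A2" row is padded with NA:

```r
# The example rows as plain text; the "Josh A2" line has no scores.
txt <- "John A1 1 0 1
John A2 1 1 1
John A3 1 0 0
Mary A1 1 0 1
Mary A2 0 0 1
Mary A3 1 1 0
Peter A1 1 0 0
Peter A2 0 0 1
Peter A3 1 1 1
Josh A1 1 0 0
Josh A2
Josh A3 0 0 0"

# header = FALSE gives the default names V1..V5;
# fill = TRUE pads rows that have fewer fields than the others.
nam123 <- read.table(text = txt, header = FALSE, fill = TRUE)

nam123[11, ]
```

Whether V1 and V2 come in as factors (as in the str() output further down) or as character vectors depends on the R version and the stringsAsFactors setting.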

 > nams.mlt <- melt(nam123, id.vars=c("V1", "V2"))

 > str(nams.mlt)
'data.frame':	36 obs. of  4 variables:
  $ V1      : Factor w/ 4 levels "John","Josh",..: 1 1 1 3 3 3 4 4 4 2 ...
  $ V2      : Factor w/ 3 levels "A1","A2","A3": 1 2 3 1 2 3 1 2 3 1 ...
  $ variable: Factor w/ 3 levels "V3","V4","V5": 1 1 1 1 1 1 1 1 1 1 ...
  $ value   : int  1 1 1 1 0 1 1 0 1 1 ...

 > dcast(nams.mlt, V1+V2 ~ variable)
       V1 V2 V3 V4 V5
1   John A1  1  0  1
2   John A2  1  1  1
3   John A3  1  0  0
4   Josh A1  1  0  0
5   Josh A2 NA NA NA
6   Josh A3  0  0  0
7   Mary A1  1  0  1
8   Mary A2  0  0  1
9   Mary A3  1  1  0
10 Peter A1  1  0  0
11 Peter A2  0  0  1
12 Peter A3  1  1  1
 > dcast(nams.mlt, V1 ~ V2+variable)
      V1 A1_V3 A1_V4 A1_V5 A2_V3 A2_V4 A2_V5 A3_V3 A3_V4 A3_V5
1  John     1     0     1     1     1     1     1     0     0
2  Josh     1     0     0    NA    NA    NA     0     0     0
3  Mary     1     0     1     0     0     1     1     1     0
4 Peter     1     0     0     0     0     1     1     1     1

You can always change the column names of the data frame afterwards if  
you want; in this case it would be a simple sub() operation on  
names(). Personally I would substitute "." rather than "" as the  
replacement.
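
A minimal sketch of that renaming, shown on a plain character vector so it runs without reshape2. The pattern "_V" is an assumption about which part of the names to replace; it was not spelled out above.

```r
# Column names as produced by dcast(nams.mlt, V1 ~ V2+variable)
wide_names <- c("V1",
                "A1_V3", "A1_V4", "A1_V5",
                "A2_V3", "A2_V4", "A2_V5",
                "A3_V3", "A3_V4", "A3_V5")

# Replace the literal "_V" with "."; fixed = TRUE turns off regex
# matching, and sub() changes only the first match in each name.
new_names <- sub("_V", ".", wide_names, fixed = TRUE)

new_names
```

On the data frame itself the same call would be applied as names(x) <- sub("_V", ".", names(x), fixed = TRUE), where x is a hypothetical name for the saved dcast result. The trailing 3, 4, 5 come from the default column names V3 to V5, so getting 1, 2, 3 as in the question would mean renaming those columns before melting.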
-- 

David Winsemius, MD
West Hartford, CT