[R] Reshaping matrix of lists as dataframe

Oliver Gondring olihui at gmx.de
Mon Feb 1 08:58:41 CET 2010


Hello William, hello David,

  thanks a lot for helping and keeping me going on what sometimes seems 
to be a long way to R mastery! :)

I found that the two solutions William proposed are in fact easier for 
me to understand at the moment than David's (and they have the additional 
advantage of producing the desired data types ('numeric'/'integer') in 
columns 2-5). However, I think all of the code you provided will be 
extremely helpful for learning some new tricks by analyzing it in detail.

For everyone concerned with similar data manipulation tasks, here's a 
short summary of the thread:

 >>> The original data (a matrix of _lists_, of course - mea culpa - 
hence the modified name of the thread):

x <- list(c(1,2,4),c(1,3,5),c(0,1,0),
          c(1,3,6,5),c(3,4,4,4),c(0,1,0,1),
          c(3,7),c(1,2),c(0,1))
data <- matrix(x,byrow=TRUE,nrow=3)
colnames(data) <- c("First", "Length", "Value")
rownames(data) <- c("Case1", "Case2", "Case3")

 > data
      First     Length    Value
Case1 Numeric,3 Numeric,3 Numeric,3
Case2 Numeric,4 Numeric,4 Numeric,4
Case3 Numeric,2 Numeric,2 Numeric,2
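
The "Numeric,3" entries in the printout appear because each cell of the 
matrix holds a whole vector rather than a single number. A minimal sketch 
to inspect that structure, using only the data from above (the variable 
name 'cell' is just for illustration):

```r
x <- list(c(1,2,4), c(1,3,5), c(0,1,0),
          c(1,3,6,5), c(3,4,4,4), c(0,1,0,1),
          c(3,7), c(1,2), c(0,1))
data <- matrix(x, byrow=TRUE, nrow=3)
colnames(data) <- c("First", "Length", "Value")
rownames(data) <- c("Case1", "Case2", "Case3")

is.list(data)   # the matrix is a list that carries a dim attribute
dim(data)       # three cases by three variables

# Single-bracket indexing returns a list of length 1;
# [[1]] then pulls out the numeric vector stored in that cell
cell <- data["Case2", "First"][[1]]
cell
```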


 >>> The desired output (a data frame with a database-like 'flat' structure):

 >      Case Sequence First Length Value
 >   1 Case1        1     1      1     0
 >   2 Case1        2     2      3     1
 >   3 Case1        3     4      5     0
 >   4 Case2        1     1      3     0
 >   5 Case2        2     3      4     1
 >   6 Case2        3     6      4     0
 >   7 Case2        4     5      4     1
 >   8 Case3        1     3      1     0
 >   9 Case3        2     7      2     1


 >>> Ways to do it:

(1)
 >  lengths<-sapply(data[,1],length)
 >  data.frame(Case=rep(rownames(data),lengths),
              Sequence=sequence(lengths), 
              apply(data,2,unlist),
              row.names=NULL)

 > It assumes that sapply(data[,k],length) is the
 > same for all k in 1:ncol(data).

Which, as you inferred correctly from the given example dataset 
(because I forgot to mention it explicitly), is always the case.
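
For data where that assumption is less certain, it could be checked 
before reshaping. This is only a sketch and not part of William's code; 
the names 'cell_lengths' and 'same_lengths' are made up for illustration:

```r
x <- list(c(1,2,4), c(1,3,5), c(0,1,0),
          c(1,3,6,5), c(3,4,4,4), c(0,1,0,1),
          c(3,7), c(1,2), c(0,1))
data <- matrix(x, byrow=TRUE, nrow=3)
colnames(data) <- c("First", "Length", "Value")
rownames(data) <- c("Case1", "Case2", "Case3")

# One row per case, one column per variable:
# the number of elements stored in each cell
cell_lengths <- sapply(seq_len(ncol(data)),
                       function(k) sapply(data[, k], length))

# Every column has to agree with the first one; otherwise the
# unlisted columns would not line up row by row
same_lengths <- all(cell_lengths == cell_lengths[, 1])
stopifnot(same_lengths)
```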

(2)
 > data.frame(Case=rep(rownames(data),lengths),
             Sequence=sequence(lengths),
             lapply(split(data,colnames(data)[col(data)]), unlist),
             row.names=NULL)

(3)
(David's code with some additions to produce nearly the same output as 
(1) and (2))
(however there's still one difference: columns 2-5 are 'factors')
 > result <- data.frame(do.call(rbind,
         sapply(rownames(data),  function(.x) cbind(.x,
         # those were the rownames
         cbind(1:length(data[.x, "First"][[1]]),
         # and that was the incremental counter
         sapply(data[.x, ],
         # and finally the values, which unfortunately get turned into characters
         function(.y) return(.y ) ) ) )  )))
 > colnames(result)[1:2] <- c("Case","Sequence")
 > result
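
If numeric columns are wanted from (3) as well, the factor (or 
character) columns could be converted afterwards. A minimal sketch, 
assuming a data frame shaped like 'result'; the small data frame below 
is a made-up stand-in so that the snippet runs on its own, and 
'num_cols' is a name chosen for illustration:

```r
# Stand-in for 'result' from (3): all columns are factors
result <- data.frame(Case     = c("Case1", "Case1", "Case3"),
                     Sequence = c("1", "2", "1"),
                     First    = c("1", "2", "3"),
                     stringsAsFactors = TRUE)

# Convert every column except 'Case'
num_cols <- setdiff(names(result), "Case")

# Go through as.character() first: as.numeric() applied directly
# to a factor returns the internal level codes, not the values
result[num_cols] <- lapply(result[num_cols],
                           function(col) as.numeric(as.character(col)))
str(result)
```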

Cheers,
Oliver


