[R] Reshaping matrix of lists as dataframe
Oliver Gondring
olihui at gmx.de
Mon Feb 1 08:58:41 CET 2010
Hello William, hello David,
thanks a lot for helping and keeping me going on what sometimes seems
to be a long way to R mastery! :)
I found that the two solutions William proposed are, at the moment,
easier for me to understand than David's (and have the additional
advantage of producing the desired data types ('numeric'/'integer') in
columns 2-5). However, I think all of the code you provided will be
extremely helpful for learning some new tricks by analyzing it in detail.
For everyone concerned with similar data manipulation tasks, here's a
short summary of the thread:
>>> The original data (a matrix of _lists_, of course - mea culpa -
hence the modified name of the thread):
x <- list(c(1,2,4),c(1,3,5),c(0,1,0),
c(1,3,6,5),c(3,4,4,4),c(0,1,0,1),
c(3,7),c(1,2),c(0,1))
data <- matrix(x,byrow=TRUE,nrow=3)
colnames(data) <- c("First", "Length", "Value")
rownames(data) <- c("Case1", "Case2", "Case3")
> data
      First     Length    Value
Case1 Numeric,3 Numeric,3 Numeric,3
Case2 Numeric,4 Numeric,4 Numeric,4
Case3 Numeric,2 Numeric,2 Numeric,2
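
The printed form only shows the type and length of each cell, so here is a minimal sketch of how one might look inside such a matrix of lists. It only uses the example data from above; the name 'cell' is just an illustrative variable, not something from the thread.

```r
# Rebuild the example matrix of lists from this post
x <- list(c(1,2,4), c(1,3,5), c(0,1,0),
          c(1,3,6,5), c(3,4,4,4), c(0,1,0,1),
          c(3,7), c(1,2), c(0,1))
data <- matrix(x, byrow=TRUE, nrow=3)
colnames(data) <- c("First", "Length", "Value")
rownames(data) <- c("Case1", "Case2", "Case3")

# The matrix is really a list that carries a dim attribute
is.list(data)
dim(data)

# Single-bracket indexing returns a list of length 1;
# [[1]] then extracts the numeric vector stored in that cell
cell <- data["Case2", "First"][[1]]
cell

# str() lists the vectors stored in one row
str(data["Case1", ])
```

The `[[1]]` step is the same idiom that appears in approach (3) further down.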
>>> The desired output (a dataframe of a database-like 'flat' structure):
>    Case Sequence First Length Value
> 1 Case1        1     1      1     0
> 2 Case1        2     2      3     1
> 3 Case1        3     4      5     0
> 4 Case2        1     1      3     0
> 5 Case2        2     3      4     1
> 6 Case2        3     6      4     0
> 7 Case2        4     5      4     1
> 8 Case3        1     3      1     0
> 9 Case3        2     7      2     1
>>> Ways to do it:
(1)
> lengths<-sapply(data[,1],length)
> data.frame(Case=rep(rownames(data),lengths),
Sequence=sequence(lengths),
apply(data,2,unlist),
row.names=NULL)
> It assumes that sapply(data[,k],length) is the
> same for all k in 1:ncol(data).
Which, as you inferred correctly from the given example dataset
(because I forgot to mention it explicitly), is always the case.
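
For data where this is not guaranteed, the assumption could be checked before reshaping. The following is only a sketch on the example data; 'cell_lengths' and 'same_per_case' are names made up for illustration.

```r
# Rebuild the example matrix of lists from this post
x <- list(c(1,2,4), c(1,3,5), c(0,1,0),
          c(1,3,6,5), c(3,4,4,4), c(0,1,0,1),
          c(3,7), c(1,2), c(0,1))
data <- matrix(x, byrow=TRUE, nrow=3)
colnames(data) <- c("First", "Length", "Value")
rownames(data) <- c("Case1", "Case2", "Case3")

# Length of the vector in every cell; sapply() walks the cells in
# column-major order, so the matrix shape can be restored afterwards
cell_lengths <- sapply(data, length)
dim(cell_lengths) <- dim(data)
dimnames(cell_lengths) <- dimnames(data)
cell_lengths

# Within each case (row), all columns must hold vectors of equal length
same_per_case <- apply(cell_lengths, 1, function(r) length(unique(r)) == 1)
stopifnot(all(same_per_case))
```

If the check fails, stopifnot() raises an error before any misaligned data frame is built.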
(2)
> data.frame(Case=rep(rownames(data),lengths),
Sequence=sequence(lengths),
lapply(split(data,colnames(data)[col(data)]), unlist),
row.names=NULL)
(3)
(David's code with some additions to produce nearly the same output as
(1) and (2))
(however there's still one difference: columns 2-5 are 'factors')
> result <- data.frame(do.call(rbind,
sapply(rownames(data), function(.x) cbind(.x,
# those were the rownames
cbind(1:length(data[.x, "First"][[1]]),
# and that was the incremental counter
sapply(data[.x, ],
# and finally the values which unfortunately get turned into characters
function(.y) return(.y ) ) ) ) )))
> colnames(result)[1:2] <- c("Case","Sequence")
> result
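
If the factor columns from (3) are a problem, they can be converted afterwards. A minimal sketch, assuming a small stand-in for 'result' (only the Case1 rows, built by hand here) and a made-up helper name 'num_cols':

```r
# Stand-in for the first rows of 'result' from approach (3);
# stringsAsFactors is set explicitly so that the columns are factors
# regardless of the R version's default
result <- data.frame(Case     = c("Case1", "Case1", "Case1"),
                     Sequence = c("1", "2", "3"),
                     First    = c("1", "2", "4"),
                     Length   = c("1", "3", "5"),
                     Value    = c("0", "1", "0"),
                     stringsAsFactors = TRUE)

num_cols <- c("Sequence", "First", "Length", "Value")

# Go through as.character() first: as.numeric() applied directly to a
# factor returns the internal level codes, not the printed values
result[num_cols] <- lapply(result[num_cols],
                           function(col) as.numeric(as.character(col)))
str(result)
```

The 'Case' column is left untouched, so it stays a factor.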
Cheers,
Oliver