[R] How to quickly convert a data.frame into a structure of lists
Duncan Murdoch
murdoch.duncan at gmail.com
Wed Aug 10 18:42:57 CEST 2011
On 10/08/2011 10:30 AM, Frederic F wrote:
> Hello Duncan,
>
> Here is a small example to illustrate what I am trying to do.
>
> # Example data.frame
> df=data.frame(A=c("a","a","b","b"), B=c("X","X","Y","Z"), C=c(1,2,3,4))
> #   A B C
> # 1 a X 1
> # 2 a X 2
> # 3 b Y 3
> # 4 b Z 4
>
> ### First way of getting the list structure (ls1), using nested lapply
> loops:
> # Get the structure and populate it:
> ls1 <- lapply(levels(df$A), function(levelA) {
>   lapply(levels(df$B), function(levelB) {
>     df$C[df$A == levelA & df$B == levelB]
>   })
> })
> # Apply the names:
> names(ls1) <- levels(df$A)
> for (i in seq_along(ls1)) {
>   names(ls1[[i]]) <- levels(df$B)
> }
>
> # Result:
> ls1$a$X
> # [1] 1 2
> ls1$b$Z
> # [1] 4
>
> The data.frame will always be 'complete', i.e., there will be a value in
> every row for every column.
> I want to produce a structure like this one quickly (I am aiming for
> under 10 seconds) for a dataset containing between 1 and 2 million rows.
>
I don't know what the timing would be like for your real data, but this
looks like a case where by() would work:
ls1 <- by(df$C, df[,1:2], identity)
When I repeat the rows of df a million times each, this finishes in a
few seconds. It would definitely be slower if there were more levels of
A or B.
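
For anyone who wants to repeat that timing check, here is a rough sketch.
The names n_rep, big, big_res and timing are made up for illustration, and
stringsAsFactors = TRUE is an addition so that A and B are factors on
R >= 4.0, as they were by default in 2011. Timings will depend on the
machine, so none are quoted here.

```r
# Example data from the question. stringsAsFactors = TRUE is an assumption
# added here so that A and B are factors, as in the original 2011 session.
df <- data.frame(A = c("a", "a", "b", "b"),
                 B = c("X", "X", "Y", "Z"),
                 C = c(1, 2, 3, 4),
                 stringsAsFactors = TRUE)

# Hypothetical repeat count, kept small so the sketch runs quickly;
# raise it to 1e6 to mirror the test described above.
n_rep <- 1e5

# Repeat every row of df n_rep times.
big <- df[rep(seq_len(nrow(df)), each = n_rep), ]

# Time the by() call; the result is kept in big_res for inspection.
timing <- system.time(
  big_res <- by(big$C, big[, 1:2], identity)
)
print(timing)
```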
Now ls1 will be a matrix (rows indexed by the levels of A, columns by the
levels of B) whose entries are the subsets of C that you want, so you can
see your two results with slightly different syntax:
> ls1[["a", "X"]]
[1] 1 2
> ls1[["b","Z"]]
[1] 4
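
If the nested ls1$a$X form is really needed, the by() result can be
unpacked into a list of lists. The following is only a sketch: res, dn,
nested and nested2 are illustrative names, stringsAsFactors = TRUE and
simplify = FALSE are additions (the latter so that the result is a list
matrix even when every group has a single value), and the split() variant
at the end is a different technique from by(), shown for comparison.

```r
# Example data from the question, with factors requested explicitly
# (an assumption needed on R >= 4.0).
df <- data.frame(A = c("a", "a", "b", "b"),
                 B = c("X", "X", "Y", "Z"),
                 C = c(1, 2, 3, 4),
                 stringsAsFactors = TRUE)

# simplify = FALSE keeps the result a list matrix in all cases.
res <- by(df$C, df[, 1:2], identity, simplify = FALSE)
dn <- dimnames(res)  # dn[[1]]: levels of A, dn[[2]]: levels of B

# Unpack the matrix into a list (by A) of lists (by B).
# Combinations that do not occur in the data come out as NULL.
nested <- lapply(dn[[1]], function(a) {
  setNames(lapply(dn[[2]], function(b) res[[a, b]]), dn[[2]])
})
names(nested) <- dn[[1]]

# Alternative using split() instead of by(): split by A, then split C by B.
# Here combinations that do not occur come out as numeric(0).
nested2 <- lapply(split(df[c("B", "C")], df$A),
                  function(d) split(d$C, d$B))

print(nested$a$X)
print(nested2$b$Z)
```

Note that the two variants differ for empty A/B combinations (NULL versus
a zero-length vector), which may matter to code that consumes the structure.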
Duncan Murdoch