[R] do.call vs. lapply for lists
Marc Schwartz
marc_schwartz at comcast.net
Mon Apr 9 19:05:52 CEST 2007
On Mon, 2007-04-09 at 12:45 -0400, Muenchen, Robert A (Bob) wrote:
> Hi All,
>
> I'm trying to understand the difference between do.call and lapply for
> applying a function to a list. Below is one of the variations of
> programs (by Marc Schwartz) discussed here recently to select the first
> and last n observations per group.
>
> I've looked in several books, the R FAQ and searched the archives, but I
> can't find enough to figure out why lapply doesn't do what do.call does
> in this case. The help files & newsletter descriptions of do.call sound
> like it would do the same thing, but I'm sure that's due to my lack of
> understanding about their specific terminology. I would appreciate it if
> you could take a moment to enlighten me.
>
> Thanks,
> Bob
>
> mydata <- data.frame(
> id = c('001','001','001','002','003','003'),
> math = c(80,75,70,65,65,70),
> reading = c(65,70,88,NA,90,NA)
> )
> mydata
>
> mylast <- lapply( split(mydata,mydata$id), tail, n=1)
> mylast
> class(mylast) #It's a list, so lapply will do *something* with it.
>
> #This gets the desired result:
> do.call("rbind", mylast)
>
> #This doesn't do the same thing, which confuses me:
> lapply(mylast,rbind)
>
> #...and data.frame won't fix it as I've seen it do in other
> # circumstances:
> data.frame( lapply(mylast,rbind) )
Bob,
A key difference is that do.call() operates (in the above example) as if
the actual call was:
> rbind(mylast[[1]], mylast[[2]], mylast[[3]])
   id math reading
3 001   70      88
4 002   65      NA
6 003   70      NA
In other words, do.call() takes the function (given here by its quoted
name) and passes the components of the list as if they were individual
arguments. So rbind() is called only once.
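
To make that equivalence concrete, here is a minimal sketch that rebuilds
'mylast' from your data and compares the single do.call() call with the
spelled-out rbind() call. The names 'combined' and 'explicit' are mine, for
illustration only, and I use unname() so that both calls receive unnamed
arguments:

```r
# Rebuild the example data and the list of last rows per group
mydata <- data.frame(
  id      = c('001','001','001','002','003','003'),
  math    = c(80,75,70,65,65,70),
  reading = c(65,70,88,NA,90,NA)
)
mylast <- lapply(split(mydata, mydata$id), tail, n = 1)

# One call to rbind(), each list component passed as a separate argument
combined <- do.call("rbind", unname(mylast))

# The same call written out by hand
explicit <- rbind(mylast[[1]], mylast[[2]], mylast[[3]])

# Both should be a single data frame with one row per group
is.data.frame(combined)
nrow(combined)
all.equal(combined, explicit)
```

Note that 'mylast' is a named list, so calling do.call("rbind", mylast)
directly also passes those names along as argument names, which rbind() may
use when labelling the rows.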
In this case, rbind() internally handles all of the factor level issues,
etc. to enable a single common data frame to be created from the three
independent data frames contained in 'mylast':
> str(mylast)
List of 3
 $ 001:'data.frame': 1 obs. of 3 variables:
  ..$ id     : Factor w/ 3 levels "001","002","003": 1
  ..$ math   : num 70
  ..$ reading: num 88
 $ 002:'data.frame': 1 obs. of 3 variables:
  ..$ id     : Factor w/ 3 levels "001","002","003": 2
  ..$ math   : num 65
  ..$ reading: num NA
 $ 003:'data.frame': 1 obs. of 3 variables:
  ..$ id     : Factor w/ 3 levels "001","002","003": 3
  ..$ math   : num 70
  ..$ reading: num NA
On the other hand, lapply() (as above) calls rbind() _separately_ for
each component of mylast. It therefore acts as if the following series
of three separate calls were made:
> rbind(mylast[[1]])
   id math reading
3 001   70      88
> rbind(mylast[[2]])
   id math reading
4 002   65      NA
> rbind(mylast[[3]])
   id math reading
6 003   70      NA
Of course, the result of lapply() is that the above are combined into a
single R list object and returned:
> lapply(mylast, rbind)
$`001`
   id math reading
3 001   70      88
$`002`
   id math reading
4 002   65      NA
$`003`
   id math reading
6 003   70      NA
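
If it helps, the lapply() call can be thought of as roughly the following
loop. This is only a sketch of the behaviour, not how lapply() is actually
implemented, and 'by_hand' and 'via_lapply' are names I made up for the
comparison:

```r
# Rebuild the example data and the list of last rows per group
mydata <- data.frame(
  id      = c('001','001','001','002','003','003'),
  math    = c(80,75,70,65,65,70),
  reading = c(65,70,88,NA,90,NA)
)
mylast <- lapply(split(mydata, mydata$id), tail, n = 1)

# One rbind() call per component, results collected in a list
by_hand <- vector("list", length(mylast))
names(by_hand) <- names(mylast)
for (i in seq_along(mylast)) {
  by_hand[[i]] <- rbind(mylast[[i]])
}

via_lapply <- lapply(mylast, rbind)

# Still a list with one element per group, not one combined data frame
class(via_lapply)
length(via_lapply)
names(via_lapply)
```

Since rbind() only ever sees one data frame at a time here, it has nothing
to bind it to, and each component comes back essentially unchanged.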
It is a subtle, but of course critical, difference in how the internal
function is called and how the arguments are passed.
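
One way to see the difference for yourself is to count the calls. The
sketch below uses a hypothetical wrapper, 'counting_rbind' (not part of R),
and a small made-up list 'pieces', purely for illustration:

```r
# Hypothetical wrapper around rbind() that counts how often it is called
calls <- 0
counting_rbind <- function(...) {
  calls <<- calls + 1
  rbind(...)
}

# Three small one-row data frames (made-up data)
pieces <- list(data.frame(x = 1), data.frame(x = 2), data.frame(x = 3))

# do.call(): all three components go into a single call
calls <- 0
one_frame <- do.call(counting_rbind, pieces)
calls_do_call <- calls

# lapply(): the function is called once for each component
calls <- 0
three_frames <- lapply(pieces, counting_rbind)
calls_lapply <- calls

calls_do_call   # number of calls made via do.call()
calls_lapply    # number of calls made via lapply()
```

Here do.call() should report a single call returning one three-row data
frame, while lapply() should report three calls returning a list of three
one-row data frames.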
Does that help?
Regards,
Marc Schwartz