[R] do.call vs. lapply for lists
Muenchen, Robert A (Bob)
muenchen at utk.edu
Mon Apr 9 19:20:59 CEST 2007
Marc,
That makes the difference between do.call and lapply crystal clear. Your
explanation would make a nice FAQ entry.
Thanks!
Bob
=========================================================
Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230
FAX: (865) 974-4810
Email: muenchen at utk.edu
Web: http://oit.utk.edu/scc,
News: http://listserv.utk.edu/archives/statnews.html
=========================================================
> -----Original Message-----
> From: Marc Schwartz [mailto:marc_schwartz at comcast.net]
> Sent: Monday, April 09, 2007 1:06 PM
> To: Muenchen, Robert A (Bob)
> Cc: R-help at stat.math.ethz.ch
> Subject: Re: do.call vs. lapply for lists
>
> On Mon, 2007-04-09 at 12:45 -0400, Muenchen, Robert A (Bob) wrote:
> > Hi All,
> >
> > I'm trying to understand the difference between do.call and lapply for
> > applying a function to a list. Below is one of the variations of the
> > programs (by Marc Schwartz) discussed here recently to select the first
> > and last n observations per group.
> >
> > I've looked in several books and the R FAQ, and searched the archives,
> > but I can't find enough to figure out why lapply doesn't do what do.call
> > does in this case. The help files & newsletter descriptions of do.call
> > sound like it would do the same thing, but I'm sure that's due to my
> > lack of understanding of their specific terminology. I would appreciate
> > it if you could take a moment to enlighten me.
> >
> > Thanks,
> > Bob
> >
> > mydata <- data.frame(
> > id = c('001','001','001','002','003','003'),
> > math = c(80,75,70,65,65,70),
> > reading = c(65,70,88,NA,90,NA)
> > )
> > mydata
> >
> > mylast <- lapply( split(mydata,mydata$id), tail, n=1)
> > mylast
> > class(mylast) # It's a list, so lapply will do *something* with it.
> >
> > #This gets the desired result:
> > do.call("rbind", mylast)
> >
> > #This doesn't do the same thing, which confuses me:
> > lapply(mylast,rbind)
> >
> > # ...and data.frame won't fix it, as I've seen it do in other circumstances:
> > data.frame( lapply(mylast,rbind) )
>
> Bob,
>
> A key difference is that do.call() operates (in the above example) as if
> the actual call were:
>
> > rbind(mylast[[1]], mylast[[2]], mylast[[3]])
> id math reading
> 3 001 70 88
> 4 002 65 NA
> 6 003 70 NA
>
> In other words, do.call() takes the quoted function name and passes the
> components of the list object as if they were individual arguments. So
> rbind() is only called once.
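A minimal sketch of this point in current R, rebuilding the example data from the thread. It checks that do.call() evaluates one single rbind() call. One detail the hand-written call above glosses over: 'mylast' is a named list, so in current R versions do.call() also passes the names "001", "002", "003" as argument names, and rbind() then uses them as row names (instead of 3, 4, 6). Stripping the names with unname() reproduces the unnamed hand-written call exactly.

```r
# Example data from the thread. In R >= 4.0.0, data.frame() no longer turns
# strings into factors by default, so 'id' is character here, not a factor.
mydata <- data.frame(
  id      = c("001", "001", "001", "002", "003", "003"),
  math    = c(80, 75, 70, 65, 65, 70),
  reading = c(65, 70, 88, NA, 90, NA)
)

# Last row of each id group: a named list of three one-row data frames
mylast <- lapply(split(mydata, mydata$id), tail, n = 1)

# do.call() builds and evaluates ONE call: rbind(`001` = ..., `002` = ..., `003` = ...)
combined <- do.call("rbind", mylast)

# The same single call written out by hand, argument names included
by_hand_named <- rbind(`001` = mylast[[1]], `002` = mylast[[2]], `003` = mylast[[3]])

# With the names removed, do.call() matches the unnamed call shown above
by_hand <- rbind(mylast[[1]], mylast[[2]], mylast[[3]])
unnamed <- do.call("rbind", unname(mylast))

print(identical(combined, by_hand_named))  # TRUE
print(identical(unnamed, by_hand))         # TRUE
print(rownames(combined))                  # the list names become row names
```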
>
> In this case, rbind() internally handles the factor levels and the other
> details needed to create a single common data frame from the three
> independent data frames contained in 'mylast':
>
> > str(mylast)
> List of 3
> $ 001:'data.frame': 1 obs. of 3 variables:
> ..$ id : Factor w/ 3 levels "001","002","003": 1
> ..$ math : num 70
> ..$ reading: num 88
> $ 002:'data.frame': 1 obs. of 3 variables:
> ..$ id : Factor w/ 3 levels "001","002","003": 2
> ..$ math : num 65
> ..$ reading: num NA
> $ 003:'data.frame': 1 obs. of 3 variables:
> ..$ id : Factor w/ 3 levels "001","002","003": 3
> ..$ math : num 70
> ..$ reading: num NA
>
>
> On the other hand, lapply() (as above) calls rbind() _separately_ for
> each component of mylast. It therefore acts as if the following series
> of three separate calls were made:
>
>
> > rbind(mylast[[1]])
> id math reading
> 3 001 70 88
>
> > rbind(mylast[[2]])
> id math reading
> 4 002 65 NA
>
> > rbind(mylast[[3]])
> id math reading
> 6 003 70 NA
>
>
> Of course, lapply() then combines the above results into a single R
> list object and returns it:
>
> > lapply(mylast, rbind)
> $`001`
> id math reading
> 3 001 70 88
>
> $`002`
> id math reading
> 4 002 65 NA
>
> $`003`
> id math reading
> 6 003 70 NA
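A minimal sketch, again rebuilding the thread's data in current R, that checks what lapply() hands back: a plain list whose three components are each still a one-row data frame. It also illustrates why wrapping that list in data.frame() (the last attempt in the original question) does not stack the rows: data.frame() treats each list component as a block of columns and binds the blocks side by side, giving one row and nine columns rather than three rows and three columns.

```r
mydata <- data.frame(
  id      = c("001", "001", "001", "002", "003", "003"),
  math    = c(80, 75, 70, 65, 65, 70),
  reading = c(65, 70, 88, NA, 90, NA)
)
mylast <- lapply(split(mydata, mydata$id), tail, n = 1)

# lapply() calls rbind() once per component and collects the results in a list
per_component <- lapply(mylast, rbind)

print(class(per_component))         # a plain "list"
print(names(per_component))         # "001" "002" "003"
print(sapply(per_component, nrow))  # each component still has a single row

# data.frame() binds list components side by side (as columns), not on top of each other
side_by_side <- data.frame(per_component)
print(dim(side_by_side))            # 1 row, 9 columns

# Stacking the rows still needs one rbind() call over all the components
stacked <- do.call("rbind", per_component)
print(dim(stacked))                 # 3 rows, 3 columns
```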
>
>
> It is a subtle, but of course critical, difference in how the internal
> function is called and how the arguments are passed.
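The difference in how often rbind() runs can be checked directly. This is a minimal sketch that wraps rbind() in a hypothetical counting function, counting_rbind() (not part of R), and counts how many times it is invoked under each approach.

```r
mydata <- data.frame(
  id      = c("001", "001", "001", "002", "003", "003"),
  math    = c(80, 75, 70, 65, 65, 70),
  reading = c(65, 70, 88, NA, 90, NA)
)
mylast <- lapply(split(mydata, mydata$id), tail, n = 1)

# Hypothetical helper: forwards all arguments to rbind() and counts invocations
counting_rbind <- function(...) {
  n_calls <<- n_calls + 1L  # update the counter in the defining environment
  rbind(...)
}

# lapply(): one rbind() call per list component
n_calls <- 0L
res_lapply <- lapply(mylast, counting_rbind)
calls_lapply <- n_calls

# do.call(): all components become arguments of a single rbind() call
n_calls <- 0L
res_do_call <- do.call(counting_rbind, mylast)
calls_do_call <- n_calls

cat("lapply:", calls_lapply, "calls; do.call:", calls_do_call, "call\n")
```

With lapply() the result is still a list of three one-row data frames, while do.call() returns a single three-row data frame.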
>
> Does that help?
>
> Regards,
>
> Marc Schwartz
>