[R] best way to apply a list of functions to a dataset ?

Wed Jul 21 03:10:03 CEST 2010

On Jul 20, 2010, at 8:37 PM, Glen Barnett wrote:

> Hi Dennis,
>
> Thanks for the reply.
>
> Yes, that's easier, but the conversion to a matrix with rbind has
> converted the output of that final function to a numeric.
>
> I included that last function in the example secifically to preclude
> people assuming that functions would always return the same type.
>
> I guess this doesn't matter too much for a logical, but what if
> instead the function returned a character (say "mean", "median", or
> "equal" - indicating which one was larger, or "equal" which could
> easily happen with discrete data). This precludes using rbind (which I
> also used at first, before I noticed that sometimes I could have
> functions that don't return numerics).

Have you considered the summaryBy function in package doBy? (Why  
reinvent the wheel?)

-- 
David.
>
> Glen
>
>
> On Tue, Jul 20, 2010 at 6:55 PM, Dennis Murphy <djmuser at gmail.com>  
> wrote:
>> Hi:
>>
>> This might be a little easier (?):
>>
>> library(datasets)
>> skewness <- function(x) mean(scale(x)^3)
>> mean.gt.med <- function(x) mean(x)>median(x)
>>
>> # ------
>> # construct the function to apply to each variable in the data frame
>> f <- function(x) c(mean = mean(x), sd = sd(x), skewness =  
>> skewness(x),
>>      median = median(x), mean.gt.med = mean.gt.med(x))
>>
>> # map function to each variable with lapply and combine with  
>> do.call():
>> do.call(rbind, lapply(attitude, f))
>>                mean        sd    skewness median mean.gt.med
>> rating     64.63333 12.172562 -0.35792491   65.5           0
>> complaints 66.60000 13.314757 -0.21541749   65.0           1
>> privileges 53.13333 12.235430  0.37912287   51.5           1
>> learning   56.36667 11.737013 -0.05403354   56.5           0
>> raises     64.63333 10.397226  0.19754317   63.5           1
>> critical   74.76667  9.894908 -0.86577893   77.5           0
>> advance    42.93333 10.288706  0.85039799   41.0           1
>>
>> HTH,
>> Dennis
>>
>>
>> On Mon, Jul 19, 2010 at 10:51 PM, Glen Barnett <glnbrntt at gmail.com>  
>> wrote:
>>>
>>> Assuming I have a matrix of data (or under some restrictions that  
>>> will
>>> become obvious, possibly a data frame), I want to be able to apply a
>>> list of functions (initially producing a single number from a  
>>> vector)
>>> to the data and produce a data frame (for compact output) with  
>>> column
>>> 1 being the function results for the first function, column 2 being
>>> the results for the second function and so on - with each row being
>>> the columns of the original data.
>>>
>>> The obvious application of this is to produce summaries of data sets
>>> (a bit like summary() does on numeric matrices), but with user
>>> supplied functions. I am content for the moment to leave it to the
>>> user to supply functions that work with the data they supply so as  
>>> to
>>> produce results that will actually be data-frame-able, though I'd  
>>> like
>>> to ultimately make it a bit nicer than it currently is without
>>> compromising the niceness of the output in the "good" cases.
>>>
>>> The example below is a simplistic approach to this problem (it  
>>> should
>>> run as is). I have named it "fapply" for fairly obvious reasons, but
>>> added the ".1" because it doesn't accept multidimensional arrays. I
>>> have included the output I generated, which is what I want. There  
>>> are
>>> some obvious generalizations (e.g. being able to include functions
>>> like range(), say, that produce several values on a vector, rather
>>> than one, making the user's life simpler when a function already  
>>> does
>>> most of what they need).
>>>
>>> The question is: this looks like a silly approach, growing a list
>>> inside a for loop. Also I recall reading that if you find yourself
>>> using "do.call" you should probably be doing something else.
>>>
>>> So my question: Is there a better way to implement a function like  
>>> this?
>>>
>>> Or, even better, is there already a function that does this?
>>>
>>> ## example function and code to apply a list of functions to a  
>>> matrix
>>> (here a numeric data frame)
>>>
>>> library(datasets)
>>>
>>> fapply.1 <- function(x, fun.l, colnames=fun.l){
>>> out.l <- list()   # starts with an empty list
>>> for (i in seq_along(fun.l)) out.l[[i]] <- apply(x,2,fun.l[[i]])   #
>>> loop through list of functions
>>>
>>> # set up names and make into a data frame
>>> names(out.l) <- colnames
>>> attr(out.l,"row.names") <- names(out.l[[1]])
>>> attr(out.l,"class") <- "data.frame"
>>> out.l
>>> }
>>>
>>> skewness <- function(x) mean(scale(x)^3)      #define a simple  
>>> numeric
>>> function
>>> mean.gt.med <- function(x) mean(x)>median(x)  # define a simple
>>> non-numeric fn
>>> flist <- c("mean","sd","skewness","median","mean.gt.med") # make  
>>> list
>>> of fns to apply
>>>
>>> fapply.1(attitude,flist)
>>>               mean        sd    skewness median mean.gt.med
>>> rating     64.63333 12.172562 -0.35792491   65.5       FALSE
>>> complaints 66.60000 13.314757 -0.21541749   65.0        TRUE
>>> privileges 53.13333 12.235430  0.37912287   51.5        TRUE
>>> learning   56.36667 11.737013 -0.05403354   56.5       FALSE
>>> raises     64.63333 10.397226  0.19754317   63.5        TRUE
>>> critical   74.76667  9.894908 -0.86577893   77.5       FALSE
>>> advance    42.93333 10.288706  0.85039799   41.0        TRUE
>>>
>>> ## end code and output
>>>
>>> So did I miss something obvious?
>>>
>>> Any suggestions as far as style or simple stability-enhancing
>>> improvements would be handy.
>>>
>>> regards,
>>> Glen
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT