[R] apply min function rowwise

William Dunlap wdunlap at tibco.com
Sat Jun 5 22:22:32 CEST 2010


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Joshua Wiley
> Sent: Saturday, June 05, 2010 11:12 AM
> To: moleps
> Cc: r-help at r-project.org
> Subject: Re: [R] apply min function rowwise
> 
> On Sat, Jun 5, 2010 at 10:22 AM, moleps <moleps2 at gmail.com> wrote:
> > thx.
> >
> > It was only the first instance that was class date. The 
> rest were factors. So that explains it.
> >
> > If I want to change the rest in vec into class date (there 
> are many of them...)
> >
> > neither  as.Date(canc[,vec],"%d.%m.%Y") or 
> sapply(canc[,vec],FUN=function(x) as.date(x,"%d.%m.%Y"))
> >
> > What is the easy solution to this?
> 
> This is the nicest solution that comes to mind:
> 
> as.data.frame(lapply(X=samp.dat, FUN=as.Date, format="%d.%m.%Y"))
> 
> I believe the problem is that sapply() coerces the results (by default
> when simplify=TRUE) using as.vector() leaving you with the number of
> days since the origin.  Anyway, using as.data.frame() on the list
> output from lapply() seems to work.
> 
> Josh

Note that once you have changed your input data.frame
to consist entirely of Date columns, the expression
   apply(d, 1, min)
will still not give you a Date as output, but will output
character data instead.  apply() just doesn't work well
with data.frames unless all columns are numeric or you
are satisfied with all columns being coerced to character
data.  apply(X,MARGIN,FUN) calls as.matrix() to convert X
to a matrix, often a character matrix, before doing
anything to its rows or columns and tries to convert the
outputs of FUN(X[...]) to a vector or a matrix on the way out,
often losing track of the class of FUN's output.
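
A minimal sketch of that coercion, assuming a small made-up
data.frame (the name dd and its values are only for illustration):

```r
# Hypothetical data frame with two Date columns (values invented for illustration)
dd <- data.frame(
  A = as.Date(c("2010-02-01", "2009-01-02")),
  B = as.Date(c("2009-01-03", "1937-12-21"))
)

m <- as.matrix(dd)      # the conversion apply() performs before calling FUN
is.character(m)         # the Dates have been formatted as strings
r <- apply(dd, 1, min)  # so min() is comparing strings, not Dates
class(r)

# Converting back recovers Dates, but the minima are only right because
# ISO-formatted date strings happen to sort in chronological order
as.Date(r)
```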

To get the row minima for a data frame try pmin ('parallel
minima'):
   do.call(pmin, d)
or, to protect against funny column names like 'na.rm',
   do.call(pmin, unname(as.list(d)))
E.g., with the following data.frame of Date columns
   > d <- data.frame(lapply(list(
         One=c("01.02.2010","02.01.2009","03.03.2008"),
         Two=c("03.01.2009","21.12.1937","25.11.2001"),
         Three=c("21.06.1995","10.02.2008","24.12.2010")),
       as.Date, format="%d.%m.%Y"))
   > d
            One        Two      Three
   1 2010-02-01 2009-01-03 1995-06-21
   2 2009-01-02 1937-12-21 2008-02-10
   3 2008-03-03 2001-11-25 2010-12-24
   > str(d)
   'data.frame':   3 obs. of  3 variables:
    $ One  :Class 'Date'  num [1:3] 14641 14246 13941
    $ Two  :Class 'Date'  num [1:3] 14247 -11699 11651
    $ Three:Class 'Date'  num [1:3] 9302 13919 14967
compare the following
   > apply(d, 1, min) # looks ok
   [1] "1995-06-21" "1937-12-21" "2001-11-25"
   > str(.Last.value) # but it produced character data
    chr [1:3] "1995-06-21" "1937-12-21" "2001-11-25"
   > do.call(pmin, unname(as.list(d))) # looks ok also
   [1] "1995-06-21" "1937-12-21" "2001-11-25"
   > str(.Last.value) # and it produced Date data
   Class 'Date'  num [1:3] 9302 -11699 11651
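
The reason for unname() may be easier to see with a sketch. The
data.frame d2 below is hypothetical; its second column is named
'na.rm' on purpose and holds the smaller date in each row. What
the unprotected call does with that column can depend on the
inputs, so it is wrapped in try() here rather than shown with
an expected result:

```r
# Hypothetical data frame whose second column happens to be named 'na.rm'
d2 <- data.frame(
  first = as.Date(c("2010-02-01", "2009-01-02")),
  na.rm = as.Date(c("1995-06-21", "1937-12-21"))
)

# With names kept, the 'na.rm' column is matched to pmin's na.rm argument
# instead of being treated as data, so it may be ignored or cause an error
risky <- try(do.call(pmin, d2), silent = TRUE)

# With names dropped, every column is passed positionally as data
safe <- do.call(pmin, unname(as.list(d2)))
safe
class(safe)
```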

For a data.frame with lots of rows pmin will generally
be faster than apply(X,1,min), in addition to giving
the correct answer.
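
A rough way to check the speed claim on your own machine,
assuming a made-up data.frame of random dates (the name big,
the row count and the date range are arbitrary choices, and
timings will vary):

```r
# Hypothetical larger data frame: 20000 rows, 3 Date columns
set.seed(1)
n <- 20000
origin <- as.Date("2000-01-01")
big <- data.frame(
  a = origin + sample(0:3650, n, replace = TRUE),
  b = origin + sample(0:3650, n, replace = TRUE),
  c = origin + sample(0:3650, n, replace = TRUE)
)

# Time both approaches, keeping the results for comparison
t_apply <- system.time(r_apply <- apply(big, 1, min))
t_pmin  <- system.time(r_pmin  <- do.call(pmin, unname(as.list(big))))

# Elapsed seconds for each approach
c(apply = t_apply[["elapsed"]], pmin = t_pmin[["elapsed"]])

# The dates agree once the character output of apply() is converted back
all(as.Date(r_apply) == r_pmin)
```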

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> >
> > Regards,
> >
> > //M
> >
> >
> >
> > On 5. juni 2010, at 18.30, Joshua Wiley wrote:
> >
> >> Hello M,
> >>
> >> My guess is that it has something to do with the class of the
> >> variables.  Perhaps you could provide a small sample 
> dataframe?  Also
> >> you might try running str() on your data frame and seeing if the
> >> results are what you would expect.  As a side note, it is not
> >> necessary to make an anonymous function here, as you are allowed to
> >> pass arguments to the function applied.
> >>
> >> apply(canc[,vec],1, min, na.rm=TRUE)
> >>
> >> Best regards,
> >>
> >> Josh
> >>
> >> On Sat, Jun 5, 2010 at 8:30 AM, moleps <moleps2 at gmail.com> wrote:
> >>> I'm trying to tease out the minimum value from a row in a 
> dataframe where all the variables are dates.
> >>>
> >>> apply(canc[,vec],1,function(x)min(x,na.rm=T))
> >>>
> >>>
> >>> However it only returns empty strings for the entire 
> dataframe except for one date value (which is not the minimum date).
> >>>
> >>> I've also tried
> >>>
> >>> apply(canc[,vec],1,function(x)max(x,na.rm=T))
> >>>
> >>> which provides values rowwise, but many of them are not 
> in fact the largest in the row.
> >>>
> >>>
> >>> Any advice?
> >>>
> >>>
> >>> Regards,
> >>>
> >>> //M
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >>
> >>
> >> --
> >> Joshua Wiley
> >> Senior in Psychology
> >> University of California, Riverside
> >> http://www.joshuawiley.com/
> >
> >
> 
> 
> 
> 


