[R] How to pass na.rm=T to a user defined function

Jun Shen jun.shen.ut at gmail.com
Sat Jul 30 04:46:39 CEST 2016


Thanks David. This is working perfectly!

On Fri, Jul 29, 2016 at 9:00 PM, David Winsemius <dwinsemius at comcast.net>
wrote:

>
> > On Jul 29, 2016, at 5:52 PM, David Winsemius <dwinsemius at comcast.net>
> > wrote:
> >
> >
> >> On Jul 29, 2016, at 5:08 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
> >>
> >> Thanks Jeff/David for the reply. I wasn't clear in the previous
> >> message. The problem with using na.omit is that it omits the whole row
> >> wherever there is at least one NA, even when some variables in that row
> >> do have non-NA values.
> >
> > Did you actually run the example I offered, or did you just guess at
> > what would happen and complain? When applied only to a vector, there is
> > no such thing as a "column".
> >
> > What you are describing would only have happened if `na.omit` were
> > applied to an object that was a dataframe. That was not what was offered
> > in the example.
>
> And then I looked at the code again and realized you were not looping over
> the columns as I thought was happening. So what you want is:
>
> do.stats <- function(data, stats.func, summary.var)
>          as.data.frame(signif(sapply(stats.func, function(func)
>              mapply(func, lapply(data[summary.var], na.omit))), 3))
>
> --
> David
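A minimal, self-contained sketch of the per-column version above, run on Jun's test data. The `set.seed(1)` call is an addition (an assumption, only so the run is reproducible), so the numbers will not match those quoted in the thread. Because `na.omit()` is applied to each column separately, only CL should lose a value: N should come out as 99 for CL and 100 for V1, V2 and ALPHA.

```r
# Per-column version of do.stats() from the thread: lapply() applies
# na.omit() to each selected column on its own, so an NA in one column
# does not remove values from the other columns.
do.stats <- function(data, stats.func, summary.var)
  as.data.frame(signif(sapply(stats.func, function(func)
    mapply(func, lapply(data[summary.var], na.omit))), 3))

# Jun's helper: number of non-missing values
N <- function(x) length(x[!is.na(x)])

set.seed(1)  # assumption: added only to make the example reproducible
test <- data.frame(ID = 1:100, CL = rnorm(100), V1 = rnorm(100),
                   V2 = rnorm(100), ALPHA = rnorm(100))
test$CL[1] <- NA

res <- do.stats(test,
                stats.func = c("mean", "sd", "median", "min", "max", "N"),
                summary.var = c("CL", "V1", "V2", "ALPHA"))
print(res)
```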
>
>
> >
> > --
> > David.
> >>
> >> For example: let's define a new function
> >> N <- function(x) length(x[!is.na(x)])
> >>
> >> test <- data.frame(ID=1:100, CL=rnorm(100), V1=rnorm(100), V2=rnorm(100),
> >>                    ALPHA=rnorm(100))
> >> test$CL[1] <- NA
> >>
> >> do.stats(test, stats.func=c('mean','sd','median','min','max','N'),
> >>          summary.var=c('CL','V1', 'V2','ALPHA'))
> >>
> >> gives
> >>
> >>         mean    sd  median   min  max  N
> >> CL    -0.0232 0.918 -0.0786 -2.14 3.14 99
> >> V1    -0.0410 0.936 -0.1160 -2.86 2.67 99
> >> V2    -0.1760 0.978 -0.1490 -2.31 2.15 99
> >> ALPHA -0.1380 0.960 -0.2160 -2.41 2.20 99
> >>
> >>
> >> One non-missing value in each of V1, V2 and ALPHA (row 1, where only CL
> >> is NA) is omitted, so N is 99 for every variable.
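To illustrate the difference being discussed here, a minimal sketch on a hypothetical two-column toy data frame: `na.omit()` on a data frame drops every row that contains any NA, whereas `na.omit()` on a single column (a vector) drops only that column's missing entries.

```r
# Hypothetical toy data: only CL has a missing value (in row 1)
df <- data.frame(CL = c(NA, 2, 3), V1 = c(10, 20, 30))

# na.omit() on the whole data frame removes row 1 for BOTH columns
rows_kept <- na.omit(df)
nrow(rows_kept)              # 2: V1's value in row 1 is lost as well

# na.omit() on each column separately removes only CL's NA
per_column <- lapply(df, na.omit)
lengths(per_column)          # CL = 2, V1 = 3
```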
> >>
> >>
> >> On Fri, Jul 29, 2016 at 2:29 AM, David Winsemius <dwinsemius at comcast.net> wrote:
> >>
> >>> On Jul 28, 2016, at 7:37 PM, Jun Shen <jun.shen.ut at gmail.com> wrote:
> >>>
> >>> Because in reality the NA may appear in one variable but not others.
> >>> For example, for ID=1 CL may be NA but not the others; for ID=2 V1 may
> >>> be NA, etc. To keep all the IDs and all the variables in one data frame,
> >>> it's inevitable to see some NAs.
> >>
> >> That doesn't seem to acknowledge Newmiller's advice. In particular, this
> >> would have seemed to be an obvious response to that suggestion:
> >>
> >> do.stats <- function(data, stats.func, summary.var)
> >>          as.data.frame(signif(sapply(stats.func, function(func)
> >>              mapply(func, na.omit(data[summary.var]))), 3))
> >>
> >>
> >> And please also heed the advice in the Posting Guide to use plain text.
> >>
> >> --
> >> David.
> >>
> >>
> >>
> >>>
> >>> On Thu, Jul 28, 2016 at 10:22 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
> >>> wrote:
> >>>
> >>>> Why not remove it yourself before passing it to those functions?
> >>>> --
> >>>> Sent from my phone. Please excuse my brevity.
> >>>>
> >>>> On July 28, 2016 5:51:47 PM PDT, Jun Shen <jun.shen.ut at gmail.com> wrote:
> >>>>> Dear list,
> >>>>>
> >>>>> I wrote a small function to calculate multiple stats on multiple
> >>>>> variables and export the results in exactly the format I want.
> >>>>> Everything seems fine until NA appears in the data.
> >>>>>
> >>>>> Here is my function:
> >>>>>
> >>>>> do.stats <- function(data, stats.func, summary.var)
> >>>>>          as.data.frame(signif(sapply(stats.func, function(func)
> >>>>>              mapply(func, data[summary.var])), 3))
> >>>>>
> >>>>> A test dataset:
> >>>>> test <- data.frame(ID=1:100, CL=rnorm(100), V1=rnorm(100), V2=rnorm(100),
> >>>>>                    ALPHA=rnorm(100))
> >>>>>
> >>>>> A command like the following
> >>>>> do.stats(test, stats.func=c('mean','sd','median','min','max'),
> >>>>> summary.var=c('CL','V1', 'V2','ALPHA'))
> >>>>>
> >>>>> gives me
> >>>>>
> >>>>>       mean    sd  median   min  max
> >>>>> CL     0.1030 0.917  0.0363 -2.32 2.47
> >>>>> V1    -0.0545 1.070 -0.2120 -2.21 2.70
> >>>>> V2     0.0600 1.000  0.0621 -2.80 2.62
> >>>>> ALPHA -0.0113 0.919  0.0284 -2.35 2.31
> >>>>>
> >>>>>
> >>>>> However, if I have an NA in the data
> >>>>> test$CL[1] <- NA
> >>>>>
> >>>>> the same command gives me
> >>>>>       mean    sd  median   min  max
> >>>>> CL         NA    NA      NA    NA   NA
> >>>>> V1    -0.0545 1.070 -0.2120 -2.21 2.70
> >>>>> V2     0.0600 1.000  0.0621 -2.80 2.62
> >>>>> ALPHA -0.0113 0.919  0.0284 -2.35 2.31
> >>>>>
> >>>>> I know this is because those functions (mean, sd, etc.) all have
> >>>>> na.rm=F by default. How can I pass na.rm=T to all these functions
> >>>>> without manually redefining those stats functions?
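The thread settled on removing NAs per column, but the question as asked can also be answered directly. A minimal alternative sketch, not proposed in the thread: forward `na.rm = TRUE` to every statistic through the `MoreArgs` argument of `mapply()`. `mean`, `sd`, `median`, `min` and `max` all accept `na.rm`; a user-defined statistic (such as the `N` helper introduced later in the thread) must then accept and ignore extra arguments via `...`. The name `do.stats2` is hypothetical, and `set.seed(1)` is added only for reproducibility.

```r
# Hypothetical variant of do.stats(): any extra arguments given in `...`
# (e.g. na.rm = TRUE) are passed on to every statistic via MoreArgs.
do.stats2 <- function(data, stats.func, summary.var, ...)
  as.data.frame(signif(sapply(stats.func, function(func)
    mapply(func, data[summary.var], MoreArgs = list(...))), 3))

# A user-defined statistic must accept `...` so that na.rm is tolerated
N <- function(x, ...) sum(!is.na(x))

set.seed(1)  # assumption: added only to make the example reproducible
test <- data.frame(ID = 1:100, CL = rnorm(100), V1 = rnorm(100),
                   V2 = rnorm(100), ALPHA = rnorm(100))
test$CL[1] <- NA

res2 <- do.stats2(test,
                  stats.func = c("mean", "sd", "median", "min", "max", "N"),
                  summary.var = c("CL", "V1", "V2", "ALPHA"),
                  na.rm = TRUE)
print(res2)
```

This leaves NA handling to each statistic itself, at the cost of requiring every user-defined statistic to accept `...`; the per-column `na.omit()` approach needs no such change.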
> >>>>>
> >>>>> Appreciate any comment.
> >>>>>
> >>>>> Thanks for your help.
> >>>>>
> >>>>>
> >>>>> Jun
> >>>>>
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide
> >>>>> http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>>
> >>>
> >>
> >> David Winsemius
> >> Alameda, CA, USA
> >>
> >>
> >
> > David Winsemius
> > Alameda, CA, USA
> >
>
> David Winsemius
> Alameda, CA, USA
>
>
