[R] Inconsistency among mean, median, max, var
ggrothendieck@yifan.net
ggrothendieck at yifan.net
Sun Mar 31 00:14:32 CET 2002
Don't get me wrong. I think the R package is great and, in fact, I am personally
investing time to learn it. I particularly like its object-oriented nature,
data frames (which nicely organize datasets) and the large and increasing set
of packages and interfaces available for it.
I only mention my problems with it in the hope that it will lead to better,
more consistent software. My comments are not a criticism. They are meant as
(hopefully) helpful feedback.
Regarding specifically your query on what is wrong: it's too complex and the
concepts are not orthogonal. Realistically, it's necessary to keep going back
to the documentation or to test things out to figure out what these functions
do if you don't want to make a mistake.
You need a decision matrix like this one just to figure out what you are going
to get.
          ------ argument type ------
          matrix          data frame
sum       single value    single value
max       single value    single value
median    single value    fails

mean      single value    columnwise
sd        columnwise      columnwise
var       varcov matrix   varcov matrix
My best try at summarizing this is to split it into two sets of rows
as shown above with the following description:
- mean produces a single value on a matrix and acts columnwise on data frames
- sd works columnwise
- var produces a variance-covariance matrix
- the others produce a single value, except for median, which fails on data frames
It might be worth trying out more functions just to see how they
fit in.
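
One way to try out more functions is to let R fill in such a table itself.
What follows is only a minimal sketch: classify is a hypothetical helper
introduced here for illustration, the sample data are made up, and the labels
it prints depend on the version of R in use, so no expected output is shown.

```r
# Made-up sample data: a 4 x 3 numeric matrix and the equivalent data frame.
m <- matrix(c(1, 2, 3, 4, 5, 7, 2, 9, 4, 6, 8, 1), nrow = 4,
            dimnames = list(NULL, c("a", "b", "c")))
d <- as.data.frame(m)

# Hypothetical helper: describe in coarse terms what f(x) returns.
classify <- function(f, x) {
  res <- tryCatch(suppressWarnings(f(x)), error = function(e) NULL)
  if (is.null(res)) return("fails")            # the call raised an error
  if (is.matrix(res)) return("matrix")         # e.g. a variance-covariance matrix
  if (length(res) == 1) {
    if (is.na(res)) return("NA") else return("single value")
  }
  if (length(res) == ncol(x)) return("columnwise")
  "other"
}

# The six functions from the table; more can be added to this list.
funs <- list(sum = sum, max = max, median = median,
             mean = mean, sd = sd, var = var)

# One row per function, one column per argument type.
results <- sapply(list(matrix = m, data.frame = d),
                  function(x) sapply(funs, classify, x = x))
print(results)
```

Adding a function to funs adds a row to the result, which makes it easy to
see where a new function falls.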
I use another statistical package in which the 12 corresponding functions have
a consistent result (work columnwise).
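
For comparison, columnwise behaviour can be requested explicitly in R, which
sidesteps the question of what each function does by default. This is a
sketch under assumptions: colwise is a hypothetical helper name, not an
existing R function, and the data are made up.

```r
# Hypothetical helper: apply f to every column of a matrix or data frame.
colwise <- function(x, f, ...) {
  sapply(as.data.frame(x), f, ...)
}

# Made-up sample data: the same numbers as a matrix and as a data frame.
m <- matrix(c(1, 2, 3, 4, 5, 7, 2, 9, 4, 6, 8, 1), nrow = 4,
            dimnames = list(NULL, c("a", "b", "c")))
d <- as.data.frame(m)

colwise(m, median)   # one median per column of the matrix
colwise(d, median)   # one median per column of the data frame
identical(colwise(m, mean), colwise(d, mean))
```

Converting to a data frame first means that sapply always iterates over
columns, so the matrix and the data frame are treated the same way.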
On 30 Mar 2002 at 20:25, ripley at stats.ox.ac.uk wrote:
> On Sat, 30 Mar 2002 ggrothendieck at yifan.net wrote:
>
> > I found a strange inconsistency:
>
> Well, these do work as documented, and I don't find it even ordinarily
> inconsistent.
>
> > If m is a matrix and d is a data frame then
> >
> > - mean(m), median(m), max(m) and max(d) all return a single value
> >
> > but
> >
> > - mean(d) returns the column means
> > - median(d) fails
> > - both var(m) and var(d) return the variance covariance matrix
> >
> > You pretty much have to experiment to figure this out since much of this
> > behavior is not readily obvious from the help files.
>
> I don't think that is even 1% fair:
>
> ?mean clearly says what it does for a data frame.
> ?median clearly says it only works for numeric vectors.
> ?var clearly says that it works for `a numeric vector, matrix or data
> frame'
>
> Whatever is the problem with that?
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272860 (secr)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
>
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._