[R] R summary (and quantiles)
Matthias
The function "fivenum" defines quantiles by assuming that the i-th order
statistic is the
(i-0.5)/length(x)
quantile. Thus, it defines the 25% quantile as the cutoff point with
25% of the data below and 75% above. In this example, that is the "center" of
27.08 (the 6th of 22 values), counting half of this measurement as "above"
and half as "below".
This makes a lot of sense, but the problem with this definition is that the min
is not the 0% quantile, but the 1/(2n) quantile.
> (seq_along(x)-.5)/length(x)   # i = 1..n; x is already sorted
[1] 0.02272727 0.06818182 0.11363636 0.15909091 0.20454545 0.25000000
[7] 0.29545455 0.34090909 0.38636364 0.43181818 0.47727273 0.52272727
[13] 0.56818182 0.61363636 0.65909091 0.70454545 0.75000000 0.79545455
[19] 0.84090909 0.88636364 0.93181818 0.97727273
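As a minimal sketch of this definition (not fivenum's actual source code),
the positions above can be searched for the value 0.25. The names p and i25
are made up for illustration, and x is the data from the quoted question
below. Note that fivenum() itself computes Tukey's hinges, which agree with
this rule for the 22 values here; agreement for every n is not claimed.

```r
# Data from the quoted question (already sorted).
x <- c(20.77, 22.56, 22.71, 22.99, 26.39, 27.08, 27.32, 27.33, 27.57, 27.81,
       28.69, 29.36, 30.25, 31.89, 32.88, 33.23, 33.28, 33.40, 33.52, 33.83,
       33.95, 34.82)
n <- length(x)

# Probability attached to the i-th order statistic under this definition.
p <- (seq_len(n) - 0.5) / n

# Which order statistic sits exactly at the 25% mark?
i25 <- which(abs(p - 0.25) < 1e-12)
i25             # 6
sort(x)[i25]    # 27.08
fivenum(x)[2]   # 27.08, the lower hinge
```

The smallest value gets p[1] = 0.5/22, i.e. 1/(2n), rather than 0.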
The function "summary" is based on a definition of quantiles that is anchored
so that the min is the 0% quantile and the max is the 100% quantile. "The
algorithm linearly interpolates between order statistics of x, assuming
that the ith order statistic is the
(i-1)/(length(x)-1)
quantile."
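The quoted rule can be reproduced by hand with approx(), which does linear
interpolation. This is only a sketch of the idea, not the code that summary
actually runs; p and q25 are names made up for illustration.

```r
# Data from the quoted question (already sorted).
x <- c(20.77, 22.56, 22.71, 22.99, 26.39, 27.08, 27.32, 27.33, 27.57, 27.81,
       28.69, 29.36, 30.25, 31.89, 32.88, 33.23, 33.28, 33.40, 33.52, 33.83,
       33.95, 34.82)
n <- length(x)

# Probability attached to the i-th order statistic: 0 for the min, 1 for the max.
p <- (seq_len(n) - 1) / (n - 1)

# Linear interpolation between the order statistics at probability 0.25.
q25 <- approx(p, sort(x), xout = 0.25)$y
q25                                # 27.14
quantile(x, 0.25, names = FALSE)   # same value from quantile's default rule
```

Here 0.25 falls between 5/21 and 6/21, so the result lies a quarter of the
way from x(6) = 27.08 to x(7) = 27.32, which is the 27.14 that summary prints.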
The solution is simple: never use quantile or summary if you are
interested in quantiles ;-)
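For readers on a more recent R: quantile() there accepts a type argument
that selects among several definitions. That argument is not mentioned in
this exchange, so the sketch below assumes an R version that has it. It
compares three rules at the 25% mark: type = 5 uses (i-0.5)/n, type = 6 uses
i/(n+1) (the x((n+1)/4) rule from the question), and type = 7, the default,
uses (i-1)/(n-1). The name q25 is made up for illustration.

```r
# Data from the quoted question (already sorted).
x <- c(20.77, 22.56, 22.71, 22.99, 26.39, 27.08, 27.32, 27.33, 27.57, 27.81,
       28.69, 29.36, 30.25, 31.89, 32.88, 33.23, 33.28, 33.40, 33.52, 33.83,
       33.95, 34.82)

# First quartile under three of quantile()'s definitions.
q25 <- sapply(c(type5 = 5, type6 = 6, type7 = 7),
              function(t) quantile(x, 0.25, type = t, names = FALSE))
q25
# type5 = 27.08   (matches fivenum here)
# type6 = 26.9075 (three quarters of the way from x(5) to x(6))
# type7 = 27.14   (matches summary)
```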
>I have used R for only a few days and don't understand the difference between
>fivenum(x) and summary(x).
>
> > x
> [1] 20.77 22.56 22.71 22.99 26.39 27.08 27.32 27.33 27.57 27.81 28.69 29.36
>[13] 30.25 31.89 32.88 33.23 33.28 33.40 33.52 33.83 33.95 34.82
> > fivenum(x)
>[1] 20.770 27.080 29.025 33.280 34.820
> > summary(x)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   20.77   27.14   29.03   29.17   33.27   34.82
>
>And why is the 1st Qu. 27.14 although
>qL=x(1/4*(n+1))=x(23/4)=x(5 3/4)
>x(5)=26.39
>x(6)=27.08
>why is qL in summary between x(6) and x(7)?
>
>I have learned that 1st Qu. = q(0.25)... so I am a little confused.
>
