[R] help comparing two median with R
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Tue Apr 17 17:04:41 CEST 2007
Thomas Lumley wrote:
> On Tue, 17 Apr 2007, Robert McFadden wrote:
>
>>> -----Original Message-----
>>> From: r-help-bounces at stat.math.ethz.ch
>>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jim Lemon
>>> Sent: Tuesday, April 17, 2007 12:37 PM
>>> To: Pedro A Reche
>>> Cc: r-help at stat.math.ethz.ch
>>> Subject: Re: [R] help comparing two median with R
>>>
>>> Pedro A Reche wrote:
>>>> Dear R users,
>>>> I am new to R and I would like to ask your help with the following
>>>> topic. I have three sets of numeral data, 2 sets are paired and a
>>>> third is independent of the other two. For each of these sets I have
>>>> obtained their basic statistics (mean, median, stdv, range ...).
>>>> Now I want to compare if these sets differ. I could compare the mean
>>>> doing a basic T test. However, I was looking for a test to compare
>>>> the medians using R. If that is possible I would love to hear the
>>>> specifics.
>>> Hi Pedro,
>>> You can use the Mann-Whitney test ("wilcox.test" with two
>>> samples), but you would have to check that the second and
>>> third moments of the variable distributions were the same, I think.
>>>
>>> Jim
>> Use the Mann-Whitney U test, but remember its 2 assumptions:
>> 1. samples come from a continuous distribution (there are no tied
>> observations)
>> 2. distributions are identical in shape. It's very similar to the t-test, but
>> the Mann-Whitney U test is not as affected by violation of the homogeneity of
>> variance assumption as the t-test is.
>>
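For reference, a minimal sketch of the two calls being discussed, on
simulated data. The names x1, x2 and y are made up for illustration: x1 and
x2 stand in for the paired sets and y for the independent one.

```r
# Simulated data; x1, x2 and y are illustrative names, not Pedro's data.
set.seed(1)
x1 <- rnorm(30, mean = 10)          # first measurement on each unit
x2 <- x1 + rnorm(30, mean = 0.5)    # second measurement on the same units
y  <- rnorm(25, mean = 11)          # independent third sample

# Paired sets: Wilcoxon signed-rank test on the within-pair differences
paired_res <- wilcox.test(x1, x2, paired = TRUE)

# Independent sets: Wilcoxon rank-sum (Mann-Whitney U) test
indep_res <- wilcox.test(x1, y)

paired_res$p.value
indep_res$p.value
```

What these p-values mean depends on the assumptions debated in the rest of
the thread.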
>
> This turns out not to be quite correct.
>
> If the two distributions differ only by a location shift then the
> hypothesis that the shift is zero is equivalent to the medians being the
> same (or the means, or the 3.14159th percentile), and the Mann-Whitney U
> test will test this hypothesis. Otherwise the Mann-Whitney U test does not
> test for equal medians.
>
> The assumption that the distributions are continuous is for convenience --
> it makes the distribution of the test statistic easier to calculate and
> otherwise R uses an approximation. The assumption of a location shift is
> critical -- otherwise it is easy to construct three data sets x,y,z so
> that the Mann-Whitney U test thinks x is larger than y, y is larger than z
> and z is larger than x (Google for Efron Dice). That is, the Mann-Whitney
> U test cannot be a test for any location statistic.
>
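The non-transitivity is easy to check numerically. The sketch below uses the
four dice usually attributed to Efron (the face values are the commonly
quoted ones, so the cycle here runs over four data sets rather than three)
and computes P(X > Y) for each adjacent pair. p_greater is a helper defined
here for illustration only.

```r
# Commonly quoted face values for Efron's four dice.
dice <- list(
  A = c(4, 4, 4, 4, 0, 0),
  B = c(3, 3, 3, 3, 3, 3),
  C = c(6, 6, 2, 2, 2, 2),
  D = c(5, 5, 5, 1, 1, 1)
)

# Proportion of (x, y) pairs with x > y, i.e. the estimate of P(X > Y).
p_greater <- function(x, y) mean(outer(x, y, ">"))

# Each die beats the next one in the cycle A -> B -> C -> D -> A.
cycle <- c(
  A_beats_B = p_greater(dice$A, dice$B),
  B_beats_C = p_greater(dice$B, dice$C),
  C_beats_D = p_greater(dice$C, dice$D),
  D_beats_A = p_greater(dice$D, dice$A)
)
cycle

# The Mann-Whitney statistic reported by wilcox.test is the count of pairs
# with x > y (ties count one half), so U / (n * m) matches p_greater here.
# exact = FALSE avoids the warning about exact p-values with ties.
u_AB <- wilcox.test(dice$A, dice$B, exact = FALSE)$statistic /
  (length(dice$A) * length(dice$B))
```

Every entry of cycle comes out at 2/3, so no ordering of the four dice is
consistent with the pairwise comparisons.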
> There actually is an exact test for the median that does not assume a
> location shift: dichotomize your data at the pooled median to get a 2x2
> table of above/below median by group, and do Fisher's exact test on the
> table. This is almost never useful (because it doesn't come with an
> interval estimate), but is interesting because it (and the generalizations
> to other quantiles) is the only exactly distribution-free location test
> that does not have the 'non-transitivity' problem of the Mann-Whitney U
> test. I believe this median test is attributed to Mood, but I have not
> seen the primary source.
>
> -thomas
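A minimal sketch of the median test as Thomas describes it, on simulated
data. The variable names are made up for illustration, and counting values
equal to the pooled median as "not above" is one possible convention, not
something specified in the thread.

```r
# Simulated samples; x and y are illustrative only.
set.seed(2)
x <- rexp(20)
y <- rexp(25) + 0.3

values <- c(x, y)
group  <- factor(rep(c("x", "y"), times = c(length(x), length(y))))

# Dichotomize at the pooled median. Values equal to the median are counted
# as "at_or_below" (an assumption made for this sketch).
above <- factor(values > median(values),
                levels = c(FALSE, TRUE),
                labels = c("at_or_below", "above"))

# 2x2 table of group by above/below, then Fisher's exact test on it.
tab <- table(group, above)
median_test <- fisher.test(tab)

tab
median_test$p.value
```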
The Mood test is so inefficient that its use is no longer recommended:
@Article{fri00sho,
  author  = {Freidlin, Boris and Gastwirth, Joseph L.},
  title   = {Should the median test be retired from general use?},
  journal = {American Statistician},
  year    = 2000,
  volume  = 54,
  number  = 3,
  pages   = {161-164},
  annote  = {inefficiency of Mood median test}
}
The points that Thomas and Brian have made are certainly correct, if one
is truly interested in testing for differences in medians or means. But
the Wilcoxon test provides a valid test of x > y more generally. The
test is consonant with the Hodges-Lehmann estimator: the median of all
possible differences between an X and a Y.
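As a rough illustration of the estimator, the sketch below computes the
median of all pairwise differences by hand and compares it with the
"difference in location" that wilcox.test reports when conf.int = TRUE. The
data are simulated and the names are made up. The two should agree for small
samples without ties, where R uses the exact method.

```r
# Simulated samples; x and y are illustrative only.
set.seed(3)
x <- rnorm(15, mean = 1)
y <- rnorm(12, mean = 0)

# Hodges-Lehmann estimate: median of all n * m differences x_i - y_j.
hl_by_hand <- median(outer(x, y, "-"))

# wilcox.test reports the same quantity, with a confidence interval.
res <- wilcox.test(x, y, conf.int = TRUE)
hl_from_test <- unname(res$estimate)

# The U statistic divided by n * m estimates P(X > Y).
p_x_gt_y <- unname(res$statistic) / (length(x) * length(y))

c(by_hand = hl_by_hand, from_test = hl_from_test, p_x_gt_y = p_x_gt_y)
res$conf.int
```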
Frank
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University