[R] significance test interquartile ranges

Sat Jul 14 19:58:35 CEST 2012

Dear Peter,

thanks for your clarifications. Sample size is around 200 in each group. Would that justify your approach?

I found a couple more tests for scale on continuous variables, i.e.
Mood Test
Ansari-Bradley Test (that one is also implemented in R)
Klotz Test
Conover Test

Would one of those be suitable to test for different dispersion (e.g. IQR or the like) in non-normal distributions?

thanks,

joerg

________________________________________
From: peter dalgaard [pdalgd at gmail.com]
Sent: Saturday, 14 July 2012 10:01
To: Prof Brian Ripley
Cc: Greg Snow; R-help; Schaber, Jörg
Subject: Re: [R] significance test interquartile ranges

On Jul 14, 2012, at 08:16 , Prof Brian Ripley wrote:

> On 13/07/2012 21:37, Greg Snow wrote:
>> A permutation test may be appropriate:
>
> Yes, it may, but precisely which one is unclear.  You are testing whether the two samples have an identical distribution, whereas I took the question to be a test of differences in dispersion, with differences in location allowed.
>
> I do not think this can be solved without further assumptions.  E.g people often replace the two-sample t-test by the two-sample Wilcoxon test as a test of differences in location, not realizing that the latter is also sensitive to other aspects of the difference (e.g. both dispersion and shape).

(Brian knows this, of course, but I thought it useful to insert a little quibbling.)

"Sensitive" is perhaps a little misleading here. The test statistic in the Wilcoxon test is essentially an estimate of the probability that a random observation in one group is bigger than a random observation in the other group. It isn't hard to imagine situations where that quantity is unaffected by a dispersion change, so the test is not sensitive in the sense that it can detect dispersion changes between sufficiently large samples.

However, the point is that p values _rely on_ the null hypothesis that the two distributions are exactly the same. This is mostly uncontroversial if you are testing for an irrelevant grouping, but if you need confidence intervals for the difference, you are implicitly assuming a location-shift model.
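The remark about the Wilcoxon statistic can be illustrated with a small simulation. The following is only a sketch, written in Python with the standard library rather than in R; the helper name prob_greater, the simulated normal samples, and the seed are assumptions made for the illustration and are not part of the thread. It computes U/(m*n), the estimated probability that an observation from one group exceeds one from the other, once for a pure change in spread and once for a pure change in location.

```python
import random


def prob_greater(x, y):
    """Estimate P(X > Y) + 0.5 * P(X == Y) as U / (m * n) over all pairs."""
    wins = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                wins += 1.0
            elif xi == yj:
                wins += 0.5
    return wins / (len(x) * len(y))


rng = random.Random(1)  # fixed seed, chosen arbitrarily for reproducibility

# Hypothetical simulated data, 200 observations per group.
narrow = [rng.gauss(0.0, 1.0) for _ in range(200)]   # centre 0, sd 1
wide = [rng.gauss(0.0, 4.0) for _ in range(200)]     # same centre, sd 4
shifted = [rng.gauss(1.0, 1.0) for _ in range(200)]  # centre 1, sd 1

p_scale = prob_greater(wide, narrow)     # dispersion change only
p_shift = prob_greater(shifted, narrow)  # location change only
print("dispersion change:", round(p_scale, 3))
print("location change:  ", round(p_shift, 3))
```

For symmetric distributions with a common centre the estimated probability should stay near 0.5 however much the spread differs, whereas a location shift moves it away from 0.5.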

The same thing is true for permutation tests in general: You need to be rather careful about what the assumptions are that allow you to interchange things. Asymptotically, the distribution of the IQR depends on the values of the density at the true quartiles. These could be different in the two groups, and easily completely unrelated to those of a pooled sample.

I think that I would suggest finding an error estimate for the IQR (or maybe log IQR) in each group separately, perhaps by bootstrapping, and then compare between groups with an asymptotic z test. The main caveat is whether you have sufficiently large sample sizes for asymptotics to hold.
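A minimal sketch of this suggestion, in Python with the standard library rather than in R: bootstrap the standard error of log(IQR) within each group separately, then compare the groups with a z statistic. The function names, the number of bootstrap replicates, the quantile definition (the "inclusive" method of statistics.quantiles), and the simulated exponential data are all assumptions made for the illustration, not details given in the thread.

```python
import math
import random
import statistics


def iqr(values):
    """Interquartile range, using the 'inclusive' quantile definition."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def boot_se_log_iqr(values, n_boot, rng):
    """Bootstrap standard error of log(IQR) within a single group.

    Assumes continuous data, so that a resample never has IQR == 0.
    """
    n = len(values)
    reps = []
    for _ in range(n_boot):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        reps.append(math.log(iqr(resample)))
    return statistics.stdev(reps)


def z_test_log_iqr(x, y, n_boot=1000, seed=0):
    """Asymptotic two-sided z test for log(IQR(x)) - log(IQR(y))."""
    rng = random.Random(seed)
    diff = math.log(iqr(x)) - math.log(iqr(y))
    se = math.sqrt(boot_se_log_iqr(x, n_boot, rng) ** 2
                   + boot_se_log_iqr(y, n_boot, rng) ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


data_rng = random.Random(42)
# Hypothetical skewed samples with different location and different scale.
x = [data_rng.expovariate(1.0) for _ in range(200)]
y = [5.0 + 3.0 * data_rng.expovariate(1.0) for _ in range(200)]

z_stat, p_value = z_test_log_iqr(x, y)
print("z =", round(z_stat, 2), " p =", p_value)
```

Because each group is resampled on its own, nothing is pooled and a difference in location does not enter the comparison. Whether the normal approximation is adequate at a given sample size remains the caveat stated above.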

Peter D.

>
> I nearly suggested (yesterday) doing the permutation test on differences from medians in the two groups.  But really this is off-topic for R-help and needs interaction with a knowledgeable statistician to refine the question.
>
>> 1. compute the ratio of the 2 IQR values (or other comparison of interest)
>> 2. combine the data from the 2 samples into 1 pool, then randomly
>> split into 2 groups (matching sample sizes of original) and compute
>> the ratio of the IQR values for the 2 new samples.
>> 3. repeat #2 a bunch of times (like for a total of 999 random splits)
>> and combine with the original value.
>> 4. (optional, but strongly suggested) plot a histogram of all the
>> ratios and place a reference line of the original ratio on the plot.
>> 5. calculate the proportion of ratios that are as extreme or more
>> extreme than the original, this is the (approximate) p-value.
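Steps 1 to 3 and 5 above can be sketched as follows, in Python with the standard library rather than in R. The function names, the use of |log ratio| as the measure of "extreme" (so that ratios above and below 1 are treated symmetrically), and the simulated data are assumptions made for the illustration. The histogram of step 4 is omitted. Because the raw data are pooled, this tests whether the two samples have an identical distribution, so a location difference alone can also affect the result.

```python
import math
import random
import statistics


def iqr(values):
    """Interquartile range, using the 'inclusive' quantile definition."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def perm_test_iqr_ratio(x, y, n_perm=999, seed=0):
    """Permutation p-value for the ratio of IQRs of two samples."""
    rng = random.Random(seed)
    # Step 1: the observed comparison, on the |log ratio| scale (assumption).
    observed = abs(math.log(iqr(x) / iqr(y)))
    pooled = list(x) + list(y)
    m = len(x)
    count = 1  # the original split is included among the n_perm + 1 values
    for _ in range(n_perm):
        # Step 2: random split of the pool into groups of the original sizes.
        rng.shuffle(pooled)
        stat = abs(math.log(iqr(pooled[:m]) / iqr(pooled[m:])))
        if stat >= observed:
            count += 1
    # Step 5: proportion of ratios at least as extreme as the original.
    return count / (n_perm + 1)


data_rng = random.Random(7)
# Hypothetical samples with a common centre and a threefold difference in spread.
a = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
b = [data_rng.gauss(0.0, 3.0) for _ in range(200)]

p_value = perm_test_iqr_ratio(a, b)
print("permutation p-value:", p_value)
```

With 999 random splits the smallest attainable p-value is 1/1000, since the original ratio is counted together with the permuted ones.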
>
> I think it is an 'exact' (but random) p-value.
>
>>
>> On Fri, Jul 13, 2012 at 5:32 AM, Schaber, Jörg
>> <joerg.schaber at med.ovgu.de> wrote:
>>> Hi,
>>>
>>> I have two non-normal distributions and use interquartile ranges as a dispersion measure.
>>> Now I am looking for a test, which tests whether the interquartile ranges from the two distributions are significantly different.
>>> Any idea?
>>>
>>> Thanks,
>>>
>>> joerg
>>>
>
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com