[R] ks.test - continuous vs discrete

David Middleton dmiddleton at fisheries.gov.fk
Thu Mar 28 11:48:04 CET 2002


Thanks for the input, and sorry for the delay in returning to the thread.

> > I frequently want to test for differences between animal size frequency
> > distributions.  The obvious test (I think) to use is the Kolmogorov-Smirnov
> > two sample test (provided in R as the function ks.test in package ctest).
> 
> "obvious" depends on the problem you want to test: KS tests the hypothesis
> 
> H_0: F(z) = G(z) for all z vs. H_1: F(z) != G(z) for at least one z 
> 
> ks.test assumes that both F and G are continuous variables. However, if
> you want to test
> 
> H_0: F(z) = G(z)  vs. H_1: F(z) = G(z - delta); delta != 0
> 
> as "test for differences" indicates, the Wilcoxon rank sum test is
> "obvious". Or, more general, if your hypothesis is "exchangeability", a
> permutation test can be used.

Apologies for my vague description.  The Wilcoxon rank sum test is a test of
difference in location, as is the permutation test, I believe.  I am
interested in more than just location: the animal size distributions I have
in mind are often multimodal (encompassing different cohorts, for example),
so I want a more general test of differences between the distributions,
both for exploratory purposes and to see whether it is appropriate to lump
samples.  Thus the KS test seems the "obvious" choice.
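
A minimal sketch of that distinction, using simulated lengths rather than
real size frequencies.  The sample sizes, means and standard deviations are
assumptions chosen for illustration only; in current R both functions live
in the stats package rather than ctest.

```r
# Simulated data (assumed values, not real measurements): two samples
# centred on the same length but with different shapes.
set.seed(1)
n <- 500
x <- rnorm(n, mean = 30, sd = 2)             # one cohort
y <- c(rnorm(n / 2, mean = 24, sd = 2),      # two cohorts, symmetric
       rnorm(n / 2, mean = 36, sd = 2))      # about the same centre

# Location test: not designed to pick up a pure difference in shape
p_wilcox <- wilcox.test(x, y)$p.value

# Two-sample KS test: compares the whole empirical distribution functions
p_ks <- ks.test(x, y)$p.value

c(wilcoxon = p_wilcox, ks = p_ks)
```

With shapes this different the KS p-value should be very small, while the
Wilcoxon p-value depends mostly on chance differences in location.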

> > The KS test is for continuous variables and this obviously includes length,
> > weight etc.  However, limitations in measuring (e.g length to the nearest
> > cm/mm, weight to the nearest g/mg etc) has the obvious effect of
> > "discretising" real data.
> 
> or maybe the underlying distribution is discrete? 

In the case I described (animal size) it is pretty clear that the variable
is continuous, and likewise the underlying distribution.  The ties really
are the result of rounding error.

Off list, both Don MacQueen and Ross Darnell came up with the idea of
"jittering" the values (adding a random number from a uniform distribution
half the width of the measurement unit) to remove the ties, and re-testing
to see if the rounding was influencing the results.  This seems to be what I
need.
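
A rough sketch of the jittering idea, on simulated data.  The helper
jitter_unit() is a hypothetical name, and the noise range of plus or minus
half a measurement unit is one reading of the suggestion (the interval of
true values that would round to each recorded value), not necessarily
exactly what was proposed.

```r
# Simulated lengths recorded to the nearest cm (all values are assumptions).
set.seed(42)
unit <- 1
a <- round(rnorm(200, mean = 30, sd = 3) / unit) * unit
b <- round(rnorm(200, mean = 31, sd = 3) / unit) * unit

# Hypothetical helper: add uniform noise within +/- half a measurement unit.
jitter_unit <- function(v, unit) {
  v + runif(length(v), min = -unit / 2, max = unit / 2)
}

# p-value on the rounded data; ks.test may warn about ties here,
# depending on the R version, so the warning is suppressed.
p_rounded <- suppressWarnings(ks.test(a, b)$p.value)

# Re-test on many independently jittered copies to see how far the
# p-value moves once the ties are broken.
p_jittered <- replicate(200, {
  ks.test(jitter_unit(a, unit), jitter_unit(b, unit))$p.value
})

p_rounded
summary(p_jittered)
```

If the jittered p-values cluster around the p-value from the rounded data,
the rounding is probably not driving the result; a wide spread suggests the
ties matter.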

David Middleton




