[R] negative P-values with shapiro.test
Martin Maechler
maechler at stat.math.ethz.ch
Wed Jul 16 18:02:47 CEST 2008
>>>>> "MC" == Mark Cowley <m.cowley at garvan.org.au>
>>>>> on Wed, 16 Jul 2008 15:32:30 +1000 writes:
MC> Dear list,
MC> I am analysing a set of quantitative proteomics data
MC> from 16 patients which has a large numbers of missing
MC> data, thus some proteins are only detected once, upto a
MC> maximum of 16. I want to test each protein for
MC> normality by the Shapiro Wilk test (function
MC> shapiro.test in package stats), which can only be
MC> applied to data with at least 3 measurements, which is
MC> fine. In the case where I have only 3 observations, and
MC> two of those observations are identical, then the
MC> shapiro.test produces negative P-values, which should
MC> never happen. This occurs for all of the situations
MC> that I have tried for 3 values, where 2 are the same.
Yes. Since all such tests are location- and scale-invariant, you
can reproduce it with
shapiro.test(c(0,0,1))
The irony is that the original papers by Roydon and the R help
page all assert that the P-value for n = 3 is exact !
OTOH, the paper [Roydon (1982), Appl.Stat 31, p115-124]
clearly states that
X(1) < X(2) < X(3) ... < X(n)
i.e., does not allow "ties" (two equal values).
If the exact formula in the paper were evaluated exactly
(instead with a rounded value of about 6 digits),
the "exact P-value" would be exactly 0.
Now that would count as a bug in the paper I think.
More about this tomorrow or so.
Martin Maechler, ETH Zurich
MC> Reproducible code below:
MC> # these are the data points that raised the problem
>> shapiro.test(c(-0.644, 0.0566, 0.0566))
MC> Shapiro-Wilk normality test
MC> data: c(-0.644, 0.0566, 0.0566)
MC> W = 0.75, p-value < 2.2e-16
>> shapiro.test(c(-0.644, 0.0566, 0.0566))$p.value
MC> [1] -7.69e-07
MC> # note the verbose output shows a small, but positive P-value, but
MC> when you extract that P using $p.value, it becomes negative
MC> # various other tests
>> shapiro.test(c(1,1,2))$p.value
MC> [1] -8.35e-07
>> shapiro.test(c(-1,-1,2))$p.value
MC> [1] -1.03e-06
MC> cheers,
MC> Mark
>> sessionInfo()
MC> R version 2.6.1 (2007-11-26)
MC> i386-apple-darwin8.10.1
MC> locale:
MC> en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
MC> attached base packages:
MC> [1] tcltk graphics grDevices datasets utils stats
MC> methods base
MC> other attached packages:
MC> [1] qvalue_1.12.0 Cairo_1.3-5 RSvgDevice_0.6.3
MC> SparseM_0.74 pwbc_0.1
MC> [6] mjcdev_0.1 tigrmev_0.1 slfa_0.1
MC> sage_0.1 qtlreaper_0.1
MC> [11] pajek_0.1 mjcstats_0.1 mjcspot_0.1
MC> mjcgraphics_0.1 mjcaffy_0.1
MC> [16] haselst_0.1 geomi_0.1 geo_0.1
MC> genomics_0.1 cor_0.1
MC> [21] bootstrap_0.1 blat_0.1 bitops_1.0-4
MC> mjcbase_0.1 gdata_2.3.1
MC> [26] gtools_2.4.0
MC> -----------------------------------------------------
MC> Mark Cowley, BSc (Bioinformatics)(Hons)
MC> Peter Wills Bioinformatics Centre
MC> Garvan Institute of Medical Research, Sydney, Australia
MC> ______________________________________________
MC> R-help at r-project.org mailing list
MC> https://stat.ethz.ch/mailman/listinfo/r-help
MC> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
MC> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list