[R] Numerical stability in chisq.test
Jan Motl
yzan at volny.cz
Wed Dec 27 12:48:13 CET 2017
The chisq.test contains following code:
STATISTIC <- sum(sort((x - E)^2/E, decreasing = TRUE))
However, based on book Accuracy and stability of numerical algorithms <http://ftp.demec.ufpr.br/CFD/bibliografia/Higham_2002_Accuracy%20and%20Stability%20of%20Numerical%20Algorithms.pdf> Table 4.1 on page 89, it is better to sort the data in increasing order than in decreasing order, when the data are non-negative.
A demonstrative example:
x = matrix(c(rep(1.1, 10000)), 10^16, nrow = 10001, ncol = 1) # We have a vector with 10000*1.1 and 1*10^16
c(sum(sort(x, decreasing = TRUE)), sum(sort(x, decreasing = FALSE)))
The result:
10000000000010996 10000000000011000
When we sort the data in the increasing order, we get the correct result. If we sort the data in the decreasing order, we get a result that is off by 4.
Shouldn't the sort be in the increasing order rather than in the decreasing order?
Best regards,
Jan Motl
PS: This post is based on discussion on https://stackoverflow.com/questions/47847295/why-does-chisq-test-sort-data-in-descending-order-before-summation <https://stackoverflow.com/questions/47847295/why-does-chisq-test-sort-data-in-descending-order-before-summation>.
[[alternative HTML version deleted]]
More information about the R-help
mailing list