[R] chisq.test vs manual calculation - why are different results produced?
David Winsemius
dwinsemius at comcast.net
Mon Feb 20 15:24:07 CET 2012
On Feb 20, 2012, at 5:57 AM, Louise Mair wrote:
> Hello,
>
> I am trying to fit gamma, negative exponential and inverse power functions
> to a dataset, and then test whether the fit of each curve is good. To do
> this I have been advised to calculate predicted values for bins of data (I
> have grouped a continuous range of distances into 1km bins), and then apply
> a chi-squared test. Example:
>
>> data <- data.frame(distance=c(1,2,3,4,5,6,7),
>> observed=c(43,13,10,6,2,1),
> predicted=c(28, 18, 10, 5 ,3, 1, 1))
There's an error with that code: observed has six values while distance and
predicted have seven, so data.frame() fails.
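A quick base-R check of the column lengths makes the problem visible. This is
only a minimal sketch; the names cols, col_lengths and res are illustrative
and not from the posted code:

```r
# The three columns exactly as posted.
cols <- list(distance  = c(1, 2, 3, 4, 5, 6, 7),
             observed  = c(43, 13, 10, 6, 2, 1),
             predicted = c(28, 18, 10, 5, 3, 1, 1))

# Number of values in each column.
col_lengths <- sapply(cols, length)
print(col_lengths)

# data.frame() refuses columns of unequal length; capture the error message
# instead of stopping the script.
res <- tryCatch(do.call(data.frame, cols),
                error = function(e) conditionMessage(e))
print(res)
```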
>
>> chisq.test(data$observed, data$predicted)
>
> Which gives:
>
> Pearson's Chi-squared test
>
> data: data$observed and data$predicted
> X-squared = 35, df = 25, p-value = 0.0882
>
> Warning message:
> In chisq.test(data$observed, data$predicted) :
> Chi-squared approximation may be incorrect
>
> I understand this is due to having observed/predicted values of less than
> five, however I am interested to know firstly why R uses such a large
> number of degrees of freedom (when by my understanding there should only be
> 4 df), and secondly whether using the following manual calculation is
> therefore inappropriate -
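Read the Details section of ?chisq.test, end of the second paragraph: when x
and y are both vectors they are coerced to factors and cross-tabulated, which
is where the large df comes from.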
You probably wanted:
chisq.test(cbind(data$observed, data$predicted))
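To compare the two call forms side by side, here is a minimal sketch. The
seventh observed count is a hypothetical placeholder (the posted vector has
only six values), so the numbers are illustrative only. The third call, a
goodness-of-fit test against predicted proportions, is an assumption about
what may be wanted rather than something suggested above.

```r
# Seventh observed count (the trailing 1) is a hypothetical placeholder;
# the posted vector had only six values.
obs  <- c(43, 13, 10, 6, 2, 1, 1)
pred <- c(28, 18, 10, 5, 3, 1, 1)

# Two vectors: both are coerced to factors and cross-tabulated, giving a
# 6 x 6 table here and (6 - 1) * (6 - 1) = 25 df.
two_vec <- suppressWarnings(chisq.test(obs, pred))
print(two_vec$parameter)

# One two-column matrix: treated as a 7 x 2 contingency table, 6 df.
as_matrix <- suppressWarnings(chisq.test(cbind(obs, pred)))
print(as_matrix$parameter)

# Goodness-of-fit form: observed counts against predicted proportions,
# 7 - 1 = 6 df. The proportions must sum to 1, hence the division.
gof <- suppressWarnings(chisq.test(obs, p = pred / sum(pred)))
print(gof$parameter)
```

Note that chisq.test() does not subtract anything for parameters estimated
when fitting the curve, so if the df should be reduced for that reason, a
manual pchisq() call with the chosen df may still be needed.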
>
>> X2 <- sum(((data$observed - data$predicted)^2)/data$predicted)
>> 1-pchisq(X2,4)
> [1] 0.04114223
>
> If chi-squared is unsuitable, what other test can I use to determine
> whether my observed and predicted data come from the same distribution? The
> frequently recommended fisher's test doesn't seem to be any more
> appropriate as it requires values of greater than 5 for contingency tables
> larger than 2 x 2.
>
> Thanks for your help.
>
> Louise
>
> [[alternative HTML version deleted]]
Plain text is requested as the mail format.
David Winsemius, MD
West Hartford, CT