[R] Odd results with Chi-square test. (Not an R problem, but general statistics, I think)

Polwart Calum (County Durham and Darlington NHS Foundation Trust) calum.polwart at nhs.net
Tue Aug 18 19:17:20 CEST 2009

I'm far from an expert on stats, but I think what you are saying is that if you compare baseline with version 3, the p-value isn't as good as for versions 1 and 2.  I'm not 100% sure you are meant to compare p-values like that, but I'll let someone else comment on that!

             total   incorrect   correct   % correct
baseline       898         708       190       21.2%
version_1      898         688       210       23.4%
version_2      898         680       218       24.3%
version_3     1021         790       231       22.6%

> Here, the p value for version_3 (when compared with the baseline) seems to
> make no sense whatsoever. It shouldn't be larger that the other two p
> values, the increase in correct answers (that is what counts!) is bigger
> after all.
No, it's not the raw numbers; it's the proportion of correct answers that counts.

I've added a % correct column to your data - does that make it clearer?  Only 22.6% of version 3's answers were correct, so the difference in percentage terms is smaller than versions 1 and 2 produced.  From my naive perspective I'd want to test for a difference between each set of results and baseline, and then v1 & v2, v1 & v3, v2 & v3 (you may tell me those are unsound things to test - in which case don't test them).  You'd then need to determine a threshold for accepting that the result is significant (say p < 0.05).  I'd contend that the tests should be two-tailed - results could be better or worse.
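A rough sketch in R of those pairwise comparisons against baseline, using the counts from the table above (prop.test for two proportions reports a chi-square statistic with continuity correction; whether these are the right comparisons to make is for you to decide):

```r
# Counts from the table above (correct answers and totals per condition)
correct <- c(baseline = 190, version_1 = 210, version_2 = 218, version_3 = 231)
total   <- c(baseline = 898, version_1 = 898, version_2 = 898, version_3 = 1021)

# Proportion correct for each condition, as percentages
round(100 * correct / total, 1)

# Compare each version against baseline with a two-sample proportion test
for (v in c("version_1", "version_2", "version_3")) {
  res <- prop.test(correct[c("baseline", v)], total[c("baseline", v)])
  cat(v, "vs baseline: p =", format.pval(res$p.value), "\n")
}
```

You can run the same loop over the v1/v2/v3 pairs if you decide those comparisons are sound.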

You should also develop a hypothesis.  Let me create one for you:

H1: version1 of the software is better than baseline
(H0: version 1 is no better than baseline)

H1: version2 of the software is better than version 1
(H0: version 2 is no better than version 1)

H1: version3 of the software is better than version 2
(H0: version 3 is no better than version 2)

Now look at your results and p-values and work out whether H1 or H0 applies in each case.  You could develop further variants (e.g. H1: version 3 is better than baseline).
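If you did want a one-sided test matching the first of those hypotheses (H1: version 1 is better than baseline) - setting aside my earlier point that two-tailed may be more defensible - a sketch in R:

```r
# One-sided two-sample proportion test of H1: version_1 beats baseline.
# alternative = "greater" tests whether the first proportion (version_1,
# 210/898) exceeds the second (baseline, 190/898).
prop.test(c(210, 190), c(898, 898), alternative = "greater")$p.value
```

The other hypotheses follow the same pattern with the corresponding pairs of counts substituted in.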

Finally - remember to consider the 'clinical significance' as well as the statistical significance.  I'd have hoped a software change might have increased correct answers to, say, 40%?  And remember also that a p-value threshold of 0.05 implies a false positive rate of 1 in 20.

> Any idea what's going on here? I thought the sample size should have no
> impact on the results?
Erm.. sample size always has an influence on results.  If you show a difference in 100 samples, you would expect a larger p-value for virtually any statistical test you chose than if you show that same difference in 1000 samples.  You have a bigger sample but a smaller overall difference, so in effect you can be less sure that the change is not down to chance.  (Purist statisticians will likely challenge that definition.)
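You can see the sample-size effect directly: the identical difference in proportions (21% vs 24%, numbers made up for illustration) gives a much smaller p-value at the larger sample size:

```r
# Same 3-percentage-point difference, two different sample sizes per arm
small <- prop.test(c(21, 24),   c(100, 100))    # n = 100 per arm
large <- prop.test(c(210, 240), c(1000, 1000))  # n = 1000 per arm

small$p.value   # large p: the difference could easily be chance
large$p.value   # much smaller p for the identical difference
```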


