[R] Odd results with Chi-square test. (Not an R problem, but general statistics, I think)
Polwart Calum (County Durham and Darlington NHS Foundation Trust)
calum.polwart at nhs.net
Tue Aug 18 19:17:20 CEST 2009
I'm far from an expert on stats, but I think what you are saying is that when you compare baseline with version 3, the p-value is not as good as for versions 1 and 2. I'm not 100% sure p-values are meant to be compared like that, but I'll let someone else comment on that!
            total  incorrect  correct  % correct
baseline      898        708      190      21.2%
version_1     898        688      210      23.4%
version_2     898        680      218      24.3%
version_3    1021        790      231      22.6%
>
> Here, the p value for version_3 (when compared with the baseline) seems to
> make no sense whatsoever. It shouldn't be larger than the other two p
> values, the increase in correct answers (that is what counts!) is bigger
> after all.
>
No, it's not the raw numbers; it's the proportion of correct answers that counts.
I've added a % correct column to your data - does that make it clearer? Only 22.6% of version 3's answers were correct, so in percentage terms the improvement is smaller than versions 1 and 2 produced. From my naive perspective I'd want to test for a difference between each result and baseline, and then v1 & v2, v1 & v3, v2 & v3 (you may tell me those are unsound things to test - in which case don't test them). You'd then need to set a threshold for accepting that a difference is significant (say p < 0.05). I'd contend that the test should be two-tailed - results could be better or worse.
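In R those pairwise comparisons are what chisq.test on a 2x2 table of correct/incorrect counts would give you. As a language-neutral sketch (the helper name chisq_2x2 is my own, and the maths uses Pearson's chi-square with 1 df and no continuity correction), here is the baseline-vs-version comparison in plain Python, using the counts from the table above:

```python
import math

def chisq_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the
    2x2 table [[a, b], [c, d]]; returns (statistic, two-sided p)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x/2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# (correct, incorrect) counts from the table above
baseline = (190, 708)
versions = {"version_1": (210, 688),
            "version_2": (218, 680),
            "version_3": (231, 790)}

for name, (corr, incorr) in versions.items():
    stat, p = chisq_2x2(baseline[0], baseline[1], corr, incorr)
    print(f"baseline vs {name}: chi2 = {stat:.3f}, p = {p:.4f}")
```

Note that R's chisq.test applies Yates' continuity correction to 2x2 tables by default (correct = TRUE), so its p-values will differ slightly from this uncorrected version.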
You should also develop a hypothesis. Let me create one for you:
A.
H1: version1 of the software is better than baseline
(H0: version 1 is no better than baseline)
B.
H1: version2 of the software is better than version 1
(H0: version 2 is no better than version 1)
C.
H1: version3 of the software is better than version 2
(H0: version 3 is no better than version 2)
Now look at your results and p-values and work out whether H1 or H0 applies. You could develop further variants (D: version 3 is better than baseline).
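Each of those "better than" hypotheses is one-sided, which in R is prop.test(..., alternative = "greater"). As a rough sketch of the underlying arithmetic (the helper name one_sided_prop_test is my own; it is a pooled two-proportion z-test without continuity correction), here is hypothesis A - version 1 better than baseline:

```python
import math

def one_sided_prop_test(c1, n1, c2, n2):
    """One-sided pooled z-test for two proportions; H1 is that the
    second group's correct rate is higher. Returns (z, one-sided p)."""
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Upper-tail p-value of a standard normal
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

# Hypothesis A: version 1 (210/898 correct) better than baseline (190/898)
z, p = one_sided_prop_test(190, 898, 210, 898)
print(f"A: z = {z:.3f}, one-sided p = {p:.4f}")
```

For a 2x2 table, z squared equals the chi-square statistic, so the one-sided p is half the two-sided chi-square p when the difference points in the hypothesised direction.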
Finally - remember to consider the 'clinical significance' as well as the statistical significance. I'd have hoped a software change might have increased correct answers to, say, 40%. And remember also that a threshold of p < 0.05 means roughly a 1-in-20 false positive rate when there is no real difference.
>
> Any idea what's going on here? I thought the sample size should have no
> impact on the results?
>
Erm... sample size always has an influence on results. If you show a difference in 100 samples, you would expect a larger p-value from virtually any statistical test than if you show that same difference in 1000 samples. You have a bigger sample but a smaller difference in proportions, so in effect you can be less sure that the change is not down to chance. (Purist statisticians will likely challenge that wording.)
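You can see the sample-size effect directly by running the same chi-square arithmetic on the same proportions at two sample sizes. A minimal sketch (chisq_p is my own helper name; same Pearson chi-square with 1 df and no continuity correction as above) - an identical 20% vs 25% split is far from significant per 100 subjects but clearly significant per 1000:

```python
import math

def chisq_p(a, b, c, d):
    """Two-sided p-value from Pearson's chi-square (1 df) on [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(stat / 2))

# Identical proportions (20% vs 25% correct), two sample sizes
small = chisq_p(20, 80, 25, 75)      # n = 100 per group
large = chisq_p(200, 800, 250, 750)  # n = 1000 per group
print(f"n=100 per group:  p = {small:.4f}")
print(f"n=1000 per group: p = {large:.4f}")
```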