[R] Odd results with Chi-square test. (Not an R problem, but general statistics, I think.)

Tue Aug 18 16:29:03 CEST 2009

Hi,

I am working on a system which automatically answers user questions (such
systems are commonly called "Question Answering systems"). I evaluated
different versions of the same system on a publicly available test set.
Naturally, there is a fixed number of questions in the test set, and the
system answers some right and some wrong.

I want to compare each version of the system against a baseline and see
whether the increase is statistically significant. I used one-tailed
chi-square tests for this.

Here's the data I got:

Test set 1:
            total   incorrect   correct   p
baseline    1908    1718        190
version_1   1908    1698        210       0,145
version_2   1908    1690        218       0,071
version_3   1908    1677        231       0,017

I compared every version with the baseline, so that I get something like a
2x2 contingency table, as here: 

            incorrect   correct
baseline    1718        190
version_1   1698        210

p: 0,145

This works fine, the results seem to make sense intuitively.
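
For reference, the comparison above can be sketched in a few lines of Python.
This is only a sketch under an assumption the post does not state: a Pearson
chi-square on the 2x2 table without continuity correction, with the two-sided
p value halved to get a one-tailed one. The function names are made up for
illustration.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]].

    Assumption: no continuity correction is applied.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def one_tailed_p(a, b, c, d):
    """Half of the two-sided p value for 1 degree of freedom.

    Assumption: halving is only meaningful when the observed difference
    points in the hypothesised direction (more correct answers than baseline).
    """
    chi2 = chi_square_2x2(a, b, c, d)
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    two_sided = math.erfc(math.sqrt(chi2 / 2))
    return two_sided / 2


# (incorrect, correct) counts from Test set 1
baseline = (1718, 190)
versions = {
    "version_1": (1698, 210),
    "version_2": (1690, 218),
    "version_3": (1677, 231),
}

for name, (incorrect, correct) in versions.items():
    p = one_tailed_p(baseline[0], baseline[1], incorrect, correct)
    print(f"{name}: p = {p:.3f}")
```

Under that assumption, the printed values should round to the p values listed
for Test set 1.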

First question:
Do you think this is a legitimate way to compute significance?

But then I also have figures on *partial* test sets, because there are some
questions for which we just cannot expect the system to return correct
answers. (The reason for this is beyond the scope of this post.) So
different versions of the system work on test sets of different sizes. Then
we get: 

Test set 2:
            total   incorrect   correct   p
baseline    898     708         190
version_1   898     688         210       0,128
version_2   898     680         218       0,057
version_3   1021    790         231       0,219

Here, the p value for version_3 (when compared with the baseline) seems to
make no sense whatsoever. It shouldn't be larger than the other two p
values; after all, the increase in correct answers (that is what counts!) is
bigger.

Any idea what's going on here? I thought the sample size should have no
impact on the results?  
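
As a check on the arithmetic, the share of correct answers in each row of
Test set 2 can be computed as below. A chi-square test on a 2x2 table compares
these proportions rather than the raw counts of correct answers, so the
differing totals may matter here. The helper name `accuracy` is made up for
illustration.

```python
def accuracy(correct, total):
    """Fraction of questions answered correctly."""
    return correct / total


# (total, correct) counts from Test set 2
rows = {
    "baseline": (898, 190),
    "version_1": (898, 210),
    "version_2": (898, 218),
    "version_3": (1021, 231),
}

base_total, base_correct = rows["baseline"]
base_acc = accuracy(base_correct, base_total)

for name, (total, correct) in rows.items():
    acc = accuracy(correct, total)
    # Difference to the baseline in percentage points
    gain = 100 * (acc - base_acc)
    print(f"{name}: {100 * acc:.2f}% correct ({gain:+.2f} points vs baseline)")
```

In this table version_3 has the most correct answers in absolute terms, but
because its total is 1021 rather than 898, its proportion of correct answers
is lower than that of version_1 and version_2.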

Thanks a lot,
Mika