[R] compare proportions

array chip arrayprofile at yahoo.com
Tue Sep 27 21:47:05 CEST 2011


Hi, I have a seemingly simple proportion test. Here is the question I am trying to answer:
 
A test is run every day in the lab, and each result is either positive or
negative. At the end of each month, we calculate that month's positive rate as
the proportion of positive test results. The data look like this:
 
Month      # positive   # total tests   positive rate
January        24            205            11.7%
February       31            234            13.2%
March          26            227            11.5%
:
:
:
August         42            241            17.4%
 
The total number of positives from January to July is 182, and the total number
of tests over the same period is 1526. It appears that from January to July the
positive rate is between 11% and 13%, whereas in August it is up to around 17%.
So the question is whether the increase in August is statistically significant.
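As a quick check of the arithmetic above, here is a minimal R sketch that uses only the months shown in the table (April to July are elided in this post, so they are left out and would need to be added from the full data). It recomputes the monthly rates and the pooled January to July rate from the stated totals, and runs prop.test() on the listed pre-August months as a k-sample test of whether they share one underlying rate, which is the assumption behind pooling them.

```r
# Monthly counts shown in the post; April-July are elided there,
# so only January-March and August are included in this sketch.
months    <- c("January", "February", "March", "August")
positives <- c(24, 31, 26, 42)
totals    <- c(205, 234, 227, 241)

# Monthly positive rate in percent, rounded to one decimal place
rate <- round(100 * positives / totals, 1)
print(data.frame(month = months, positives = positives,
                 totals = totals, rate = rate))

# Pooled January-July rate from the stated totals (182 of 1526 tests)
pooled_before <- 182 / 1526
print(pooled_before)

# k-sample test of equal rates across the listed pre-August months;
# with more than two groups prop.test() applies no continuity correction.
homog <- prop.test(positives[1:3], totals[1:3])
print(homog)
```

With the full January to July counts substituted in, the same call gives a rough check on whether treating those months as one sample (as methods 1 and 2 below do) is reasonable.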
 
I can think of three ways to do this test:
 
1. Use binom.test(), setting p to the overall positive rate from January to
July (182/1526):
 
> binom.test(42,241,182/1526)
 
        Exact binomial test

data:  42 and 241
number of successes = 42, number of trials = 241, p-value = 0.0125
alternative hypothesis: true probability of success is not equal to 0.1192661
95 percent confidence interval:
 0.1285821 0.2281769
sample estimates:
probability of success
             0.1742739
 
2. Use prop.test() to compare the overall positive rate from January to July
with the positive rate in August:
 
> prop.test(c(182,42),c(1526,241))
 
        2-sample test for equality of proportions with continuity correction

data:  c(182, 42) out of c(1526, 241)
X-squared = 5.203, df = 1, p-value = 0.02255
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.107988625 -0.002026982
sample estimates:
   prop 1    prop 2
0.1192661 0.1742739

3. Use prop.test() to compare the average monthly positive rate from January
to July with the positive rate in August. The average monthly number of
positives is 182/7 = 26, and the average monthly number of tests is
1526/7 = 218:
 
> prop.test(c(26,42),c(218,241))
 
        2-sample test for equality of proportions with continuity correction

data:  c(26, 42) out of c(218, 241)
X-squared = 2.3258, df = 1, p-value = 0.1272
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.12375569  0.01374008
sample estimates:
   prop 1    prop 2
0.1192661 0.1742739
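Because the question is directional (is August higher?), methods 1 and 2 can also be run one-sided using the standard `alternative` argument of binom.test() and prop.test(). The sketch below does that and also checks that prop.test() on two groups gives the same Yates-corrected statistic as chisq.test() on the equivalent 2x2 table. One difference worth keeping in mind: method 1 treats 182/1526 as a known constant, while method 2 also allows for sampling variability in the January to July rate. The variable names are illustrative only.

```r
# Counts from the post
pos_before <- 182; n_before <- 1526   # January-July, pooled
pos_aug    <- 42;  n_aug    <- 241    # August

# Method 1, one-sided: is the August rate greater than the fixed
# January-July rate 182/1526?
m1 <- binom.test(pos_aug, n_aug, p = pos_before / n_before,
                 alternative = "greater")

# Method 2, two-sided (as posted) and one-sided. Group 1 is January-July,
# so alternative = "less" means prop 1 < prop 2, i.e. August is higher.
m2_two <- prop.test(c(pos_before, pos_aug), c(n_before, n_aug))
m2_one <- prop.test(c(pos_before, pos_aug), c(n_before, n_aug),
                    alternative = "less")

# The same comparison as a 2x2 table (rows: period; columns: result)
tab <- matrix(c(pos_before, n_before - pos_before,
                pos_aug,    n_aug - pos_aug),
              nrow = 2, byrow = TRUE,
              dimnames = list(period = c("Jan-Jul", "August"),
                              result = c("positive", "negative")))
chi <- chisq.test(tab)  # Yates continuity correction is on by default for 2x2

print(c(method1_greater   = m1$p.value,
        method2_two_sided = m2_two$p.value,
        method2_less      = m2_one$p.value))
print(c(prop_test  = unname(m2_two$statistic),
        chisq_test = unname(chi$statistic)))
```

For the two-group prop.test(), the one-sided p-value is half the two-sided one when the observed difference points in the hypothesized direction, as it does here.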
 
As you can see, method 3 gives a non-significant p-value, whereas methods 1
and 2 give significant ones. I understand that each method tests a different
hypothesis, but for the question I am trying to answer (does August have a
higher positive rate than earlier months?), which method is more relevant?
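One way to see why method 3 comes out non-significant: 26/218 is exactly the same rate as 182/1526 (both numerator and denominator are divided by 7), so the observed difference is unchanged, but the January to July rate is now treated as if it came from 218 tests instead of 1526. This minimal sketch, using only the numbers in the post, runs methods 2 and 3 side by side.

```r
# Same January-July rate, two different sample sizes
print(all.equal(182 / 1526, 26 / 218))   # 182 = 7 * 26 and 1526 = 7 * 218

method2 <- prop.test(c(182, 42), c(1526, 241))  # all 1526 Jan-Jul tests
method3 <- prop.test(c(26, 42), c(218, 241))    # one "average month" of 218 tests

# Identical estimated proportions, but method 3 has a smaller chi-squared
# statistic and a larger p-value, because it uses only one seventh of the
# January-July sample size.
summary_tab <- rbind(
  method2 = c(method2$estimate, X.squared = unname(method2$statistic),
              p.value = method2$p.value),
  method3 = c(method3$estimate, X.squared = unname(method3$statistic),
              p.value = method3$p.value)
)
print(summary_tab)
```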
 
Thanks for any suggestions,
 
John



More information about the R-help mailing list