[R] strange fisher.test result
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Tue Apr 3 18:15:44 CEST 2007
Thomas Lumley wrote:
> On Mon, 2 Apr 2007, ted.harding at nessie.mcc.ac.uk wrote:
>
>> From the above, the marginal totals for his 2x2 table
>>
>>   a  b   =   16   8
>>
>>   c  d       15  24
>>
>> are (rows then columns) 24,39,31,32
>>
>> These fixed marginals mean that the whole table is determined
>> by the value of a. The following function P.FX() computes the
>> probabilities of all possible tables, conditional on the
>> marginal totals (it is much more transparent than the code
>> for the same purpose in fisher.test()):
>>
>
> As this example has shown, 2x2 tables are a nice opportunity for
> illustrating how the ordering of the sample space affects inference
> (because you can actually see the whole sample space).
>
> I used something like this as a term project in an introductory R class,
> where we wrote code to compute the probabilities for all outcomes
> conditional on one margin, and used this to get (conservative) exact
> versions of all the popular tests in 2x2 tables. It's interesting to do
> things like compare the rejection regions and power under various
> alternatives for the exact versions of the likelihood ratio test and
> Fisher's test. We didn't get as far as confidence intervals, but the code
> is at
> http://faculty.washington.edu/tlumley/b514/exacttest.R
> with .Rd files at
> http://faculty.washington.edu/tlumley/b514/man/
>
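
Since the P.FX() code is not included in the quote, here is a minimal sketch of the same idea built on dhyper(). It is not Ted's function; the names tab, a, probs and p_two_sided are just illustrative, and the two-sided ordering ("tables no more probable than the observed one") is what I understand fisher.test() to use for 2x2 tables.

```r
# Observed 2x2 table from the quoted message (filled column by column)
tab <- matrix(c(16, 15, 8, 24), nrow = 2)
r1 <- sum(tab[1, ])   # first row total: 24
r2 <- sum(tab[2, ])   # second row total: 39
c1 <- sum(tab[, 1])   # first column total: 31

# With all margins fixed, the table is determined by the top-left
# cell a, which can take these values:
a <- max(0, c1 - r2):min(r1, c1)

# Under the null hypothesis a is hypergeometric given the margins
probs <- dhyper(a, m = r1, n = r2, k = c1)

# Two-sided p value: total probability of all tables that are no more
# probable than the observed one (small relative tolerance for ties)
p_obs <- probs[a == tab[1, 1]]
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])
p_two_sided
```

Plotting probs against a (e.g. plot(a, probs, type = "h")) shows the whole sample space, and swapping in a different ordering of the tables changes which ones end up in the tail.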
The effect is already visible with binomial tests. In fact the last
exercise in the section on categorical data in Introductory Statistics
with R currently reads (the \Answer section is not in the actual book --
yet):

Make a plot of the two-sided $p$ value for
testing that the probability parameter is $x$ when the observations
are 3 successes in 15 trials, for $x$ varying from 0 to 1 in steps of
0.001. Explain what makes the definition of a two-sided confidence
interval difficult.

\Answer The curve shows substantial discontinuities where
probability mass is shifted from one tail to the other, and also a
number of local minima. A confidence region could be defined as
those values of $x$ against which there is no significant evidence
at level $\alpha$, but for some $\alpha$, that region is not an interval.
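
# Two-sided exact binomial p value for 3 successes in 15 trials,
# evaluated on a grid of hypothesized success probabilities
p <- seq(0, 1, 0.001)
pval <- sapply(p, function(p) binom.test(3, 15, p = p)$p.value)
plot(p, pval, type = "l")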
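
To see the "not an interval" point numerically rather than by eye, one can count how many separate pieces the set of non-rejected parameter values has at a given level. The following is only a sketch on the same 0.001 grid; n_pieces() and the grid of levels are my own illustrative choices, and the level 0.773 is picked purely because it falls in one of the small dips of the curve, not because anyone would test at that level.

```r
# Same p value curve as above
p <- seq(0, 1, 0.001)
pval <- sapply(p, function(p) binom.test(3, 15, p = p)$p.value)

# Number of separate pieces of the region {p : pval > alpha},
# counted as runs of TRUE along the grid
n_pieces <- function(alpha) {
  runs <- rle(pval > alpha)
  sum(runs$values)
}

# Scan a grid of levels and list those where the region is split
alphas <- seq(0.01, 0.99, by = 0.001)
pieces <- sapply(alphas, n_pieces)
alphas[pieces > 1]

# One such level, shown on the plot
plot(p, pval, type = "l")
abline(h = 0.773, lty = 2)
```

The count depends on the grid resolution, so a finer grid for p may reveal further splits.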