[R] Question on binomial data
David Winsemius
dwinsemius at comcast.net
Wed Apr 22 01:04:44 CEST 2009
Surely Faraway does not suggest using the Wald statistic in preference
to the deviance?
Even if the distribution of the deviance is not exactly chi-square, it
appears generally accepted that comparing the change in deviance to a
chi-square distribution is more reliable than using the ratio of
beta to se(beta), which is what that "Pr(>|z|)" number reports.
Your permutation results look sensible and could conceivably be
considered the gold standard.
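For concreteness, the two tests can be run side by side on the data from
the original post (a quick sketch; the deviance comparison here is the
likelihood-ratio test against the null model):

```r
# Data from the original post
p <- c(F,T,F,F,F,T,T,T,T,T,T,T,F,T,T,T,T)
w <- c(53,67,59,59,53,89,72,56,65,63,62,58,59,72,61,68,63)
l <- glm(p ~ w, family = binomial)

# Wald test: the Pr(>|z|) column of summary(), i.e. beta / se(beta)
wald_p <- summary(l)$coefficients["w", "Pr(>|z|)"]

# Deviance (likelihood-ratio) test against the null model
lr_p <- pchisq(l$null.deviance - l$deviance, df = 1, lower.tail = FALSE)

print(c(wald = wald_p, lr = lr_p))
```

In the thread these come out around 0.06 and 0.002 respectively, which
is exactly the discrepancy under discussion.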
--
David
On Apr 21, 2009, at 5:31 PM, ehud cohen wrote:
> I thought of testing the difference in deviance between the null model
> and the fitted model, assuming it is distributed as chi-sq. However,
> Faraway writes that if the outcome is binary, the deviance
> distribution is far from chisq.
> I've done a permutation test:
>
> N <- 5000  # near the upper limit, as there are only choose(17, 5) =
> # 6188 distinct arrangements of the T/F data I have
> dev <- rep(0, N)
> for (i in 1:N) {
>   l1 <- glm(sample(p) ~ w, family = binomial)
>   dev[i] <- l1$deviance
> }
> print(mean(dev < l$deviance))
>
> and the outcome is 0.005, which is close to the t-test result.
>
> I've repeated the same procedure, computing the statistic from the
> z-value in summary(l1) each time instead of the deviance, and got a
> comparable result.
>
> I think it means that David is right: the Pr(>|z|) in the glm output
> does not mean much here. I still don't know what it does mean.
>
> Regarding your suggestion of using car's Anova:
>
>> Anova(l)
> Anova Table (Type II tests)
>
> Response: p
> LR Chisq Df Pr(>Chisq)
> w 9.4008 1 0.002169 **
>
> which is identical to:
>
> pchisq(l$null.deviance - l$deviance, 1, lower.tail = FALSE)
>
> which seems to be too low - probably due to the binary
> response.
>
> Would you say the permutation method is appropriate to use in this
> case? And can it be extended to a case with several covariates?
>
>
>
> On Tue, Apr 21, 2009 at 10:34 PM, <markleeds at verizon.net> wrote:
>> hi: i would wait for one of the guRus to say something but my take
>> ( take it
>> with a grain of salt ) is that the results
>> are not so contradictory. the test of the significance of the
>> coefficient in
>> the GLM is 0.06. and the test that the
>> means are different gives a p-value of 0.004. a couple of
>> reasons why
>> this might not be so contradictory:
>>
>> A) the test gives greater significance in the t-test case but it's
>> not
>> really testing the same thing. the t-test is only testing that
>> the means are different. the glm is testing whether the log odds
>> of the two events ( pass and fail ) are linearly related to
>> a covariate.
>>
>> b) your t-test is a little weird because it's only got a sample of
>> five in one of the 2 samples and I'm not clear on whether it's
>> assuming equal variances and then pooling ( t.test has a
>> var.equal = TRUE option; the default is FALSE, i.e. Welch's
>> unequal-variance test ).
>> definitely that's not a large sample size regardless of the pooling
>> issue.
>>
>> c) when you test the significance in a glm you need to compare the
>> deviance
>> of the model to the deviance of the nested null model.
>> John Fox's book describes this but I don't think it's the same as
>> looking at the significance in the table output of glm. that's
>> a wald test and not the same as the deviance comparison
>> ( essentially a
>> likelihood ratio test i think ). with small sample sizes, i think
>> these
>> differences between these various tests can be large. check out john
>> fox's
>> text for a nice description of testing in the generalized linear
>> model
>> framework. you can use Anova from his car package to do this.
>>
>> hopefully someone else will say something though because i'd be
>> curious to
>> see where i'm wrong/right or something new.
>> good luck.
>>
>> On Apr 21, 2009, ehud cohen <ehudco.list at gmail.com> wrote:
>>
>> Hi,
>>
>> We have an experiment with pass/fail outcome, and a continuous
>> parameter which may contribute to the outcome.
>>
>> First, we've analyzed it by:
>>
>> p <- c(F,T,F,F,F,T,T,T,T,T,T,T,F,T,T,T,T)
>> w <- c(53,67,59,59,53,89,72,56,65,63,62,58,59,72,61,68,63)
>> l <- glm(p ~ w, family = binomial)
>> summary(l)
>>
>> Which turned out to be non-significant.
>>
>> Then, we thought of comparing the parameters of the two groups
>> (passed
>> vs. failed)
>>
>> t.test(w[which(p)],w[which(!p)],alternative="two.sided")
>>
>> which turned out to be highly significant.
>>
>> I'd appreciate some insight...
>>
>> Thanks, Ehud.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT