[R] Binomial glms with very small numbers

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Jan 15 09:15:32 CET 2004


On Wed, 14 Jan 2004, Spencer Graves wrote:

>       Yes, but "glm" maximizes the binomial likelihood assuming 
> log(p/(1-p)) is a linear model.  Therefore, you don't have to transform 
> the 0's and 1's.  There are cases where a particular combination of 
> potential explanatory variables will clearly separate mortalities from 
> survivors.  I don't know what the algorithm does with such cases, but it 
> should send a slope essentially to infinity.  However, if you don't have 
> this case, "glm" should do what you want. 

Even in that case glm will do what you want (return fitted probabilities 
rather close to 0 or 1), just somewhat inefficiently.  The standard errors 
are often hard to interpret and Wald tests are misleading. Even that case 
is covered in V&R!   We also discuss the problems of interpreting the 
residual deviance (in the current edition, and in the complements for 
earlier eds) when the expected number of either successes or failures is 
small (and on what counts as small: it will be with n=3).
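
A minimal sketch of the separation case described above, using made-up toy 
data (x, y and the object names are assumptions for illustration, not taken 
from the thread).  It is meant only to show the behaviour being discussed: 
fitted probabilities near 0 or 1, a very large standard error for the slope, 
and a Wald test that disagrees with the likelihood ratio test.

```r
# Toy data (hypothetical): y is completely separated by x,
# so the maximum likelihood estimate of the slope is infinite.
x <- 1:10
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

# glm() normally warns here (e.g. fitted probabilities numerically
# 0 or 1); the warnings are suppressed only to keep the output short.
fit <- suppressWarnings(glm(y ~ x, family = binomial))

# Fitted probabilities are pushed very close to 0 or 1
p_hat <- fitted(fit)
print(round(p_hat, 6))

# The slope and its standard error are both huge, so the Wald
# z-test says little about whether x matters.
wald <- coef(summary(fit))
print(wald)
wald_p <- wald["x", "Pr(>|z|)"]

# Likelihood ratio test computed from the change in deviance
lrt_stat <- fit$null.deviance - fit$deviance
lrt_df   <- fit$df.null - fit$df.residual
lrt_p    <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)
print(c(wald_p = wald_p, lrt_p = lrt_p))
```

Comparing wald_p with lrt_p on data like these is one way to see why the 
Wald tests are called misleading here: the deviance drops by almost its 
whole null value, yet the Wald statistic for x is close to zero.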

> 
>       hope this helps.  spencer graves
> 
> Patrick Connolly wrote:
> 
> >On Wed, 14-Jan-2004 at 05:15PM -0800, Spencer Graves wrote:
> >
> >|>       The advisability of using "glm" with mortality depends not on
> >|> the size of sample groups but on the assumption of independence:
> >|> Whether you have 3 individuals per group or 30 or 1, is it
> >
> >I think we can assume independence.  What concerned me more was the
> >fact that there will be rather a lot of 0s and 1s, corresponding to
> >-Inf and Inf on the transformed scale.  Only half the possible values
> >(namely, 1 & 2) will be usable in the fitting.  On second thoughts,
> >since the response can be given as a binary, perhaps I was
> >unnecessarily concerned.
> >
> >
> >|> plausible to assume that all individuals represented in your
> >|> data.frame have independent chances of survival given the
> >|> potentially explanatory variables?  If the answer is "yes", then
> >|> "glm" is appropriate.  If the answer is "no", then some other tool
> >|> may be preferable.  However, "glm" is quick and easy in R, and I
> >|> might start with that, even if I felt the assumption of
> >|> independence was violated.  If I found nothing there, I would not
> >|> likely find anything with techniques that handled more
> >|> appropriately the violations of independence.
> >
> >Thanks for that suggestion.
> >
> >|> 
> >|>       Similarly, I can't see how it would matter whether potentially 
> >|> explanatory variables were continuous or categorical, as long as a 
> >|> categorical variable were appropriately coded as a factor (or 
> >|> "character", which is then treated as a factor) if it has more than 2 
> >|> levels. 
> >
> >I didn't think it would make a difference but I included it in case
> >someone more knowledgeable had reasons why it did.
> >
> >Thanks.
> >
> >  
> >
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595