[R] warning associated with Logistic Regression

David Firth d.firth at warwick.ac.uk
Mon Jan 26 11:28:48 CET 2004

On Sunday, Jan 25, 2004, at 18:06 Europe/London, (Ted Harding) wrote:

> On 25-Jan-04 Guillem Chust wrote:
>> Hi All,
>> When I tried to do logistic regression (with high maximum number of
>> iterations) I got the following warning message
>> Warning message:
>> fitted probabilities numerically 0 or 1 occurred in: (if
>> (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
>> As I checked from the Archive R-Help mails, it seems that this happens
>> when the dataset exhibits complete separation.
> This is so. Indeed, there is a sense in which you are experiencing
> unusually good fortune, since for values of your predictors in one
> region you are perfectly predicting the 0s in your response, and for
> values in another region you are perfectly predicting the 1s. What
> better could you hope for?
> However, you would respond that this is not realistic: your variables
> are not (in real life) such that P(Y=1|X=x) is ever exactly 1 or
> exactly 0, so this perfect prediction is not realistic.
> In that case, you are somewhat stuck. The plain fact is that your
> data (in particular the way the values of the X variables are
> distributed) are not adequate to tell you what is happening.
> There may be manipulative tricks (like penalised regression) which
> would inhibit the logistic regression from going all the way to a
> perfect fit; but, then, how would you know how far to let it go
> (because it will certainly go as far in that direction as you allow
> it to).
> The key parameter in this situation is the dispersion parameter (sigma
> in the usual notation). When you get perfect fit in a "completely
> separated" situation, this corresponds to sigma=0. If you don't like
> this, then there must be reasons why you want sigma>0 and this may
> imply that you have reasons for wanting sigma to be at least s0 (say),
> or, if you are prepared to be Bayesian about it, you may be satisfied
> that there is a prior distribution for sigma which would not allow
> sigma=0, and would attach high probability to a range of sigma values
> which you consider to be realistic.
> Unless you have a fairly firm idea of what sort of values sigma is
> likely to have, then you are indeed stuck because you have no reason
> to prefer one positive value of sigma to a different positive value
> of sigma. In that case you cannot really object if the logistic
> regression tries to make it as small as possible!

This seems arguable.  Accepting that we are talking about point 
estimation (the desirability of which is of course open to question!!), 
then old-fashioned criteria like bias, variance and mean squared error 
can be used as a guide.  For example, we might desire to use an 
estimation method for which the MSE of the estimated logistic 
regression coefficients (suitably standardized) is as small as 
possible; or some other such thing.

The simplest case is estimation of log(pi/(1-pi)) given an observation 
r from binomial(n,pi).  Suppose we find that r=n -- what then can we 
say about pi?  Clearly not much if n is small, rather more if n is 
large.  Better in terms of MSE than the MLE (whose MSE is infinite) is 
to use log(p/(1-p)), with p = (r+0.5)/(n+1).  See for example Cox & 
Snell's book on binary data.  This corresponds to penalizing the 
likelihood by the Jeffreys prior, a penalty function which has good 
frequentist properties also in the more general logistic regression 
context.  References given in the brlr package give the theory and some 
empirical evidence.  The logistf package, also on CRAN, is another 

I do not mean to imply that the Jeffreys-prior penalty will be the 
right thing for all applications -- it will not.  (E.g., if you really do 
have prior information, it would be better to use it.)
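
As a rough illustration of the simple binomial case above, here is a minimal sketch in base R.  The function names (mle_logit, adjusted_logit, exact_mse) and the values n = 10, pi = 0.8 are made up for this example and do not come from brlr or logistf.  It compares the plain MLE of log(pi/(1-pi)) with the adjusted estimate based on p = (r+0.5)/(n+1), and computes the exact MSE of each by summing over all possible outcomes r = 0, ..., n.

```r
# Plain maximum-likelihood estimate of the logit; infinite when r = 0 or r = n.
mle_logit <- function(r, n) {
  p <- r / n
  log(p / (1 - p))
}

# Adjusted estimate using p = (r + 0.5) / (n + 1); always finite.
adjusted_logit <- function(r, n) {
  p <- (r + 0.5) / (n + 1)
  log(p / (1 - p))
}

# Exact mean squared error of an estimator of log(pi / (1 - pi)),
# obtained by weighting the squared error at each r by its binomial probability.
exact_mse <- function(estimator, n, pi) {
  r <- 0:n
  truth <- log(pi / (1 - pi))
  sum(dbinom(r, n, pi) * (estimator(r, n) - truth)^2)
}

n <- 10                              # illustrative sample size (an assumption)
mle_logit(n, n)                      # Inf: the r = n case
adjusted_logit(n, n)                 # log(21), about 3.04
exact_mse(mle_logit, n, 0.8)         # Inf, since r = 0 and r = n have positive probability
exact_mse(adjusted_logit, n, 0.8)    # finite
```

Increasing n in adjusted_logit(n, n) gives a larger estimate, in line with the remark that r = n tells us rather more about pi when n is large.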

In general I agree wholeheartedly that it is best to get more/better 

> In the absence of such reasons,

All good wishes,
