[R] Logistic regression : discrepancies between glm and nls ?
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Dec 14 17:13:16 CET 2001
Your call to nls fits by least squares, whereas glm fits by maximum
likelihood. These are not the same thing: ML gives more weight to
observations whose fitted values are near zero or one.
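The weighting difference can be made concrete. Differentiating each criterion, ML solves sum_i (y_i - p_i) x_i = 0, while least squares solves sum_i (y_i - p_i) p_i (1 - p_i) x_i = 0; the extra p_i(1 - p_i) factor means least squares nearly ignores observations whose fitted values sit near 0 or 1. Below is a minimal sketch of the two fits outside R, in Python with numpy (the function names, seed, and optimizer details are mine, not from this thread): IRLS for maximum likelihood, as glm uses, and a damped Gauss-Newton for least squares, roughly as nls uses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_ml(X, y, iters=25):
    """Maximum likelihood by IRLS / Fisher scoring (what glm does)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ theta)
        w = p * (1.0 - p)                       # working weights
        theta = theta + np.linalg.solve(X.T @ (w[:, None] * X),
                                        X.T @ (y - p))
    return theta

def fit_ls(X, y, iters=100, damp=1e-8):
    """Least squares by damped Gauss-Newton (roughly what nls does)."""
    theta = np.array([0.0, 1.0])                # same start as the nls call
    ridge = damp * np.eye(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ theta)
        resid = y - p
        J = (p * (1.0 - p))[:, None] * X        # note the p(1-p) factor
        step = np.linalg.solve(J.T @ J + ridge, J.T @ resid)
        # halve the step until the residual sum of squares does not grow
        cand = theta + step
        for _ in range(30):
            if ((y - sigmoid(X @ cand)) ** 2).sum() <= (resid ** 2).sum():
                break
            step = step / 2.0
            cand = theta + step
        theta = cand
    return theta

# Simulate the same kind of data as in the post: x ~ U(-10, 10),
# P(y = 1) = logistic(0.5 + 2 x)
rng = np.random.default_rng(1)
n = 500
x = 20.0 * rng.random(n) - 10.0
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < sigmoid(0.5 + 2.0 * x)).astype(float)

theta_ml = fit_ml(X, y)
theta_ls = fit_ls(X, y)
print("ML (glm-like):", theta_ml)
print("LS (nls-like):", theta_ls)
```

On a typical draw the slopes differ noticeably while both fits are defensible, and the gap shrinks as n grows: since the mean function is correctly specified, both estimators are consistent, but least squares is less efficient for binary data.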
On Fri, 14 Dec 2001, Emmanuel Charpentier wrote:
> Dear list,
>
> I'm trying to learn how to use nlme to be able to fit and analyse
> mixed-model logistic regressions. In order to keep things simple, I
> started by the simplest possible model : a one (fixed-effect ...)
> continuous variable. This problem is, of course, solved by glm, but I
> wanted to look at a "hand-made" nls fit, in order to be able to
> "generalize" to nlme fits.
>
> > ## Let's start with the simplest model : one fixed-effect continuous variable
> >
> > ## Dummy data
> >
> > size<-500
> >
> > logdata<-data.frame(x=20*runif(size)-10) # A U([-10,10]) variable
> >
> > alpha<-0.5
> >
> > beta<-2
> >
> > ## Simulate a response
> > ## y : a boolean (0|1) with probability e^(a+bx)/(1+e^(a+bx))
> >
> > logdata$y<-as.numeric(runif(size)<(exp(alpha+beta*logdata$x)/
> + (1+exp(alpha+beta*logdata$x))))
> >
> > ## Reality check : are the data "reasonably random" ?
> >
> > table(logdata$y)
>
> 0 1
> 251 249
> >
> > by(logdata$x,logdata$y,summary)
> INDICES: 0
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> -9.993 -7.243 -4.594 -4.873 -2.678 1.081
> ------------------------------------------------------------
> INDICES: 1
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> -1.526 1.791 5.008 4.663 7.235 9.983
>
>
> So far, no reason to wail ...
>
> > ## another reality check : what's the "classical" logistic regression ?
> >
> > logdata.glm<-glm(y~x, data=logdata, family=binomial(link=logit))
> >
> > ## nls should give the same estimates, up to convergence discrepancies
> >
> > logdata.nls<-nls(y~exp(a+b*x)/(1+exp(a+b*x)), data=logdata,
> + start=list(a=0,b=1))
> >
> > ## let's see ...
> >
> > summary(logdata.glm)
>
> Call:
> glm(formula = y ~ x, family = binomial(link = logit), data = logdata)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -2.4394149 -0.0308105 -0.0001991 0.0082031 2.0389572
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 0.9044 0.2849 3.174 0.00150 **
> x 1.8675 0.2739 6.818 9.2e-12 ***
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 693.14 on 499 degrees of freedom
> Residual deviance: 100.76 on 498 degrees of freedom
> AIC: 104.76
>
> Number of Fisher Scoring iterations: 8
>
> >
> > summary(logdata.nls)
>
> Formula: y ~ exp(a + b * x)/(1 + exp(a + b * x))
>
> Parameters:
> Estimate Std. Error t value Pr(>|t|)
> a 0.8979 0.1285 6.99 8.86e-12 ***
> b 1.5806 0.1474 10.73 < 2e-16 ***
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.1811 on 498 degrees of freedom
>
> Correlation of Parameter Estimates:
> a
> b 0.5964
>
>
> Hmmm ... the alpha estimates are quite close to each other, but the
> beta estimates are quite different, and so are the standard errors.
>
> Further simulation work showed that :
> a) the alpha estimators can differ much more than they do in the
> present example ;
> b) the "biases" (differences between estimators) do *not* depend on the
> initial values chosen for the nls estimation ;
> c) they depend on the sample size (the larger the sample, the
> smaller the "biases") ;
> d) the standard error estimates given by glm are larger than those given
> by nls.
>
> Can someone explain to me why these two methods of fitting a (quite)
> simple model give such different results ?
>
> I *think* that the two methods should end up with the same estimators (at
> least asymptotically). Where and why am I wrong ?
>
> Sincerely,
>
> Emmanuel Charpentier
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595