[R] Logistic regression : discrepancies between glm and nls ?
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Dec 14 17:13:16 CET 2001
Your call to nls fits by least squares, whereas glm fits by maximum
likelihood. These are not the same thing: ML gives more weight to
observations whose fitted values are near zero or one.
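The weighting difference can be made concrete. Differentiating each criterion, ML solves sum_i (y_i - p_i) x_i = 0, while least squares solves sum_i (y_i - p_i) p_i (1 - p_i) x_i = 0; the extra p_i(1 - p_i) factor means least squares nearly ignores observations whose fitted values sit near 0 or 1. Below is a minimal sketch of the two fits outside R, in Python with numpy (the function names, seed, and optimizer details are mine, not from this thread): IRLS for maximum likelihood, as glm uses, and a damped Gauss-Newton for least squares, roughly as nls uses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_ml(X, y, iters=25):
    """Maximum likelihood by IRLS / Fisher scoring (what glm does)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ theta)
        w = p * (1.0 - p)                       # working weights
        theta = theta + np.linalg.solve(X.T @ (w[:, None] * X),
                                        X.T @ (y - p))
    return theta

def fit_ls(X, y, iters=100, damp=1e-8):
    """Least squares by damped Gauss-Newton (roughly what nls does)."""
    theta = np.array([0.0, 1.0])                # same start as the nls call
    ridge = damp * np.eye(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ theta)
        resid = y - p
        J = (p * (1.0 - p))[:, None] * X        # note the p(1-p) factor
        step = np.linalg.solve(J.T @ J + ridge, J.T @ resid)
        # halve the step until the residual sum of squares does not grow
        cand = theta + step
        for _ in range(30):
            if ((y - sigmoid(X @ cand)) ** 2).sum() <= (resid ** 2).sum():
                break
            step = step / 2.0
            cand = theta + step
        theta = cand
    return theta

# Simulate the same kind of data as in the post: x ~ U(-10, 10),
# P(y = 1) = logistic(0.5 + 2 x)
rng = np.random.default_rng(1)
n = 500
x = 20.0 * rng.random(n) - 10.0
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < sigmoid(0.5 + 2.0 * x)).astype(float)

theta_ml = fit_ml(X, y)
theta_ls = fit_ls(X, y)
print("ML (glm-like):", theta_ml)
print("LS (nls-like):", theta_ls)
```

On a typical draw the slopes differ noticeably while both fits are defensible, and the gap shrinks as n grows: since the mean function is correctly specified, both estimators are consistent, but least squares is less efficient for binary data.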
On Fri, 14 Dec 2001, Emmanuel Charpentier wrote:
> Dear list,
>
> I'm trying to learn how to use nlme to be able to fit and analyse
> mixed-model logistic regressions. In order to keep things simple, I
> started by the simplest possible model : a one (fixed-effect ...)
> continuous variable. This problem is, of course, solved by glm, but I
> wanted to look at a "hand-made" nls fit, in order to be able to
> "generalize" to nlme fits.
>
> > ## Let's start with the simplest model : one fixed-effect continuous variable
> >
> > ## Dummy data
> >
> > size<-500
> >
> > logdata<-data.frame(x=20*runif(size)-10) # A U([-10,10]) variable
> >
> > alpha<-0.5
> >
> > beta<-2
> >
> > ## Simulate a response
> > ## y : a boolean (0|1) with probability e^(a+bx)/(1+e^(a+bx))
> >
> > logdata$y<-as.numeric(runif(size)<(exp(alpha+beta*logdata$x)/
> + (1+exp(alpha+beta*logdata$x))))
> >
> > ## Reality check : are the data "reasonably random" ?
> >
> > table(logdata$y)
>
> 0 1
> 251 249
> >
> > by(logdata$x,logdata$y,summary)
> INDICES: 0
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> -9.993 -7.243 -4.594 -4.873 -2.678 1.081
> ------------------------------------------------------------
> INDICES: 1
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> -1.526 1.791 5.008 4.663 7.235 9.983
>
>
> So far, no reason to wail ...
>
> > ## another reality check : what's the "classical" logistic regression ?
> >
> > logdata.glm<-glm(y~x, data=logdata, family=binomial(link=logit))
> >
> > ## nls should give the same estimates, up to convergence discrepancies
> >
> > logdata.nls<-nls(y~exp(a+b*x)/(1+exp(a+b*x)), data=logdata,
> + start=list(a=0,b=1))
> >
> > ## let's see ...
> >
> > summary(logdata.glm)
>
> Call:
> glm(formula = y ~ x, family = binomial(link = logit), data = logdata)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -2.4394149 -0.0308105 -0.0001991 0.0082031 2.0389572
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 0.9044 0.2849 3.174 0.00150 **
> x 1.8675 0.2739 6.818 9.2e-12 ***
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 693.14 on 499 degrees of freedom
> Residual deviance: 100.76 on 498 degrees of freedom
> AIC: 104.76
>
> Number of Fisher Scoring iterations: 8
>
> >
> > summary(logdata.nls)
>
> Formula: y ~ exp(a + b * x)/(1 + exp(a + b * x))
>
> Parameters:
> Estimate Std. Error t value Pr(>|t|)
> a 0.8979 0.1285 6.99 8.86e-12 ***
> b 1.5806 0.1474 10.73 < 2e-16 ***
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.1811 on 498 degrees of freedom
>
> Correlation of Parameter Estimates:
> a
> b 0.5964
>
>
> Hmmm ... the alpha estimates are quite close to each other, but the
> beta estimates are quite different, and so are the standard errors.
>
> Further simulation work showed that :
> a) the alpha estimators can differ much more than they do in the
> present example ;
> b) the "biases" (differences between estimators) do *not* depend on the
> initial values chosen for the nls estimation ;
> c) they depend on the sample size (the larger the sample, the
> smaller the "biases") ;
> d) the standard error estimates given by glm are larger than those given
> by nls.
>
> Can someone explain to me why these two methods of fitting a (quite)
> simple model give such different results ?
>
> I *think* that the two methods should end up with the same estimators (at
> least asymptotically). Where and why am I wrong ?
>
> Sincerely,
>
> Emmanuel Charpentier
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595