[R] OT: (quasi-?) separation in a logistic GLM

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Dec 16 16:01:24 CET 2008


On Tue, 2008-12-16 at 13:31 +0100, vito muggeo wrote:
> Dear Gavin,
> I do not know whether this comment may still be useful...

Very much so, thank you.

> 
> Why are you unsure about quasi-separation?
> I think it is quite evident in the plot.

Unsure in the sense that I had been unable to ascertain what
quasi-complete separation was ;-)
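
For the record, with a single predictor the distinction is simple. Complete separation means some cut-point on the predictor puts all the failures on one side and all the successes on the other; quasi-complete separation is the same except that both outcomes occur at the cut-point itself. In either case the ML estimate of the slope is infinite. A minimal sketch of that check in base R follows; `separation_type()` is a hypothetical helper written for illustration (not from any package), and with several predictors the check becomes a linear-programming problem that this does not cover.

```r
## Hypothetical helper (not from any package): classify one predictor 'x'
## against a binary response 'y' (logical or 0/1). Single predictor only.
separation_type <- function(x, y) {
  y <- as.logical(y)
  x0 <- x[!y]  # predictor values for the failures
  x1 <- x[y]   # predictor values for the successes
  if (length(x0) == 0L || length(x1) == 0L)
    return("only one outcome observed")
  gap_up   <- min(x1) - max(x0)  # > 0: every success lies above every failure
  gap_down <- min(x0) - max(x1)  # > 0: every success lies below every failure
  if (gap_up > 0 || gap_down > 0) {
    "complete separation"
  } else if (gap_up == 0 || gap_down == 0) {
    "quasi-complete separation"  # the groups touch only at the cut-point
  } else {
    "overlap (MLE exists)"
  }
}

x <- c(1, 2, 3, 4, 5, 5, 6, 7, 8, 9)
separation_type(x, x > 5)                            # complete
separation_type(x, c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1))  # quasi-complete: tie at x = 5
separation_type(x, c(0, 1, 0, 0, 1, 0, 1, 0, 1, 1))  # overlap
```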

I'm still not convinced about the quasi-separation issue, though. The
coefficients of the GLM are large, but the standard errors don't indicate
anything much wrong.
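
Large but unremarkable standard errors are not what separation typically produces in glm(): as IRLS drives the slope off towards infinity, the fitted probabilities hit 0 or 1, the working weights p(1 - p) collapse and the reported standard errors balloon. A small sketch on made-up toy data (not the analogue data) that records the warnings and shows the inflated estimates:

```r
## Made-up toy data: complete separation between x = 5 and x = 6
x <- 1:10
y <- as.numeric(x > 5)

## Collect any warnings glm() raises instead of letting them print
warns <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  })

warns               # includes the familiar "fitted probabilities ..." warning
coef(summary(fit))  # slope and its standard error are both enormous
range(fitted(fit))  # fitted probabilities pinned at (numerically) 0 and 1
```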

I tried brglm() from the package of the same name, and it gave
effectively the same coefficients and standard errors as glm(), whereas I
would have expected them to differ considerably if (quasi-)separation
were an issue. I'm not very familiar with the approach behind brglm()
however.
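
For anyone equally unfamiliar with brglm(): for logistic regression its bias-reduced estimates should coincide with Firth's penalised-likelihood estimates, which stay finite under separation. The sketch below hand-rolls Firth's modified-score iteration in base R on made-up separated data so the expected behaviour is visible; `firth_logit()` is a hypothetical helper, not brglm's implementation, and the step cap and tolerances are assumptions.

```r
## Hypothetical helper (NOT brglm's code): Firth-type bias-reduced logistic
## regression via the modified score U*(b) = X'(y - p + h * (1/2 - p)),
## where h is the diagonal of the weighted hat matrix.
firth_logit <- function(X, y, maxit = 100, tol = 1e-8, maxstep = 5) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(maxit)) {
    p <- plogis(drop(X %*% beta))
    w <- p * (1 - p)                          # binomial working weights
    XtWX_inv <- solve(crossprod(X, w * X))    # inverse Fisher information
    h <- w * rowSums((X %*% XtWX_inv) * X)    # leverages h_i
    U <- crossprod(X, y - p + h * (0.5 - p))  # Firth-modified score
    delta <- drop(XtWX_inv %*% U)
    ## assumed safeguard: cap the step length in early iterations
    if (max(abs(delta)) > maxstep) delta <- delta * maxstep / max(abs(delta))
    beta <- beta + delta
    if (max(abs(delta)) < tol)
      return(list(coef = beta, se = sqrt(diag(XtWX_inv)),
                  iter = iter, converged = TRUE))
  }
  list(coef = beta, se = sqrt(diag(XtWX_inv)), iter = maxit, converged = FALSE)
}

x <- 1:10
y <- as.numeric(x > 5)  # completely separated toy data
ff <- firth_logit(cbind(1, x), y)
ff$coef                 # finite, unlike glm()'s estimates on these data
ff$converged
```

Under separation, glm() and a Firth-type fit should disagree sharply in exactly this way.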

I'll also take a look at the profiling you describe below when our
computing problems here get sorted.
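
A toy version of that check, on made-up data, shows the two shapes to look for. With overlapping data the profile deviance for the slope has an interior minimum at the ML estimate; with separated data it keeps falling as the slope grows, the monotone shape Vito mentions. The helper name `profile_dev()`, the grids and the simulated data are all illustrative assumptions.

```r
## Hypothetical helper: profile deviance for the slope. The slope is fixed
## via offset(), only the intercept is refitted (same idea as Vito's loop).
profile_dev <- function(slopes, x, y) {
  vapply(slopes, function(b) {
    fit <- suppressWarnings(glm(y ~ offset(b * x), family = binomial))
    deviance(fit)
  }, numeric(1))
}

## 1. Separated toy data: deviance keeps decreasing as the slope grows
x_sep <- 1:10
y_sep <- as.numeric(x_sep > 5)
b_sep <- seq(0, 20, by = 1)
dev_sep <- profile_dev(b_sep, x_sep, y_sep)

## 2. Simulated overlapping data: interior minimum near the glm() estimate
set.seed(1)
x_ok <- runif(200)
y_ok <- rbinom(200, 1, plogis(-2 + 4 * x_ok))
b_ok <- seq(-2, 10, length.out = 61)
dev_ok <- profile_dev(b_ok, x_ok, y_ok)

op <- par(mfrow = c(1, 2))
plot(b_sep, dev_sep, type = "b", xlab = "slope", ylab = "deviance",
     main = "Separated")
plot(b_ok, dev_ok, type = "b", xlab = "slope", ylab = "deviance",
     main = "Overlapping")
abline(v = coef(glm(y_ok ~ x_ok, family = binomial))[2], lty = 2)
par(op)
```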

Apologies if people have had problems downloading the file from my web
space - we are having all sorts of filestore problems here this week.

Thanks again Vito for your comments,

G

> 
> plot(analogs ~ Dij, data = dat)
> 
> It may also be useful to look at a plot of the (monotone) profile deviance
> (or log-likelihood) for the coefficient of Dij:
> 
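> xval <- seq(-20, 0, length.out = 50)  # grid of fixed values for the Dij slope
> dev <- numeric(length(xval))
> for (i in seq_along(xval)) {
>   ## fix the slope via offset(); only the intercept is estimated
>   mod <- glm(analogs ~ offset(xval[i] * Dij), data = dat, family = binomial)
>   dev[i] <- deviance(mod)
> }
> 
> plot(xval, dev, type = "b", xlab = "coefficient of Dij", ylab = "deviance")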
> 
> Hope this helps you,
> 
> vito
> 
> Gavin Simpson wrote:
> > Dear List,
> > 
> > Apologies for this off-topic post but it is R-related in the sense that
> > I am trying to understand what R is telling me with the data to hand.
> > 
> > ROC curves have recently been used to determine a dissimilarity
> > threshold for identifying whether two samples are from the same "type"
> > or not. Given the bashing that ROC curves get whenever anyone asks about
> > them on this list (and having implemented the ROC methodology in my
> > analogue package) I wanted to try directly modelling, with glm(), the
> > probability that two sites are analogues for one another at a given
> > dissimilarity.
> > 
> > The data I have then are a logical vector ('analogs') indicating whether
> > the two sites come from the same vegetation and a vector of the
> > dissimilarity between the two sites ('Dij'). These are in a csv file
> > currently in my university web space. Each 'row' in this file
> > corresponds to a single comparison between two sites.
> > 
> > When I analyse these data using glm() I get the familiar "fitted
> > probabilities numerically 0 or 1 occurred" warning. The data do not look
> > linearly separable when plotted (code for which is below). I have read
> > Venables and Ripley's discussion of this in MASS4 and other sources that
> > discuss this warning and R (Faraway's Extending the Linear Model with R
> > and John Fox's new Applied Regression, Generalized Linear Models, and
> > Related Methods, 2nd Ed) as well as some of the literature on Firth's
> > bias reduction method. But I am still somewhat unsure what
> > (quasi-)separation is and if this is the reason for the warnings in this
> > case.
> > 
> > My question, then, is: is this a separation issue with my data, or is it
> > the quasi-separation I have read a bit about whilst researching this
> > problem? Or is this something completely different?
> > 
> > Code to reproduce my problem with the actual data is given below. I'd
> > appreciate any comments or thoughts on this.
> > 
> > #### Begin code snippet ################################################
> > 
> > ## note data file is ~93Kb in size
> > dat <- read.csv(url("http://www.homepages.ucl.ac.uk/~ucfagls/dat.csv"))
> > head(dat)
> > ## fit model --- produces warning
> > mod <- glm(analogs ~ Dij, data = dat, family = binomial)
> > ## plot the data
> > plot(analogs ~ Dij, data = dat)
> > fit.mod <- fitted(mod)
> > ord <- with(dat, order(Dij))
> > with(dat, lines(Dij[ord], fit.mod[ord], col = "red", lwd = 2))
> > 
> > #### End code snippet ##################################################
> > 
> > Thanks in advance
> > 
> > Gavin
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-help mailing list