[R] logistic regression model specification
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Nov 13 23:28:00 CET 2007
On Tue, 13 Nov 2007, Dylan Beaudette wrote:
> Hi,
>
> I have set up a simple logistic regression model with the glm() function, with
> the following formula:
>
> y ~ a + b
>
> where:
> 'a' is a continuous variable stratified by
> the levels of 'b'
>
>
> Looking over the manual for model specification, it seems that coefficients
> for unordered factors are given 'against' the first level of that factor.
Only for the default coding.
> This makes for difficult interpretation when using factor 'b' as a
> stratifying model term.
Really? You realize that you have not 'stratified' on 'b', which would
need the model to be a*b? What you have is a model with parallel linear
predictors for different levels of 'b', and if the coefficients are not
telling you what you want you should change the coding.
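
A minimal sketch of the distinction between the two formulas, on simulated data. The data frame `d`, the level names `g1`/`g2`/`g3` and the true coefficient values are assumptions made up for illustration; they are not from the original question.

```r
# Simulated data: one continuous predictor 'a', one three-level factor 'b'.
set.seed(1)
n <- 300
b <- factor(sample(c("g1", "g2", "g3"), n, replace = TRUE))
a <- rnorm(n)
shift <- c(g1 = 0, g2 = 0.7, g3 = -0.4)[as.character(b)]  # hypothetical group effects
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * a + shift))
d <- data.frame(y = y, a = a, b = b)

# Parallel linear predictors: one common slope for 'a', level-specific intercepts.
fit_add <- glm(y ~ a + b, family = binomial, data = d)

# Separate slope and intercept for each level of 'b'.
fit_int <- glm(y ~ a * b, family = binomial, data = d)

names(coef(fit_add))  # "(Intercept)" "a" "bg2" "bg3"
names(coef(fit_int))  # the same four, plus "a:bg2" "a:bg3"
```

The extra `a:b` terms in the second fit are what allow the slope of 'a' to differ between levels; without them only the intercept shifts.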
Much of what I am trying to get across is that you have a lot of choice as
to how you specify a model to R. There has to be a default, which is
chosen because it is often a good choice. It does rely on factors being
coded well: the 'base level' (to quote ?contr.treatment) needs to be
interpretable. And also bear in mind that the default choices of
statistical software in this area almost all differ (and R's differs from
S, GLIM, some ways to do this in SAS ...), so people's ideas of a 'good
choice' do differ.
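
As a rough illustration of how the coding can be changed, the sketch below inspects the default and then picks a different base level and a different contrast function. The factor `b` and its level names are hypothetical.

```r
# A small hypothetical factor with three levels.
b <- factor(c("g1", "g2", "g3", "g1", "g2", "g3"))

getOption("contrasts")  # contrast functions currently used for unordered/ordered factors
contrasts(b)            # default treatment coding: the first level (g1) is the base level

# Choose a more interpretable base level.
b_ref2 <- relevel(b, ref = "g2")
contrasts(b_ref2)       # now g2 is the base level

# Or use a different coding altogether, here sum-to-zero contrasts.
b_sum <- b
contrasts(b_sum) <- contr.sum(nlevels(b_sum))
contrasts(b_sum)
```

Either version of the factor can then be used in the model formula in place of `b`; the fitted model is the same, only the meaning of the reported coefficients changes.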
> Setting up the model, minus the intercept term, gives me what appear to be
> more meaningful coefficients. However, I am not sure whether I am correctly
> interpreting the results from a linear model without an intercept term. Model predictions from
> both specifications (with and without an intercept term) are nearly identical
> (different by about 1E-16 in probability space).
>
> Are there any gotchas to look out for when removing the intercept term from
> such a model?
It is just a different parametrization of the linear predictor.
Anything interpretable in terms of the predictions of the model will be
unchanged. That is the crux: the default coefficients of 'b' will be
log odds-ratios that are directly interpretable, and those in the
per-group coding will be log-odds for a zero value of 'a'. Does a zero
value of 'a' make sense?
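
A sketch of the two parametrizations on simulated data, to show that the fitted probabilities agree and how the coefficients relate. The data frame `d`, the level names and the true coefficient values are assumptions for illustration only; 'a' is deliberately simulated far from zero.

```r
# Simulated data in which a = 0 lies well outside the observed range of 'a'.
set.seed(2)
n <- 300
d <- data.frame(
  a = rnorm(n, mean = 10),
  b = factor(sample(c("g1", "g2", "g3"), n, replace = TRUE))
)
shift <- c(g1 = 0, g2 = 0.7, g3 = -0.4)[as.character(d$b)]  # hypothetical group effects
d$y <- rbinom(n, 1, plogis(-8 + 0.8 * d$a + shift))

fit_default <- glm(y ~ a + b, family = binomial, data = d)      # default coding
fit_noint   <- glm(y ~ 0 + b + a, family = binomial, data = d)  # per-group coding

# Same model, so the fitted probabilities agree up to rounding error.
all.equal(fitted(fit_default), fitted(fit_noint))

cf_d <- coef(fit_default)
cf_n <- coef(fit_noint)

# Default coding: 'bg2' is the log odds-ratio of g2 against g1 at any fixed 'a'.
# Per-group coding: 'bg1', 'bg2', 'bg3' are log-odds for each group at a = 0.
all.equal(unname(cf_n["bg2"] - cf_n["bg1"]), unname(cf_d["bg2"]))
all.equal(unname(cf_n["bg1"]), unname(cf_d["(Intercept)"]))
```

If a zero value of 'a' is not meaningful, one option is to centre it, for example `y ~ 0 + b + I(a - mean(a))`, so that the per-group coefficients become log-odds at the mean of 'a'.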
> Any guidance would be greatly appreciated.
>
> Cheers,
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595