[R] logistic regression model specification
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Nov 14 00:47:18 CET 2007
Prof Brian Ripley wrote:
> On Tue, 13 Nov 2007, Dylan Beaudette wrote:
>
>
>> Hi,
>>
>> I have set up a simple logistic regression model with the glm() function, with
>> the following formula:
>>
>> y ~ a + b
>>
>> where:
>> 'a' is a continuous variable stratified by
>> the levels of 'b'
>>
>>
>> Looking over the manual for model specification, it seems that coefficients
>> for unordered factors are given 'against' the first level of that factor.
>>
>
> Only for the default coding.
>
>
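[Editor's note: a minimal sketch on simulated data of what "the default coding" means here; all variable names and values are illustrative, not from the original posts. Under R's default treatment contrasts, the coefficients for a factor are log odds-ratios against its first level, and relevel() changes which level is the base.]

```r
# Simulated data: 'b' has three levels; under the default treatment
# contrasts the coefficients for b compare each level to the first.
set.seed(1)
b <- factor(rep(c("low", "mid", "high"), each = 50),
            levels = c("low", "mid", "high"))
a <- rnorm(150)
y <- rbinom(150, 1, plogis(-0.5 + 0.8 * a + c(0, 1, 2)[b]))  # group shift by level

fit <- glm(y ~ a + b, family = binomial)
coef(fit)           # 'bmid' and 'bhigh' are contrasts against "low"

# Re-base the comparisons on a different level:
b2   <- relevel(b, ref = "mid")
fit2 <- glm(y ~ a + b2, family = binomial)
coef(fit2)          # contrasts are now against "mid"; the fit is unchanged
```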
>> This makes for difficult interpretation when using factor 'b' as a
>> stratifying model term.
>>
>
> Really? You realize that you have not 'stratified' on 'b', which would
> need the model to be a*b? What you have is a model with parallel linear
> predictors for different levels of 'b', and if the coefficients are not
> telling you what you want you should change the coding.
>
>
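[Editor's note: a sketch on simulated data of the distinction drawn above; names and values are illustrative. y ~ a + b fits one common slope for 'a' with parallel linear predictors across the levels of 'b', whereas y ~ a * b lets the slope of 'a' differ by level.]

```r
# Additive model (parallel slopes) vs. interaction model (per-level slopes).
set.seed(2)
b <- factor(rep(c("g1", "g2"), each = 100))
a <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 * a + (b == "g2") * 1))

parallel <- glm(y ~ a + b, family = binomial)  # one slope for a, shifted by b
separate <- glm(y ~ a * b, family = binomial)  # a separate slope per level of b
anova(parallel, separate, test = "Chisq")      # does the interaction help?
```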
I have to differ slightly here. "Stratification", at least in the fields
that I connect with, usually means that you combine information from
several groups via an assumption that they have a common value of a
parameter, which in the present case is essentially the same as assuming
an additive model y~a+b.
I share your puzzlement as to why the parametrization of the effects of
factor b should matter, though. Surely the original poster has already
noticed that the estimated effect of a is the same whether or not the
intercept is included? The only difference I see is that running
anova() or drop1() would not give interesting results for the effect of
b in the no-intercept variant.
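[Editor's note: a small sketch on simulated data of the point just made; names are illustrative. Dropping the intercept merely reparametrizes the factor: the slope for 'a' and the fitted values are unchanged.]

```r
# With vs. without intercept: same model, different parametrization.
set.seed(3)
b <- factor(rep(c("u", "v", "w"), each = 40))
a <- rnorm(120)
y <- rbinom(120, 1, plogis(a))

with_int    <- glm(y ~ a + b,     family = binomial)
without_int <- glm(y ~ a + b - 1, family = binomial)

coef(with_int)["a"]; coef(without_int)["a"]       # identical slope for a
all.equal(fitted(with_int), fitted(without_int))  # identical predictions
```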
-p
> Much of what I am trying to get across is that you have a lot of choice as
> to how you specify a model to R. There has to be a default, which is
> chosen because it is often a good choice. It does rely on factors being
> coded well: the 'base level' (to quote ?contr.treatment) needs to be
> interpretable. And also bear in mind that the default choices of
> statistical software in this area almost all differ (and R's differs from
> S, GLIM, some ways to do this in SAS ...), so people's ideas of a 'good
> choice' do differ.
>
>
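[Editor's note: a sketch of how to inspect and change the default coding mentioned above; the options() line is shown commented out as it changes a session-wide default.]

```r
# R's default contrasts, and an alternative sum-to-zero coding.
getOption("contrasts")   # "contr.treatment" for unordered, "contr.poly" for ordered
contr.treatment(3)       # dummy coding: each level against level 1
contr.sum(3)             # effects coding: levels sum to zero
# options(contrasts = c("contr.sum", "contr.poly"))  # would change the default
```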
>> Setting up the model, minus the intercept term, gives me what appear to be
>> more meaningful coefficients. However, I am not sure whether I am interpreting the
>> results from a linear model without an intercept term correctly. Model predictions from
>> both specifications (with and without an intercept term) are nearly identical
>> (different by about 1E-16 in probability space).
>>
>> Are there any gotchas to look out for when removing the intercept term from
>> such a model?
>>
>
> It is just a different parametrization of the linear predictor.
> Anything interpretable in terms of the predictions of the model will be
> unchanged. That is the crux: the default coefficients of 'b' will be
> log odds-ratios that are directly interpretable, and those in the
> per-group coding will be log-odds for a zero value of 'a'. Does a zero
> value of 'a' make sense?
>
>
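[Editor's note: a sketch on simulated data of the remedy when a = 0 is not meaningful; names and values are illustrative. Centring 'a' makes the per-group coefficients in the no-intercept fit the log-odds at the average of 'a' rather than at zero.]

```r
# Centre the continuous predictor so the group coefficients are interpretable.
set.seed(4)
b <- factor(rep(c("p", "q"), each = 60))
a <- rnorm(120, mean = 10)          # zero lies far outside the data
y <- rbinom(120, 1, plogis(0.3 * (a - 10)))

a_c <- a - mean(a)                  # centred predictor
fit <- glm(y ~ a_c + b - 1, family = binomial)
coef(fit)   # 'bp', 'bq': per-group log-odds at the average value of a
```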
>> Any guidance would be greatly appreciated.
>>
>> Cheers,
>>
>>
>>
>
>
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907