[R] glm or transformation of the response?

Tue Nov 25 17:33:16 CET 2008

Dear Peter,

But even if I change things as you suggested, the question still remains the same: why do the glm 
models perform so poorly on this dataset? And what would your advice to the students be?

Best wishes
Christoph

# by the way, I disagree on taking logs on the explanatory; the explanatory is fixed and equally 
spaced, so I see no reason of transforming it (this produces errors in the glm.nb fit, for example).

plot(explanatory,response,pch=16)
lines(explanatory,predict(model1,data.frame(explanatory=explanatory)))
lines(explanatory,(predict(model2,data.frame(explanatory=explanatory)))^2-0.5,lty=2)
lines(explanatory,exp(predict(model3,data.frame(explanatory=explanatory)))-1,lty=3)
lines(explanatory,exp(predict(model5,data.frame(explanatory=explanatory))),lty=1,col="red")
lines(explanatory,predict(model6,data.frame(explanatory=explanatory),type="response"),lty=1,col="blue")
lines(explanatory,predict(model7,data.frame(explanatory=explanatory),type="response"),lty=1,col="green")

Peter Dalgaard schrieb:
> Christoph Scherber wrote:
>> Dear all,
>>
>> For an introductory course on glm?s I would like to create an example to
>> show the difference between glm and transformation of the response. For
>> this, I tried to create a dataset where the variance increases with the
>> mean (as is the case in many ecological datasets):
>>
>>     poissondata=data.frame(
>>     response=rpois(40,1:40),
>>     explanatory=1:40)
>>
>>     attach(poissondata)
>>
>> However, I have run into a problem because it looks like the lm model
>> (with sqrt-transformation) fits the data best:
>>
>> ##
>>
>> model1=lm(response~explanatory,poissondata)
>> model2=lm(sqrt(response+0.5)~explanatory,poissondata)
>> model3=lm(log(response+1)~explanatory,poissondata)
>> model4=glm(response~explanatory,poissondata,family=poisson)
>> model5=glm(response~explanatory,poissondata,family=quasipoisson)
>> model6=glm.nb(response~explanatory,poissondata)
>> model7=glm(response~explanatory,quasi(variance="mu",link="identity"))
>>
>>
>> plot(explanatory,response,pch=16)
>> lines(explanatory,predict(model1,explanatory=explanatory))
>> lines(explanatory,(predict(model2,explanatory=explanatory))^2-0.5,lty=2)
>> lines(explanatory,exp(predict(model3,explanatory=explanatory))-1,lty=3)
>> lines(explanatory,exp(predict(model5,explanatory=explanatory)),lty=1,col="red")
>>
>> lines(explanatory,predict(model6,explanatory=explanatory,type="response"),lty=1,col="blue")
>>
>> lines(explanatory,predict(model7,explanatory=explanatory,type="response"),lty=1,col="green")
>>
>>
>> ##
>>
>> The only model that performs equally well is model7.
>>
>> How would you deal with this kind of analysis? What would be your
>> recommendation to the students, given the fact that most of the standard
>> glm models obviously don?t seem to produce good fits here?
>>
>> Many thanks and best wishes
>> Christoph
>>
>> (using R 2.8.0 on Windows XP)
>>
> 
> Any good reason that you're not transforming both sides when
> transforming? and that you're not looking at
> 
> model8 <- glm(response~log(explanatory),poissondata,family=poisson)
> (etc.)
> 
> ??
> 
> BTW, your predict call seems to be missing data.frame() around
> "explanatory=explanatory". The predict() methods do not have an argument
> called "explanatory", so this is just ignored (a buglet if you ask me).
>