[R] glm() scale parameters and predicted Values

Ben Bolker bbolker at gmail.com
Thu Jul 14 16:52:34 CEST 2011


Peter Maclean <pmaclean2011 <at> yahoo.com> writes:

> 
> In glm() you can use the summary() function to recover
> the shape parameter (the reciprocal of the
> dispersion parameter). How do you recover the scale parameter? 
> Also, in the given example, how I estimate
> and save the geometric mean of the predicted values? 
> For a simple model you can use fitted() or predicted()
> functions. I will appreciate any help. 
>  
>  
>  
> #Call required R packages
> require(plyr)  
> require(stats) 
> require(fitdistrplus)
> require(MASS)
> #Grouped vector
> n <- c(1:10)
> yr <-c(1:10)
> ny <- list(yr=yr,n=n)
> require(utils)
> ny <- expand.grid(ny) 
> y = rgamma(100, shape=1.5, rate = 1, scale = 2)

   It's a bit odd to specify both the rate and scale parameters,
which are redundant.  I would have guessed that the
rate parameter would dominate, but it looks like the
scale parameter dominates:

> set.seed(1001); rgamma(1,shape=1,rate=2)
[1] 1.622577
> set.seed(1001); rgamma(1,shape=1,scale=2)
[1] 6.490306
> set.seed(1001); rgamma(1,shape=1,rate=2,scale=2)
[1] 6.490306
> set.seed(1001); rgamma(1,shape=1,scale=2,rate=2)
[1] 6.490306

  (I know, I could go look at the source, but it's
a .Internal() function and I'm feeling lazy ...)

> Gdata <- cbind(ny,y)
> Gdata2<- Gdata
> Gdata$x1 <- cos((3.14*yr)/365.25) 
> Gdata$x2 <- sin((3.14*yr)/365.25) 
> #Fitting Generalized Linear Models 
> Gdata <- split(Gdata,Gdata$n)
> FGLM <- lapply(Gdata, function(x){
>               m <- as.numeric(x$y)
>               x1 <- m <- as.numeric(x$x1)
>               x2 <- m <- as.numeric(x$x2)
>               summary(glm(m~1+x1+x2, family=Gamma),dispersion=NULL) 
>                })
> 

  Note that you have simulated a null model (the data don't
depend on the covariates).

> #Save the results of the estimated parameters
> str(FGLM,no.list = TRUE)
> SFGLMC<- ldply(FGLM, function(x) x$coefficients)
> SFGLMD<- ldply(FGLM, function(x) x$dispersion)
> GLM <- cbind(SFGLMC,SFGLMD)

  Which scale parameter do you mean?  In a real
model it will vary with x1 and x2.  Let's try an
experiment first:

> set.seed(1001)
> z <- rgamma(10000,shape=2,scale=3)
> g1 <- glm(z~1,family=Gamma)
> summary(g1)

Call:
glm(formula = z ~ 1, family = Gamma)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.8434  -0.6579  -0.1702   0.3081   2.6220  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.167013   0.001189   140.5   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for Gamma family taken to be 0.5066566)

    Null deviance: 5419.8  on 9999  degrees of freedom
Residual deviance: 5419.8  on 9999  degrees of freedom
AIC: 53526

  Here the intercept estimate is 0.167 , which is very
nearly 1/6 = 1/(shape*scale) -- i.e. the Gamma GLM is
parameterized in terms of the mean (on the inverse
scale).  If you want to recover
the scale parameter for the intercept case, then

summary(g1)$dispersion/coef(g1)[1]

should be pretty good.

As for the geometric means -- that's pretty tricky.
*If* you used a log link, then the predicted values on
the link scale (i.e. predict(g1,type="link")) would indeed
give you the geometric means.  On the inverse scale, though,
you would have to do a bit of finagling.



More information about the R-help mailing list