[R] Fwd: add1() and anova() with glm with dispersion

Chad R. Bhatti bhatticr at yahoo.com
Wed Jun 28 22:34:04 CEST 2006


> Hello,
> 
> I have a question about a discrepancy between the
> reported F statistics using anova() and add1() from
> adding an additional term to form nested models. 
> 
> I found and old posting related to anova() and 
> drop1() regarding a glm with a dispersion parameter.
> 
> The posting is very old (May 2000, R 1.1.0).
> The old posting is located here.
> 
>
https://stat.ethz.ch/pipermail/r-devel/2000-May/020720.html
> 
>  
> I first noticed the discrepancy in the PC version of
> R
> 2.2.1.  I have upgraded to PC R 2.3.1 and the
> discrepancy remains.  
> 
> To be clear I will be as exact as possible and show
> the model output below.
> I am fitting nested models from the quasi-Poisson
> family using glm(). Here, model.0 is contained in
> model.1 where model.1 contains one additional
> covariate not in model.0.
>  
> To be specific I will show you model.0 and model.1.
> > > summary(model.0)
> > 
> > Call:
> > glm(formula = as.formula(formula.0), family =
> > quasipoisson, data = data.1)
> > 
> > Deviance Residuals: 
> >     Min       1Q   Median       3Q      Max  
> > -4.0256  -1.0080  -0.1604   0.7372   4.8545  
> > 
> > Coefficients:
> >               Estimate Std. Error t value Pr(>|t|)
>  
> >  
> > (Intercept)  9.440e-01  8.968e-02  10.526  < 2e-16
> > ***
> > I11         -6.857e-02  3.693e-02  -1.857 0.063468
> .
> >  
> > I12         -7.338e-02  4.142e-02  -1.771 0.076598
> .
> >  
> > I13          3.329e-02  4.130e-02   0.806 0.420208
>  
> >  
> > I14          1.830e-01  3.717e-02   4.924 9.00e-07
> > ***
> > I15          2.015e-01  3.480e-02   5.789 7.93e-09
> > ***
> > trades1      1.775e-01  2.114e-02   8.398  < 2e-16
> > ***
> > trades2      3.325e-02  1.857e-02   1.790 0.073493
> .
> >  
> > trades3      1.050e-01  1.873e-02   5.604 2.31e-08
> > ***
> > trades4      4.615e-02  1.853e-02   2.490 0.012827
> *
> >  
> > vol1         1.932e-04  5.116e-05   3.777 0.000162
> > ***
> > sVol1        1.211e-04  4.536e-05   2.670 0.007643
> > ** 
> > trades5      1.835e-02  1.850e-02   0.992 0.321297
>  
> >  
> > trades6      7.797e-03  1.837e-02   0.425 0.671213
>  
> >  
> > trades7      3.802e-02  1.831e-02   2.077 0.037916
> *
> >  
> > trades8      5.904e-02  1.831e-02   3.224 0.001280
> > ** 
> > trades9      4.581e-02  1.830e-02   2.503 0.012383
> *
> >  
> > trades10     5.521e-02  1.802e-02   3.063 0.002212
> > ** 
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
> '.'
> > 0.1 ' ' 1 
> > 
> > (Dispersion parameter for quasipoisson family
> taken
> > to
> > be 1.832780)
> > 
> >     Null deviance: 7020.8  on 2623  degrees of
> > freedom
> > Residual deviance: 4697.2  on 2606  degrees of
> > freedom
> > AIC: NA
> > 
> > Number of Fisher Scoring iterations: 4
> > 
> > 
> > 
> > 
> > > summary(model.1)
> > 
> > Call:
> > glm(formula = as.formula(formula.1), family =
> > quasipoisson, data = data.1)
> > 
> > Deviance Residuals: 
> >     Min       1Q   Median       3Q      Max  
> > -4.0807  -1.0061  -0.1659   0.7461   4.9228  
> > 
> > Coefficients:
> >               Estimate Std. Error t value Pr(>|t|)
>  
> >  
> > (Intercept)  9.627e-01  9.019e-02  10.675  < 2e-16
> > ***
> > I11         -6.631e-02  3.694e-02  -1.795 0.072713
> .
> >  
> > I12         -7.383e-02  4.141e-02  -1.783 0.074717
> .
> >  
> > I13          3.363e-02  4.129e-02   0.814 0.415467
>  
> >  
> > I14          1.844e-01  3.717e-02   4.960 7.50e-07
> > ***
> > I15          2.017e-01  3.479e-02   5.798 7.51e-09
> > ***
> > trades1      1.840e-01  2.140e-02   8.600  < 2e-16
> > ***
> > trades2      3.585e-02  1.861e-02   1.927 0.054135
> .
> >  
> > trades3      1.050e-01  1.872e-02   5.607 2.27e-08
> > ***
> > trades4      4.647e-02  1.852e-02   2.509 0.012166
> *
> >  
> > vol1         1.957e-04  5.111e-05   3.828 0.000132
> > ***
> > sVol1        1.238e-04  4.535e-05   2.730 0.006374
> > ** 
> > trades5      1.883e-02  1.849e-02   1.019 0.308506
>  
> >  
> > trades6      7.358e-03  1.836e-02   0.401 0.688663
>  
> >  
> > trades7      3.840e-02  1.829e-02   2.099 0.035928
> *
> >  
> > trades8      5.907e-02  1.831e-02   3.226 0.001272
> > ** 
> > trades9      4.527e-02  1.830e-02   2.473 0.013447
> *
> >  
> > trades10     5.582e-02  1.802e-02   3.098 0.001970
> > ** 
> > aveSpread1  -2.386e-01  1.238e-01  -1.928 0.054022
> .
> >  
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
> '.'
> > 0.1 ' ' 1 
> > 
> > (Dispersion parameter for quasipoisson family
> taken
> > to
> > be 1.831032)
> > 
> >     Null deviance: 7020.8  on 2623  degrees of
> > freedom
> > Residual deviance: 4690.4  on 2605  degrees of
> > freedom
> > AIC: NA
> > 
> > Number of Fisher Scoring iterations: 4
>  
> So model.1 is model.0 + aveSpread1.
> 
>  
> The R anova() function is supposed to account for a
> dispersion/scale parameter of a glm object and has
> the
> following output. The anova() function adds the
> terms
> sequentially and computes a test statistic. 
> 
> > > anova(model.1, test="F")
> > Analysis of Deviance Table
> > 
> > Model: quasipoisson, link: log
> > 
> > Response: trades
> > 
> > Terms added sequentially (first to last)
> > 
> > 
> >              Df Deviance Resid. Df Resid. Dev     
>  
> > F
> >    Pr(>F)    
> > NULL                          2623     7020.8     
>  
> >  
> >              
> > I11           1     91.5      2622     6929.3 
> > 49.9706
> > 1.996e-12 ***
> > I12           1    622.3      2621     6307.0
> > 339.8717
> > < 2.2e-16 ***
> > I13           1    614.4      2620     5692.6
> > 335.5561
> > < 2.2e-16 ***
> > I14           1     53.3      2619     5639.4 
> > 29.0903
> > 7.529e-08 ***
> > I15           1     64.6      2618     5574.8 
> > 35.2590
> > 3.270e-09 ***
> > trades1       1    535.2      2617     5039.6
> > 292.3131
> > < 2.2e-16 ***
> > trades2       1     49.0      2616     4990.5 
> > 26.7798
> > 2.454e-07 ***
> > trades3       1    113.6      2615     4876.9 
> > 62.0682
> > 4.830e-15 ***
> > trades4       1     31.3      2614     4845.6 
> > 17.0783
> > 3.700e-05 ***
> > vol1          1     20.8      2613     4824.8 
> > 11.3810
> > 0.0007528 ***
> > sVol1         1     13.9      2612     4810.9  
> > 7.5932
> > 0.0058997 ** 
> > trades5       1     10.6      2611     4800.3  
> > 5.7901
> > 0.0161861 *  
> > trades6       1      8.1      2610     4792.2  
> > 4.4175
> > 0.0356685 *  
> > trades7       1     24.7      2609     4767.5 
> > 13.4770
> > 0.0002464 ***
> > trades8       1     33.5      2608     4734.0 
> > 18.2926
> > 1.963e-05 ***
> > trades9       1     19.5      2607     4714.5 
> > 10.6649
> > 0.0011060 ** 
> > trades10      1     17.3      2606     4697.2  
> > 9.4474
> > 0.0021364 ** 
> > aveSpread1    1      6.8      2605     4690.4  
> > 3.7240
> > 0.0537449 .  
> > ---
>  
> 
> Now consider the output from add1() when adding
> aveSpread1 to model.0.
> 
> > > add1(model.0, ~ . + aveSpread1, test = "F")
> > Single term additions
> > 
> > Model:
> > trades ~ I11 + I12 + I13 + I14 + I15 + trades1 +
> > trades2 + trades3 + 
> >     trades4 + vol1 + sVol1 + trades5 + trades6 +
> > trades7 + trades8 + 
> >     trades9 + trades10
> >            Df Deviance F value   Pr(F)  
> > <none>          4697.2                  
> > aveSpread1  1   4690.4  3.7871 0.05176 .
> > 
> 
> 
> Here is the discrepancy. Compare F statistics
> for aveSpread1 3.7240 from anova() to 3.7871 from 
> add1().  
> 
> 
> I built my own anova-type table that did not account
> for the dispersion parameter and my last term
> (rounded) matches that reported by add1().  
> 
> > source('main.R')
>             [,1]                                    
>  
>                              
> model.table "Variable       Deviance       Resid Dev
>  
>    F              p-value   "
> add.term    "trades1        535.2          5039.6   
>  
>    277.84         0.0000    "
> add.term    "trades2        49.0           4990.5   
>  
>    25.69          0.0000    "
> add.term    "trades3        113.6          4876.9   
>  
>    60.92          0.0000    "
> add.term    "trades4        31.3           4845.6   
>  
>    16.86          0.0000    "
> add.term    "vol1           20.8           4824.8   
>  
>    11.28          0.0008    "
> add.term    "sVol1          13.9           4810.9   
>  
>    7.55           0.0061    "
> add.term    "trades5        10.6           4800.3   
>  
>    5.76           0.0164    "
> add.term    "trades6        8.1            4792.2   
>  
>    4.40           0.0360    "
> add.term    "trades7        24.7           4767.5   
>  
>    13.50          0.0002    "
> add.term    "trades8        33.5           4734.0   
>  
>    18.45          0.0000    "
> add.term    "trades9        19.5           4714.5   
>  
>    10.79          0.0010    "
> add.term    "trades10       17.3           4697.2   
>  
>    9.59           0.0020    "
> add.term    "aveSpread1     6.8            4690.4   
>  
>    3.79           0.0518    "
> 
> 
> McCullagh and Nelder (2nd ed) p207 #4 show an
> adjustment for the dispersion parameter in the
> F-statistic.  It does not seem that add1() is making
> this adjustment though the anova() help page
> suggests
> that anova() is.  The estimated dispersion
> parameters
> are s0 = 1.832780 and s1 = 1.831032.  I have tried
> scaling the F statistic from add1() by s1/s0, but it
> still is not equal to the F statistic from anova().
> 
> I am not experienced with the R source code.  I
> would
> like to know how add1() computes the F statistic
> with
> dispersion and how anova() computes the F statistic
> with dispersion and why they are different.
> 
> I apologize if I have missed something obvious, but
> add1() and anova() do not show the code in the PC
> version.  In the PC version if I type lm, glm,
> glm.fit, I see code and I do not for add1 and anova
> so
> I do not know how to examine these functions.
> 
> Thanks for all of your assistance.
> 
> Chad R. Bhatti
> 
> 
> 
> 
> __________________________________________________
> Do You Yahoo!?

> protection around 
> http://mail.yahoo.com 
>



More information about the R-help mailing list