[R] bigglm() results different from glm()
Francisco J. Zagmutt
gerifalte28 at hotmail.com
Tue Mar 17 03:26:29 CET 2009
Dear all,
I am using the bigglm package to fit a few GLM's to a large dataset (3
million rows, 6 columns). While trying to fit a Poisson GLM I noticed
that the coefficient estimates were very different from what I obtained
when estimating the model on a smaller dataset using glm(), I wrote a
very basic toy example to compare the results of bigglm() against a
glm() call. Consider the following code:
> require(biglm)
> options(digits=6, scipen=3, contrasts = c("contr.treatment",
"contr.poly"))
> dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
ttment=gl(2,50000))
> m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
> m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"))
> summary(m1)
<snipped output for this email>
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.30305 0.00141 1629 <2e-16 ***
ttment2 0.40429 0.00183 221 <2e-16 ***
Null deviance: 151889 on 99999 degrees of freedom
Residual deviance: 101848 on 99998 degrees of freedom
AIC: 533152
> summary(m1big)
Large data regression model: bigglm(y ~ ttment, data = dat, family =
poisson(link = "log"))
Sample size = 100000
Coef (95% CI) SE p
(Intercept) 2.651 2.650 2.653 0.001 0
ttment2 4.346 4.344 4.348 0.001 0
> m1big$deviance
[1] 287158986
Notice that the coefficients and deviance are quite different in the
model estimated using bigglm(). If I change the chunk to
seq(1000,10000,1000) the estimates remain the same.
Can someone help me understand what is causing these differences?
Here is my version info:
> version
_
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status
major 2
minor 8.1
year 2008
month 12
day 22
svn rev 47281
language R
version.string R version 2.8.1 (2008-12-22)
Many thanks in advance for your help,
Francisco
--
Francisco J. Zagmutt
Vose Consulting
2891 20th Street
Boulder, CO, 80304
USA
www.voseconsulting.com
More information about the R-help
mailing list