[R] power and sample size for a GLM with Poisson response variable
Ken Beath
kbeath at efs.mq.edu.au
Thu Mar 9 13:18:41 CET 2006
The problem with the simulation is with the second group where there
is a high probability of obtaining all zeroes for the sample and this
is causing problems with the Wald statistics you are using to check
for a difference. Here is an example.
Browse[1]> summary(res)
Call:
glm(formula = y[i, ] ~ group, family = poisson())
Deviance Residuals:
Min 1Q Median 3Q Max
-1.290e-01 -1.290e-01 -1.231e-05 -1.231e-05 2.756e+00
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.7884 0.3333 -14.365 <2e-16 ***
group2 -18.5142 1235.1829 -0.015 0.988
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 110.881 on 4260 degrees of freedom
Residual deviance: 86.192 on 4259 degrees of freedom
AIC: 108.19
Number of Fisher Scoring iterations: 21
On Wed, 8 Mar 2006 Wassell, James T., Ph.D. wrote:
> Subject: Re: [R] power and sample size for a GLM with Poisson response
> variable
> To: <r-help at stat.math.ethz.ch>
> Cc: "Craig A. Faulhaber" <caf at gis.usu.edu>
> Message-ID:
> <AF2DCD619279544BA454141F4A45B9E306A3E8 at m-niosh-3.niosh.cdc.gov>
> Content-Type: text/plain; charset="US-ASCII"
>
> Craig, Thanks for your follow-up note on using the asypow package. My
> problem was not only constructing the "constraints" vector but, for my
> particular situation (Poisson regression, two groups, sample sizes of
> (1081,3180), I get very different results using asypow package
> compared
> to my other (home grown) approaches.
>
> library(asypow)
> pois.mean<-c(0.0065,0.0003)
> info.pois <- info.poisson.kgroup(pois.mean, group.size=c(1081,3180))
> constraints <- matrix(c(2,1,2), ncol = 3, byrow=T)
> poisson.object <- asypow.noncent(pois.mean, info.pois,constraints)
> power.pois <- asypow.power(poisson.object, c(1081,3180), 0.05)
> print(power.pois)
>
> [1] 0.2438309
>
> asy.pwr() # the function is shown below.
>
> $power
> [1] 0.96
>
> sim.pwr() # the function is shown below.
> 4261000 Poisson random variates simulated
> $power
> [1] 0.567
>
> I tend to think the true power is greater than 0.567 but less than
> 0.96
> (not as small as 0.2438309).
>
> Maybe I am still not using the asypow functions correctly?
> Suggestions/comments welcome.
>
>
> -----Original Message-----
> From: Craig A. Faulhaber [mailto:caf at gis.usu.edu]
> Sent: Saturday, March 04, 2006 12:04 PM
> To: Wassell, James T., Ph.D.
> Subject: Re: [R] power and sample size for a GLM with Poisson response
> variable
>
> Hi James,
>
> Thanks again for your help. With the assistance of a statistician
> colleauge of mine, I figured out the "constraints" vector. It is a
> 3-column vector describing the null hypothesis. For example, let's
> say
> you have 3 populations and your null hypothesis is that there is no
> difference between the 3. The first row of the matrix would be "3
> 1 2"
> indicating that you have 3 populations and that populations 1 and 2
> are
> equal. The second row would be "3 2 3" indicating that you have 3
> populations and that populations 2 and 3 are equal. If you had only 2
> populations, there would be only one row ("2 1 2", indicating 2
> populations with 1 and 2 equal). I hope this helps.
>
> Craig
>
> Wassell, James T., Ph.D. wrote:
>
>> Craig, I found the package asypow difficult to use and it did not
>> yield results in the ballpark of other approaches. (could not
>> figure
>
>> out the "constraints" vector).
>>
>> I wrote some simple functions, one asy.pwr uses the non-central
>> chi-square distn.
>>
>> asy.pwr<-function(counts=c(7,1),py=c(1081,3180))
>> { # a two group Poisson regression power computation # py is
>> person-years or person-time however measured
>> group<-gl(2,1)
>> ncp<-summary(glm(counts~group+offset(log(py)),family=poisson))
>> $null.de
>> viance
>>
>> q.tile<-qchisq(.95,1) # actually just the X2 critical value of
>> 3.841459 list(power=round(1-pchisq(q.tile,df=1,ncp),2))}
>>
>> The second function, sim.pwr, estimates power using simulated
>> Poisson
>
>> random variates. The most time consuming step is the call to rpois.
>
>> (Maybe someone knows a more efficient way to accomplish this?). The
>> "for" loop is rather quick in comparison. I hope you may find this
>> helpful, or if you have solved your problem some other way, please
>> pass along your approach. Note, that for this problem, very small
>> values of lambda, the two approaches give much different power
>> estimates (96% vs. 55% or so). My problem may be better addressed
>> as binomial logistic regression, maybe then the simulation and the
>> asymptotic estimates my agree better.
>>
>> sim.pwr<-function(means=c(0.0065,0.0003),ptime=c
>> (1081,3180),nsim=1000)
>> { # a two group poisson regression power computation # based
>> simulating lots of Poisson r.v.'s # input rates followed by a vector
>> of the corresponding person times # the most time consuming part is
>> the r.v. generation.
>> # power is determined by counting the how often p-values are <= 0.05
>> group<-as.factor(rep(c(1,2),ptime))
>> rej<-vector(length=nsim)
>> y<-rpois(ptime[1]*nsim,means[1])
>> y<-c(y,rpois(ptime[2]*nsim,means[2]))
>> y<-matrix(y,nrow=nsim)
>> cat(sum(ptime)*nsim,"Poisson random variates simulated","\n") for
>> (i in
>
>> 1:nsim){res<-glm(y[i,]~group,family=poisson())
>> rej[i]<-summary(res)$coeff[2,4]<=0.05}
>> list(power=sum(rej)/nsim) }
>>
>>
>>
>>
>
More information about the R-help
mailing list