[R] Post Stratification

Thomas Lumley tlumley at u.washington.edu
Mon Jun 19 23:17:49 CEST 2006


On Sun, 18 Jun 2006, Mark Hempelmann wrote:

> Dear WizaRds,
>
> 	having met some of you in person in Vienna,  I think even more fondly
> of this community and hope to continue on this route. It was great
> talking with you and learning from you. Thank you. I am trying to work
> through an artificial example in post stratification. This is my dataset:
>
> library(survey)
> age <- data.frame(id=1:8, stratum=rep( c("S1","S2"),c(5,3)),
> weight=rep(c(3,4),c(5,3)), nh=rep(c(5,3),c(5,3)),
> Nh=rep(c(15,12),c(5,3)), y=c(23,25,27,21,22, 77,72,74) )
>
> pop.types <- table(stratum=age$stratum)

No, you need population totals here, that's the whole point of 
post-stratification.

> age.post <- svydesign(ids=~1, strata=NULL, data=age, fpc=~Nh)

Since the sampling isn't stratified you can't give stratum-specific fpc. 
This should give an error message. I will add one.

> post <- postStratify(design=age.post, strata=~stratum, population=pop.types)

Yes, but pop.types is defined wrongly

>
> pop.types <- data.frame(stratum = c("S1","S2"), Freq = c(15, 12))

This is the correct definition



> However, I compute:
> Nh=c(15,12); nh=c(5,3); sh=by(age$y, age$stratum, var); N=sum(Nh)
> # Mean estimator
> y.bar=by(age$y, age$stratum, mean) ## 23.6; 74.33
> estimator=1/N*sum(Nh*y.bar) ## 46.14815
> # Variance estimator
> vari=1/N^2*sum(Nh*(Nh-nh)*sh/nh)
> sqrt(vari) ##	.7425903
>
> and with Taylor expansion .7750118
>

This is partly the problem with giving stratum-specific fpc and partly 
because "survey" is using a less elegant but more general estimator (as is 
often the case).  The estimator is the one due to Valliant, described in 
the reference on the help page.

The primary difference from the estimator you given is that R does not try 
to compute finite population corrections for each post-stratum.  Another 
more subtle difference is that R's formula is estimating the conditional 
variance, conditional on the estimated weights, rather than the 
unconditional variance.


So, to do the computations correctly

> library(survey)
> age <- data.frame(id=1:8, stratum=rep( c("S1","S2"),c(5,3)),
+ weight=rep(c(3,4),c(5,3)), nh=rep(c(5,3),c(5,3)),
+ Nh=rep(c(15,12),c(5,3)), y=c(23,25,27,21,22, 77,72,74) )
>
> age$N<-rep(27,8)
>
> pop.types <- data.frame(stratum = c("S1","S2"), Freq = c(15, 12))
>

The sampling is unstratified from a population of size 27=15+12

> age.design <- svydesign(ids=~1, data=age, fpc=~N)
> age.post<-postStratify(age.design, strata=~stratum, population=pop.types)
>
> svymean(~y, age.design)
     mean     SE
y 42.625 7.8163
> svymean(~y, age.post)
     mean     SE
y 46.148 0.6737
>

Now, post-stratification adjusts the weights to sum to 15 and 12 for the 
two post-strata, so the mean estimate is a weighted average of the two 
post-stratum-specific means, weighted by 15/27 and 12/27. This is what you 
found.

The variance is the variance of the mean of the residuals after 
subtracting the post-stratum-specific mean.  That is, we can get the 
post-stratified estimates by

Adjust the weights
> age.design2 <- svydesign(ids=~1, data=age, fpc=~N,weight=~weight)
Create a 'residuals' variable
> age.design2<-update(age.design2, yres=y-ifelse(stratum=="S1", 
23.6,74.33333333))

Post-stratified mean is 46.148
> svymean(~y, age.design2)
     mean     SE
y 46.148 8.2317

Standard error of post-stratified mean is 0.6737
> svymean(~yres, age.design2)
            mean     SE
yres 1.4815e-09 0.6737


The formula for the variance of the estimated total is

> (n/(n-1))*((N-sum(nh))/N)*sum(yr2*(Nh/nh)^2)
[1] 330.915

which is formula (11) of the Valliant reference. The variance of the
mean is the same thing divided by N^2:

> (n/(n-1))*((N-sum(nh))/N)*sum(yr2*(Nh/nh)^2)/(N*N)
[1] 0.45393

where yr2<-by(yres^2,age$stratum,sum).  These agree exactly with R's 
estimates
> vcov(svytotal(~y, design=age.post))
         y
y 330.915
> vcov(svymean(~y, design=age.post))
         y
y 0.45393


 	-thomas



Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle



More information about the R-help mailing list