[R] Design-consistent variance estimate
Stas Kolenikov
skolenik at gmail.com
Fri Aug 15 21:31:34 CEST 2008
Harold,
in design-based estimation, thinking in terms of "what is my
(effective) sample size" rarely works out.
First of all, unless you have a fixed sample size design, your sample
size is itself a random variable. You can hope for fixed sample sizes
in some excruciatingly controlled clinical studies, but in most
other surveys you are at the mercy of non-response, unknown cluster
sizes, interviewer availability, and all sorts of field problems. The
mean is a ratio estimator, mean[Y] = total[Y]/total[1], so its
standard error has to account for the randomness of the sample size.
Your Taylor series linearization formula should therefore include the
variance of the denominator, and also the covariance between the
cluster totals of the Y's and the 1's.
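To make the linearization concrete, here is a minimal sketch in plain Python. It is not the survey package's code, and the function name and data layout are my own assumptions. It assumes a single stratum with PSUs treated as sampled with replacement (no finite population correction). Each PSU contributes a linearized value z_i = (t_y,i - R * t_1,i) / T_1, where R is the estimated mean. Squaring z_i produces the variance of the numerator, the variance of the denominator, and their covariance in one go.

```python
from math import sqrt

def ratio_mean_se(clusters):
    """Mean and linearized standard error for a one-stage cluster sample.

    clusters: list of PSUs, each a list of (weight, y) pairs.
    Assumes one stratum and with-replacement sampling of PSUs (no FPC).
    """
    n = len(clusters)
    if n < 2:
        raise ValueError("need at least two PSUs to estimate a variance")
    # Weighted PSU totals of Y (numerator) and of 1 (denominator).
    t_y = [sum(w * y for w, y in psu) for psu in clusters]
    t_1 = [sum(w for w, _ in psu) for psu in clusters]
    total_y, total_1 = sum(t_y), sum(t_1)
    mean = total_y / total_1
    # Linearized PSU contributions; they sum to zero by construction.
    z = [(ty - mean * t1) / total_1 for ty, t1 in zip(t_y, t_1)]
    variance = n / (n - 1) * sum(zi ** 2 for zi in z)
    return mean, sqrt(variance)

# Three PSUs of unequal size, all weights equal to 1 (made-up numbers).
example = [[(1, 1), (1, 3)], [(1, 2), (1, 4), (1, 6)], [(1, 5)]]
print(ratio_mean_se(example))  # (3.5, 0.75)
```

The PSU sizes 2, 3 and 1 differ here, so total[1] varies from PSU to PSU, and that variation enters the standard error through the R * t_1,i term.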
Second, your clusters (PSUs) probably differ in size. That is actually
what makes total[1] vary. At any rate, that variability invalidates
the simple formulae for equal PSU sizes that your code is using.
Third, at least theoretically, there might be finite population
corrections, although you don't seem to specify any in your svydesign
definition. And frankly I've never seen a survey where weights were
not needed.
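For what a finite population correction does to the variance, here is a rough sketch under a strong simplifying assumption: a single stratum in which n PSUs are drawn by simple random sampling without replacement from N PSUs in the population. In that case the with-replacement variance estimate is scaled by (1 - n/N). The function and argument names are hypothetical. In the survey package the population counts would be passed through the fpc argument of svydesign rather than applied by hand.

```python
def apply_fpc(variance_wr, n_sampled, n_population):
    """Scale a with-replacement variance by the factor (1 - n/N).

    Assumes simple random sampling of PSUs without replacement
    within a single stratum; other designs need other corrections.
    """
    if not 0 < n_sampled <= n_population:
        raise ValueError("need 0 < n_sampled <= n_population")
    return variance_wr * (1 - n_sampled / n_population)

# Made-up numbers: 3 PSUs sampled out of 12 in the population.
print(apply_fpc(0.5625, 3, 12))  # 0.421875
```

When n is a small fraction of N the factor is close to 1, which is why the correction is often ignored in practice.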
If you want to take a look at some references, Korn & Graubard 1999
(http://www.citeulike.org/user/ctacmo/article/553280) might be a good
starting point; they have a pretty thorough discussion of the issues
with variance estimation in cluster samples. A more technical reading
is Thompson 1997 (http://www.citeulike.org/user/ctacmo/article/1036973).
On 8/15/08, Doran, Harold <HDoran at air.org> wrote:
> Dear List:
>
> I am working to understand some differences between the results of the
> svymean() function in the survey package and from code I have written
> myself. The results from svymean() also agree with results I get from
> SAS proc surveymeans, so, this suggests I am misunderstanding something.
>
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.