[R] Sample size calculation for non-normal population with unknown mean and SD
Bert Gunter
gunter.berton at gene.com
Mon Jul 26 22:39:09 CEST 2010
The obvious:
Take a small pilot sample, say 25-50 documents, and estimate your
distribution from that. Then use this to determine how many additional
samples (if any) you need for the desired precision. The latter step
can probably be done easily via simulation/bootstrap if you don't want
to specify a parametric form.
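Below is a minimal sketch of that bootstrap step. Everything in it is an illustrative assumption. The pilot counts are made up, and `bootstrap_half_width` and `required_n` are hypothetical helpers. "Precision" is taken to mean the half-width of a 95% percentile-bootstrap interval for the mean number of patents per document. That half-width is multiplied by the finite population correction sqrt((N - n)/(N - 1)), with N = 4,392, as a heuristic. The pilot is resampled with replacement as a stand-in for the unknown population, so no parametric form (Poisson or otherwise) is assumed.

```python
import math
import random
import statistics

def bootstrap_half_width(pilot, n, reps=1000, level=0.95, seed=1):
    """Half-width of a percentile interval for the mean of n documents,
    resampling the pilot counts with replacement (the pilot stands in
    for the unknown population distribution)."""
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(pilot, k=n)) for _ in range(reps))
    lo = means[int((1 - level) / 2 * reps)]
    hi = means[int((1 + level) / 2 * reps) - 1]
    return (hi - lo) / 2

def required_n(pilot, margin, pop_size, reps=1000, level=0.95, seed=1):
    """Smallest total sample size whose bootstrap half-width, shrunk by a
    finite population correction (heuristic), is within `margin`.
    Binary search: assumes the half-width shrinks as n grows."""
    def precise_enough(n):
        fpc = math.sqrt((pop_size - n) / (pop_size - 1))
        return bootstrap_half_width(pilot, n, reps, level, seed) * fpc <= margin

    lo, hi = 1, pop_size  # at n == pop_size the correction is 0, so hi always passes
    while lo < hi:
        mid = (lo + hi) // 2
        if precise_enough(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Hypothetical pilot: patent counts for 25 randomly chosen documents.
pilot = [0, 1, 1, 2, 0, 3, 1, 0, 5, 2, 1, 0, 4, 1, 2,
         0, 1, 7, 2, 1, 0, 1, 3, 2, 1]
n = required_n(pilot, margin=0.2, pop_size=4392)
print(n, "documents in total;", max(0, n - len(pilot)), "more than the pilot")
```

The returned n includes the pilot documents, so the number still to be read is n minus the pilot size. With only 25 pilot values the resampled tails are crude; a pilot toward the 50 end of the range should make the estimate less so.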
My guess is that your distribution is right-skewed but not Poisson --
probably more like a truncated Poisson. But of course I have no idea
what sorts of documents you've got, so how would I know?
> Basically, we have a population of 4,392 documents and we want to find out
> the number of patents per document. We don’t want to go through all 4,392
> documents, but want a reliable sample size from which to draw inferences. I
> feel like this count data will not follow a normal distribution, but more
> like a Poisson (skewed right). The problem is we don’t have much similar
> data to this data set, so mean and standard deviation are unknown. Is there
> any way to derive a sample size based on the confidence interval, margin of
> error, and population size for what I assume to be a non-normal population?
> Any help would be greatly appreciated.
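For the closed-form route the question asks about, a common textbook answer is the normal-approximation formula, often attributed to Cochran. It first computes n0 = (z * s / E)^2 and then applies the finite population correction n = n0 / (1 + n0 / N). Here z is the normal quantile for the confidence level, E is the margin of error, and N = 4,392. The approach relies on the central limit theorem making the sample *mean* roughly normal at sizes of a few hundred even when the counts are skewed. It still needs an SD s, so the pilot sample suggested above is still required. The sketch assumes z = 1.96 (95%) and uses made-up pilot counts and a hypothetical helper name, `cochran_n`.

```python
import math
import statistics

def cochran_n(pilot, margin, pop_size, z=1.96):
    """Normal-approximation sample size for estimating a mean to within
    +/- margin, with the SD taken from a pilot sample and a finite
    population correction for a population of pop_size units."""
    s = statistics.stdev(pilot)      # pilot estimate of the population SD
    n0 = (z * s / margin) ** 2       # size needed if the population were infinite
    n = n0 / (1 + n0 / pop_size)     # shrink for sampling without replacement
    return math.ceil(n)

# Hypothetical pilot: patent counts for 25 randomly chosen documents.
pilot = [0, 1, 1, 2, 0, 3, 1, 0, 5, 2, 1, 0, 4, 1, 2,
         0, 1, 7, 2, 1, 0, 1, 3, 2, 1]
print(cochran_n(pilot, margin=0.2, pop_size=4392))  # prints 263
```

If the pilot SD is badly off, which is likely with very skewed counts and a small pilot, this number will be too. That is the case for checking it against a simulation/bootstrap answer.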
