[R] Chi-Square test and survey results
gheine at mathnmaps.com
Tue Oct 11 21:31:46 CEST 2011
An organization has asked me to comment on the validity of their
recent all-employee survey. Survey responses, by geographic region,
compared with the total number of employees in each region, were as
follows:
> ByRegion
          All.Employees Survey.Respondents
Region_1            735                142
Region_2            500                 83
Region_3            897                 78
Region_4            717                133
Region_5            167                 48
Region_6            309                  0
Region_7            806                125
Region_8            627                122
Region_9            858                177
Region_10           851                160
Region_11           336                 52
Region_12          1823                312
Region_13            80                  9
Region_14           774                121
Region_15           561                 24
Region_16           834                134
How well does the survey represent the employee population?
The chi-square test says not very well:
> chisq.test(ByRegion)
Pearson's Chi-squared test
data: ByRegion
X-squared = 163.6869, df = 15, p-value < 2.2e-16
Dropping the three most under-represented regions (3, 6, and 15) gives
a more reasonable, although still not convincing, result:
> chisq.test(ByRegion[setdiff(1:16,c(3,6,15)),])
Pearson's Chi-squared test
data: ByRegion[setdiff(1:16, c(3, 6, 15)), ]
X-squared = 22.5643, df = 12, p-value = 0.03166
This raises several questions:
1) Looking at a side-by-side barchart (proportion of responses vs.
proportion of employees, per region), the pattern of survey responses
appears, visually, to match the pattern of employees fairly well. Is
this a case where we trust the numbers and not the picture?
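
For anyone who wants to reproduce the picture, here is a minimal
sketch of such a barchart. The counts are copied from the ByRegion
table above; the plotting choices (base-graphics barplot, the names
`shares` and `response_rate`) are my assumptions for illustration, not
necessarily how the original chart was drawn. It also prints the
per-region response rate, which the share-based picture does not show
directly.

```r
# Counts copied from the ByRegion table in this post.
employees   <- c(735, 500, 897, 717, 167, 309, 806, 627,
                 858, 851, 336, 1823, 80, 774, 561, 834)
respondents <- c(142, 83, 78, 133, 48, 0, 125, 122,
                 177, 160, 52, 312, 9, 121, 24, 134)
names(employees) <- names(respondents) <- paste0("Region_", 1:16)

# Each region's share of all employees and of all respondents.
shares <- rbind(Employees   = employees / sum(employees),
                Respondents = respondents / sum(respondents))

# Side-by-side bars, one pair per region.
barplot(shares, beside = TRUE, las = 2, legend.text = TRUE,
        ylab = "Proportion of total")

# Response rate within each region, sorted from lowest to highest.
response_rate <- respondents / employees
print(round(sort(response_rate), 3))
```

Two bars of similar height can still correspond to quite different
response rates, so it may help to look at both views.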
2) Part of the problem, ironically, is that there were too many
responses to the survey. If we had only one-tenth the responses, but
in the same proportions by region, the chi-square statistic would look
much better (though with a warning about possible inaccuracy):
> chisq.test(data.frame(ByRegion$All.Employees, 0.1 * (ByRegion$Survey.Respondents)))
data: data.frame(ByRegion$All.Employees, 0.1 * (ByRegion$Survey.Respondents))
X-squared = 17.5912, df = 15, p-value = 0.2848
Is there a way of reconciling a large response rate with an
unrepresentative response profile? Or is the bad news that the survey
will give very precise results about a very ill-specified
sub-population?
(Of course, I would put it in softer terms, like "you need to assess
the degree of homogeneity across different regions".)
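
To make the sample-size effect concrete, here is a small sketch. Note
that it swaps in the goodness-of-fit form of the statistic (respondent
counts against employee proportions) rather than the two-column table
test used above, because in that form the scaling is exact. The helper
`gof_stat` is a hypothetical name introduced only for this
illustration, and Cohen's w = sqrt(X^2 / N) is just one possible
size-free measure of discrepancy, offered as an assumption rather than
as the right answer.

```r
# Counts copied from the ByRegion table in this post.
employees   <- c(735, 500, 897, 717, 167, 309, 806, 627,
                 858, 851, 336, 1823, 80, 774, 561, 834)
respondents <- c(142, 83, 78, 133, 48, 0, 125, 122,
                 177, 160, 52, 312, 9, 121, 24, 134)

# Proportions expected if respondents mirrored the employee population.
p_expected <- employees / sum(employees)

# Hypothetical helper: goodness-of-fit statistic, sum((O - E)^2 / E).
gof_stat <- function(observed, p) {
  expected <- sum(observed) * p
  sum((observed - expected)^2 / expected)
}

x2_full  <- gof_stat(respondents, p_expected)
x2_tenth <- gof_stat(respondents / 10, p_expected)

# Same proportions with one-tenth the responses: the statistic is
# one-tenth as large.
print(x2_full / x2_tenth)

# Cohen's w divides out the number of responses, so it is unchanged.
w_full  <- sqrt(x2_full / sum(respondents))
w_tenth <- sqrt(x2_tenth / sum(respondents / 10))
print(c(w_full = w_full, w_tenth = w_tenth))
```

In other words, the p-value mixes "how far off" with "how many
responses", while a measure like w tries to report only the former.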
3) Is chi-squared really the right measure of how representative the
survey is?
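
One alternative framing, sketched here under the assumption that the
employee counts can be treated as known population proportions: a
goodness-of-fit test of the respondent counts against those
proportions, using the `p` argument of `chisq.test`. The `stdres`
component of the result then shows which regions contribute most. The
cutoff of roughly 2 to 3 in absolute value mentioned in the comment is
a common rule of thumb, not something derived from this data.

```r
# Counts copied from the ByRegion table in this post.
employees   <- c(735, 500, 897, 717, 167, 309, 806, 627,
                 858, 851, 336, 1823, 80, 774, 561, 834)
respondents <- c(142, 83, 78, 133, 48, 0, 125, 122,
                 177, 160, 52, 312, 9, 121, 24, 134)
names(employees) <- names(respondents) <- paste0("Region_", 1:16)

# Goodness-of-fit: do respondents follow the employee proportions?
fit <- chisq.test(x = respondents, p = employees / sum(employees))
print(fit)

# Standardized residuals per region, most under-represented first.
# Values beyond roughly +/- 2 or 3 are often taken as notable.
print(round(sort(fit$stdres), 2))

# The three regions with the most negative residuals.
most_under <- sort(order(fit$stdres)[1:3])
print(names(respondents)[most_under])
```

This does not settle whether a significance test is the right tool at
all, but it at least separates the overall verdict from the
region-by-region picture.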
<<<<<<< >>>>>>>>>
Thanks for any help you can give - hope these questions make sense -
George H.