[R] Setting up hypothesis tests with the infer library?
Kevin Zembower
kevin at zembower.org
Sat Mar 29 17:09:33 CET 2025
Hello, all,
We're now starting to cover hypothesis tests in my Stats 101 course. As
usual in courses using the Lock5 textbook, 3rd ed., the homework
answers are calculated using their StatKey application. For my own
practice (and for no extra credit), I'm also solving the problems in R.
For hypothesis tests, besides manually setting up randomized null
hypothesis distributions and graphing them, I'm using the infer
library. I've been really impressed with this library and enjoy
solving this type of problem with it.
One of the first steps in solving a hypothesis test with infer is to
set up the initial sample dataset. In many Lock5 problems, this is
a dataset that can be loaded with library(Lock5Data). However,
other problems are worded like this:
===========================
In 1980 and again in 2010, a Gallup poll asked a random sample of 1000
US citizens “Are you in favor of the death penalty for a person
convicted of murder?” In 1980, the proportion saying yes was 0.66. In
2010, it was 0.64. Does this data provide evidence that the proportion
of US citizens favoring the death penalty was higher in 1980 than it
was in 2010? Use p1 for the proportion in 1980 and p2 for the
proportion in 2010.
============================
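For comparison, a theory-based check can be run straight from the summary counts, without building any per-respondent data. This is a minimal sketch using base R's prop.test(), which is a normal-approximation test of H0: p1 = p2 against Ha: p1 > p2, not the randomization approach StatKey and infer use, so its p-value need not match theirs exactly.

```r
# Summary figures from the problem statement (1000 respondents per poll)
x <- c(`1980` = 660, `2010` = 640)   # number answering "yes"
n <- c(`1980` = 1000, `2010` = 1000) # sample sizes

# Observed statistic: p1_hat - p2_hat
p_hat <- x / n
d_hat <- unname(p_hat["1980"] - p_hat["2010"])
d_hat

# Normal-approximation test of H0: p1 = p2 vs Ha: p1 > p2, from counts alone.
# "greater" refers to the first group (1980); correct = FALSE drops
# Yates' continuity correction.
res <- prop.test(x = x, n = n, alternative = "greater", correct = FALSE)
res$p.value
```

Because prop.test() only needs the counts and sample sizes, it sidesteps the dataset-construction question entirely, at the cost of relying on the normal approximation.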
I've been setting up problems like this with code similar to:
===========================
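library(infer)  # specify(), calculate(), and the %>% pipe

df <- data.frame(
  survey = c(rep("1980", 1000), rep("2010", 1000)),
  # round() guards against rep() truncating a product that lands just
  # below an integer (e.g. 0.57 * 100 is slightly less than 57)
  DP = c(rep("Y", round(0.66 * 1000)), rep("N", 1000 - round(0.66 * 1000)),
         rep("Y", round(0.64 * 1000)), rep("N", 1000 - round(0.64 * 1000))))
(d_hat <- df %>%
  specify(response = DP, explanatory = survey, success = "Y") %>%
  calculate(stat = "diff in props", order = c("1980", "2010")))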
============================
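Once the expanded data exists, the randomized null distribution can also be built by hand, which is roughly what infer's hypothesize(null = "independence") and generate(type = "permute") steps do. This is a minimal base R sketch, assuming 1000 permutations and an arbitrary seed; it shuffles the answers across survey years and recomputes the difference in proportions each time.

```r
set.seed(2025)  # arbitrary seed so the sketch is reproducible

# Rebuild the expanded sample (1000 responses per poll year)
df <- data.frame(
  survey = rep(c("1980", "2010"), each = 1000),
  DP     = c(rep(c("Y", "N"), c(660, 340)),
             rep(c("Y", "N"), c(640, 360)))
)

# Statistic: proportion of "Y" in 1980 minus proportion of "Y" in 2010
diff_in_props <- function(response, group) {
  mean(response[group == "1980"] == "Y") - mean(response[group == "2010"] == "Y")
}

d_hat <- diff_in_props(df$DP, df$survey)

# Null distribution: permuting the answers breaks any link between
# survey year and response, which is what H0 (independence) assumes
reps <- 1000
null_dist <- replicate(reps, diff_in_props(sample(df$DP), df$survey))

# One-sided p-value for Ha: p1 > p2; the small tolerance keeps permuted
# differences that equal d_hat from being lost to floating-point error
p_value <- mean(null_dist >= d_hat - 1e-9)
p_value
```

A histogram of null_dist with a vertical line at d_hat gives the same picture StatKey draws.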
My question is: is this how I should be setting up datasets for
problems of this type? Is there a more efficient way that doesn't
require constructing the whole sample dataset?
It seems like I should be able to do something like this:
=================
(df <- data.frame(group1count = 660, #Or, group1prop = 0.66
group1samplesize = 1000,
group2count = 640, #Or, group2prop = 0.64
group2samplesize = 1000))
=================
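As far as I can tell, infer's specify() works on a data frame with one row per observation, so the summary numbers have to be expanded somewhere. One way to keep the input compact is to store only the counts for each (survey, answer) combination and let rep() repeat the rows. This is a minimal base R sketch under that assumption; tidyr's uncount() performs the same expansion if tidyr is available.

```r
# Summary table: one row per (survey year, answer) combination with its count
counts <- data.frame(
  survey = c("1980", "1980", "2010", "2010"),
  DP     = c("Y", "N", "Y", "N"),
  n      = c(660, 1000 - 660, 640, 1000 - 640)
)

# Repeat each row n times to get one row per respondent
df <- counts[rep(seq_len(nrow(counts)), times = counts$n), c("survey", "DP")]
rownames(df) <- NULL

# Cross-tabulate to confirm the expansion reproduces the counts
table(df$survey, df$DP)
```

The resulting df has the same layout as the hand-built one above, so it can go straight into the same specify()/calculate() pipeline.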
Am I overlooking a way to set up these sample data frames for infer?
Thanks for your advice and guidance.
-Kevin