[R] Setting up hypothesis tests with the infer library?

Sat Mar 29 17:42:59 CET 2025

At 16:09 on 29/03/2025, Kevin Zembower via R-help wrote:
> Hello, all,
> 
> We're now starting to cover hypothesis tests in my Stats 101 course. As
> usual in courses using the Lock5 textbook, 3rd ed., the homework
> answers are calculated using their StatKey application. In addition
> (and for no extra credit), I'm trying to solve the problems using R. In
> the case of hypothesis tests, in addition to manually setting up
> randomized null hypothesis distributions and graphing them, I'm using
> the infer library. I've been really impressed with this library and
> enjoy solving this type of problem with it.
> 
> One of the first steps in solving a hypothesis test with infer is to
> set up the initial sampling dataset. Often, in Lock5 problems, this is
> a dataset that can be downloaded with library(Lock5Data). However,
> other problems are worded like this:
> 
> ===========================
> In 1980 and again in 2010, a Gallup poll asked a random sample of 1000
> US citizens “Are you in favor of the death penalty for a person
> convicted of murder?” In 1980, the proportion saying yes was 0.66. In
> 2010, it was 0.64. Does this data provide evidence that the proportion
> of US citizens favoring the death penalty was higher in 1980 than it
> was in 2010? Use p1 for the proportion in 1980 and p2 for the
> proportion in 2010.
> ============================
> 
> I've been setting up problems like this with code similar to:
> ===========================
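> library(dplyr)   # for %>%
> library(infer)   # for specify() and calculate()
> 
> # One row per respondent; round() keeps the rep() counts exact integers
> df <- data.frame(
>      survey = c(rep("1980", 1000), rep("2010", 1000)),
>      DP = c(rep("Y", round(0.66*1000)), rep("N", 1000 - round(0.66*1000)),
>             rep("Y", round(0.64*1000)), rep("N", 1000 - round(0.64*1000))))
> 
> (d_hat <- df %>%
>       specify(response = DP, explanatory = survey, success = "Y") %>%
>       calculate(stat = "diff in props", order = c("1980", "2010")))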
> ============================
> 
> My question is, is this the way I should be setting up datasets for
> problems of this type? Is there a more efficient way, that doesn't
> require the construction of the whole sample dataset?
> 
> It seems like I should be able to do something like this:
> =================
> (df <- data.frame(group1count = 660, #Or, group1prop = 0.66
>                   group1samplesize = 1000,
>                   group2count = 640, #Or, group2prop = 0.64
>                   group2samplesize = 1000))
> =================
> 
> Am I overlooking a way to set up these sample dataframes for infer?
> 
> Thanks for your advice and guidance.
> 
> -Kevin
> 
> 
> 
Hello,

Base R can solve this directly from the summary counts, without
building the full sample dataset. Something like this:

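# Summary data from the problem: sample proportion and size per year
year <- c(1980, 2010)
p <- c(0.66, 0.64)
n <- c(1000, 1000)
df1 <- data.frame(year, p, n)
# Convert the proportions to counts of "yes" and "no" answers
df1$yes <- with(df1, p*n)
df1$no <- with(df1, n - yes)

# Two-column matrix of successes and failures, one row per year
mat <- as.matrix(df1[c("yes", "no")])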

prop.test(mat)
#>
#>  2-sample test for equality of proportions with continuity correction
#>
#> data:  mat
#> X-squared = 0.79341, df = 1, p-value = 0.3731
#> alternative hypothesis: two.sided
#> 95 percent confidence interval:
#>  -0.02279827  0.06279827
#> sample estimates:
#> prop 1 prop 2
#>   0.66   0.64
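
The exercise asks whether the proportion was higher in 1980, which is a
one-sided alternative, whereas prop.test() defaults to a two-sided one.
A minimal sketch of the one-sided version, assuming the same counts of
660 and 640 out of 1000 and passing them directly as x and n; the object
name res is only for illustration:

```r
# Counts of "yes" answers and sample sizes, in the order 1980, 2010
yes <- c(660, 640)
n <- c(1000, 1000)

# H0: p1 = p2 versus Ha: p1 > p2 (1980 proportion higher than 2010)
res <- prop.test(x = yes, n = n, alternative = "greater")
res$p.value
```

With the continuity correction left on, this one-sided p-value is half
of the two-sided one, because the observed difference lies in the
direction of the alternative.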

chisq.test(mat)
#>
#>  Pearson's Chi-squared test with Yates' continuity correction
#>
#> data:  mat
#> X-squared = 0.79341, df = 1, p-value = 0.3731
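
As for infer itself, specify() appears to work on one row per
observation, so some form of the full dataset is probably still needed.
A possibly tidier way to build it is to start from a small table of
counts and repeat each row by its count. A sketch in base R, assuming
the same 660/340 and 640/360 split; the names counts and df are
illustrative:

```r
# One row per (survey, answer) cell, with its count
counts <- data.frame(
  survey = c("1980", "1980", "2010", "2010"),
  DP     = c("Y", "N", "Y", "N"),
  n      = c(660, 340, 640, 360)
)

# Repeat each row index n times to get one row per respondent
df <- counts[rep(seq_len(nrow(counts)), times = counts$n), c("survey", "DP")]
rownames(df) <- NULL

# Check: the cross-tabulation recovers the original counts
table(df$survey, df$DP)
```

The resulting df has the same two columns as the one in the question, so
it should drop into the specify() pipeline unchanged. If the tidyverse
is already loaded, tidyr::uncount(counts, n) does the same expansion.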

Hope this helps,

Rui Barradas
