[R] Setting up hypothesis tests with the infer library?
Rui Barradas
Sat Mar 29 17:42:59 CET 2025
At 16:09 on 29/03/2025, Kevin Zembower via R-help wrote:
> Hello, all,
>
> We're now starting to cover hypothesis tests in my Stats 101 course. As
> usual in courses using the Lock5 textbook, 3rd ed., the homework
> answers are calculated using their StatKey application. In addition
> (and for no extra credit), I'm trying to solve the problems using R. In
> the case of hypothesis tests, in addition to manually setting up
> randomized null hypothesis distributions and graphing them, I'm using
> the infer library. I've been really impressed with this library and
> enjoy solving this type of problem with it.
>
> One of the first steps in solving a hypothesis test with infer is to
> set up the initial sampling dataset. Often, in Lock5 problems, this is
> a dataset that can be downloaded with library(Lock5Data). However,
> other problems are worded like this:
>
> ===========================
> In 1980 and again in 2010, a Gallup poll asked a random sample of 1000
> US citizens “Are you in favor of the death penalty for a person
> convicted of murder?” In 1980, the proportion saying yes was 0.66. In
> 2010, it was 0.64. Does this data provide evidence that the proportion
> of US citizens favoring the death penalty was higher in 1980 than it
> was in 2010? Use p1 for the proportion in 1980 and p2 for the
> proportion in 2010.
> ============================
>
> I've been setting up problems like this with code similar to:
> ===========================
> library(infer)  # provides specify() and calculate()
>
> df <- data.frame(
>   survey = c(rep("1980", 1000), rep("2010", 1000)),
>   # round() guards against floating-point products such as 0.66*1000
>   DP = c(rep("Y", round(0.66*1000)), rep("N", 1000 - round(0.66*1000)),
>          rep("Y", round(0.64*1000)), rep("N", 1000 - round(0.64*1000))))
>
> (d_hat <- df %>%
>   specify(response = DP, explanatory = survey, success = "Y") %>%
>   calculate(stat = "diff in props", order = c("1980", "2010")))
> ============================
>
> My question is, is this the way I should be setting up datasets for
> problems of this type? Is there a more efficient way, that doesn't
> require the construction of the whole sample dataset?
>
> It seems like I should be able to do something like this:
> =================
> (df <- data.frame(group1count = 660,        # Or, group1prop = 0.66
>                   group1samplesize = 1000,
>                   group2count = 640,        # Or, group2prop = 0.64
>                   group2samplesize = 1000))
> =================
>
> Am I overlooking a way to set up these sample dataframes for infer?
>
> Thanks for your advice and guidance.
>
> -Kevin
Hello,

Base R can solve this problem directly from the summary figures, without building the full 2000-row dataset. Something like this:

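# Summary data from the problem statement
year <- c(1980, 2010)
p <- c(0.66, 0.64)   # sample proportions answering "yes"
n <- c(1000, 1000)   # sample sizes
df1 <- data.frame(year, p, n)
# Convert the proportions to counts of "yes" and "no" answers
df1$yes <- with(df1, p*n)
df1$no <- with(df1, n - yes)
# 2 x 2 matrix of counts, one row per survey year
mat <- as.matrix(df1[c("yes", "no")])
prop.test(mat)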
#>
#> 2-sample test for equality of proportions with continuity correction
#>
#> data: mat
#> X-squared = 0.79341, df = 1, p-value = 0.3731
#> alternative hypothesis: two.sided
#> 95 percent confidence interval:
#>  -0.02279827  0.06279827
#> sample estimates:
#> prop 1 prop 2
#>   0.66   0.64
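
Note that prop.test() is two-sided by default, while the problem asks whether the 1980 proportion is higher than the 2010 one. A minimal sketch of the one-sided version, assuming the "yes" counts are passed directly as a vector (the names two_sided and one_sided are only illustrative):

```r
# Counts of "yes" answers and sample sizes, taken from the problem statement
yes <- c(660, 640)    # 1980, 2010
n   <- c(1000, 1000)

# Two-sided test, equivalent to prop.test(mat)
two_sided <- prop.test(yes, n)

# One-sided alternative: p1 (1980) greater than p2 (2010)
one_sided <- prop.test(yes, n, alternative = "greater")

two_sided$p.value
one_sided$p.value
```

Because the observed difference is in the direction of the alternative, the one-sided p-value should be half the two-sided one (about 0.187), which still gives no evidence of a higher proportion in 1980.
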
chisq.test(mat)
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: mat
#> X-squared = 0.79341, df = 1, p-value = 0.3731
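
If you still want to use infer, which works on one row per respondent, the long data frame can be expanded from a small table of counts instead of being typed out with nested rep() calls. This is only a sketch: the counts table and its column n are my own illustrative names, the seed and number of permutations are arbitrary, and the infer part is skipped if the package is not installed.

```r
# One row per (survey, answer) cell, with the cell count in n
counts <- data.frame(
  survey = c("1980", "1980", "2010", "2010"),
  DP     = c("Y", "N", "Y", "N"),
  n      = c(660, 340, 640, 360)
)

# Repeat each row n times to get one row per respondent
df <- counts[rep(seq_len(nrow(counts)), times = counts$n), c("survey", "DP")]
rownames(df) <- NULL

table(df$survey, df$DP)

# Optional: permutation test with infer, only if the package is available
if (requireNamespace("infer", quietly = TRUE)) {
  library(infer)
  set.seed(1)  # arbitrary seed, only for reproducibility

  # Observed difference in proportions, 1980 minus 2010
  d_hat <- df %>%
    specify(response = DP, explanatory = survey, success = "Y") %>%
    calculate(stat = "diff in props", order = c("1980", "2010"))

  # Null distribution under independence of answer and survey year
  null_dist <- df %>%
    specify(response = DP, explanatory = survey, success = "Y") %>%
    hypothesize(null = "independence") %>%
    generate(reps = 1000, type = "permute") %>%
    calculate(stat = "diff in props", order = c("1980", "2010"))

  print(get_p_value(null_dist, obs_stat = d_hat, direction = "greater"))
}
```

The expansion step is plain base R, so the same df can be passed to specify() exactly as in your original code.
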
Hope this helps,
Rui Barradas