[R] What don't I understand about sample()?

Kevin Zembower kev|n @end|ng |rom zembower@org
Sat Mar 15 18:28:42 CET 2025


Hi, Richard, thanks for replying. I should have mentioned the third
edition, which we're using. The data file didn't change between the
second and third editions, and the data on Body Mass Gain was the same
as in the first edition, although the first edition data file contained
additional variables.

According to my text, the BMGain was measured in grams. Thanks for
pointing out that my statement of the problem lacked crucial
information.

The matrix in my example comes from an example in
https://pages.stat.wisc.edu/~larget/stat302/chap3.pdf, where the author
built a bootstrap with a matrix that has one row for every bootstrap
sample and one column for each observation in the original data. This
let him take the mean of each row to get the bootstrap statistics.
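
A minimal sketch of that matrix layout, as I understand it. The vector x
and the number of replicates B are made-up illustrative values, not the
book's data, and this is not copied from the notes:

```r
set.seed(1)                      # only so the sketch is reproducible
x <- c(10, 12, 9, 11, 17, 6, 8)  # hypothetical data, not the book's
n <- length(x)
B <- 1000                        # number of bootstrap samples (arbitrary)

# Draw B * n values with replacement and arrange them so that each row
# is one bootstrap sample of size n.
boot.samples <- matrix(sample(x, size = B * n, replace = TRUE),
                       nrow = B, ncol = n)

# One bootstrap statistic (the mean) per row.
boot.means <- rowMeans(boot.samples)

# Bootstrap estimate of the standard error of the mean.
sd(boot.means)
```

Because every draw is independent, it makes no difference that matrix()
fills column by column: each row is still a sample of size n drawn with
replacement from x.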

The only reason I need the tidyverse is for the read_csv() function.
I've been regrettably lazy about working out which of the tidyverse
packages provides read_csv() and loading just that one.
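
For the record, read_csv() is provided by the readr package, so loading
readr alone should be enough. A small sketch, using a temporary file
with made-up rows (the column names follow the code quoted below; the
values are hypothetical) and falling back to base R's read.csv() if
readr is not installed:

```r
# Tiny made-up CSV so the example is self-contained; not the book's data.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Group,BMGain",
             "Light,9.5",
             "Dark,5.5"), tmp)

if (requireNamespace("readr", quietly = TRUE)) {
  # read_csv() lives in readr, so the rest of the tidyverse is not needed.
  # suppressMessages() hides the column-specification message.
  d <- as.data.frame(suppressMessages(readr::read_csv(tmp)))
} else {
  # Base R alternative; returns a plain data frame.
  d <- read.csv(tmp)
}

str(d)
```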

Thanks, again, for helping me to further understand R and this problem.

-Kevin

On Sat, 2025-03-15 at 12:00 +0100, r-help-request using r-project.org wrote:
> Not having the book (and which of the three editions are you using?),
> I downloaded the data and played with it for a bit.
> dotchart() showed the Dark and Light conditions looked quite
> different, but also showed that there are not very many cases.
> After trying t.test, it occurred to me that I did not know whether
> "BMGain" means gain in *grams* or gain in *percent*.
> Reflection told me that for a growth experiment, percent made more
> sense, which reminded me of one of my first
> student advising experiences, where I said "never give the computer
> percentages; let IT calculate the percentages
> from the baseline and outcome, because once you've thrown away
> information, the computer can't magically get it back."
> In particular, in the real world I'd be worried about the possibility
> that there was some confounding going on, so I would
> much rather have initial weight and final weight as variables.
> If BMGain is an absolute measure, the p value for a t test is teeny tiny.
> If BMGain is a percentage, the p value for a sensible t test is about 0.03.
> 
> A permutation test went like this.
> is.light <- d$Group == "Light"
> is.dark <- d$Group == "Dark"
> score <- function (g) mean(g[is.light]) - mean(g[is.dark])
> base.score <- score(d$BMGain)
> perm.scores <- sapply(1:997, function (i) score(sample(d$BMGain)))
> sum(perm.scores >= base.score) / length(perm.scores)
> 
> I don't actually see where matrix() comes into it, still less anything
> in the tidyverse.
>
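
The permutation test quoted above can be run end to end with stand-in
data. A minimal sketch: the data frame d below is invented purely for
illustration (it is not the book's data), and replicate() is swapped in
for sapply(1:997, ...), which should be equivalent for this purpose.

```r
set.seed(42)  # only for reproducibility of the sketch

# Hypothetical stand-in for the data; values are invented, not the book's.
d <- data.frame(
  Group  = rep(c("Light", "Dark"), each = 5),
  BMGain = c(9, 11, 8, 12, 10, 5, 6, 4, 7, 5)
)

is.light <- d$Group == "Light"
is.dark  <- d$Group == "Dark"

# Difference in group means for a given vector of gains.
score <- function(g) mean(g[is.light]) - mean(g[is.dark])
base.score <- score(d$BMGain)

# sample(x) with no other arguments returns a random permutation of x,
# which shuffles the gains across the fixed group labels.
perm.scores <- replicate(997, score(sample(d$BMGain)))

# One-sided p-value: share of permutations at least as large as observed.
p.value <- mean(perm.scores >= base.score)
p.value
```

The group labels stay fixed while the gains are shuffled, so each
permuted score is what the difference in means would look like if the
grouping had no effect.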


