[R] What don't I understand about sample()?

Kevin Zembower kev|n @end|ng |rom zembower@org
Fri Mar 14 19:51:55 CET 2025


Thank you all, very much, for your kind and detailed explanations. What
I mainly didn't understand was that matrix() evaluates its arguments
only once. I was certain that this was a bug, with sample() getting
seeded with a constant value and returning the same permutation each time.
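
A minimal sketch of that behaviour, assuming nothing beyond base R:
matrix() receives the single vector that sample() returned and recycles
it to fill the matrix, whereas replicate() re-evaluates its expression
on every repetition. The seed and the names `once` and `many` are
illustrative only.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# sample() is evaluated once; matrix() recycles those 10 values
# to fill all 50 cells, so every row is the same permutation
once <- matrix(sample(1:10), 5, 10, byrow = TRUE)

# replicate() evaluates sample() 5 times; each call becomes one column,
# so transpose to get one permutation per row
many <- t(replicate(5, sample(1:10)))

# Is every row of `once` identical to its first row?
all(apply(once, 1, function(r) identical(r, once[1, ])))

# Is every row of `many` a permutation of 1:10?
all(apply(many, 1, function(r) identical(sort(r), 1:10)))
```

Both checks should come back TRUE; printing `once` and `many` side by
side shows the difference directly.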

I think I need to make my MWE a little less minimal to continue
learning. If you're familiar with the Lock5 statistics textbook, I'm
working on the Light and Dark mice example, where one group of mice was
exposed to light at night and another was not, and weight gain was then
measured. The statistic is the difference in mean weight gain between
the two groups.

My understanding of how I'm supposed to construct a randomization
distribution is to pool the weight gains of the 10 mice exposed to
light at night with those of the 8 mice not exposed to light at night.
After shuffling the pooled data, I arbitrarily assign the first 10
values to the 'light' group and the last 8 to the 'dark' group, and
find the difference in their means.

I think I can do this correctly with:
===================
## Less-minimal working example
library(tidyverse)

library(Lock5Data)
data(LightatNight)
str(LightatNight)
d <- LightatNight  # so `d` exists even without the CSV download below

## Or, if you don't have the Lock5Data library:
(d <- read_csv("https://www.lock5stat.com/datasets3e/LightatNight.csv"))

(lt <- d$BMGain[d$Group == "Light"])
(dk <- d$BMGain[d$Group == "Dark"])
(n_lt <- length(lt))
(n_dk <- length(dk))

(data <- c(lt, dk))

B <- 10 #Will be 1000
n <- length(data)

random.samples <- matrix(NA, B, n)
random.statistics <- rep(NA, B)

for(i in 1:B) {
    random.samples[i,] <- sample(data)
    random.statistics[i] <- mean(random.samples[i, 1:n_lt]) -
        mean(random.samples[i, (n_lt + 1):(n_lt + n_dk)])
}
random.samples
random.statistics

## Trying to do it without a for(), using Peter's suggestion:
(random.samples <- matrix(replicate(B, sample(data)), B, n,
                          byrow=TRUE))
compute.diff.means <- function(x) {
    return(mean(x[1:n_lt]) - mean(x[(n_lt+1):(n_lt+n_dk)]))
}
(random.statistics <- apply(random.samples, 1, compute.diff.means))
=======================

I think both of these methods give me the data I'm trying for. Any
suggestions on my R coding techniques are welcome.
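
One possible variation, offered only as a sketch: if the shuffled rows
themselves aren't needed, the statistic can be computed directly inside
replicate(). Picking n1 random positions for the first group is
equivalent to shuffling and taking the first n1 values. The helper name
`shuffled_diff` and the vector `toy` are hypothetical, and the numbers
are made up rather than taken from LightatNight.

```r
# Hypothetical helper: one randomization difference in means for a pooled
# vector `x`, with `n1` randomly chosen values forming the first group.
shuffled_diff <- function(x, n1) {
    idx <- sample(length(x), n1)    # positions assigned to the first group
    mean(x[idx]) - mean(x[-idx])    # remaining values form the second group
}

# Made-up weight gains, NOT the LightatNight data
toy <- c(1.7, 2.3, 3.1, 4.0, 5.2, 6.4, 7.5, 8.8)

set.seed(42)  # arbitrary seed, only for reproducibility
stats <- replicate(10, shuffled_diff(toy, n1 = 5))
stats
```

With the real data this would presumably be called as
replicate(B, shuffled_diff(data, n_lt)), using the names from the
example above.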

Thank you all, again, for taking the time and effort to help me. Your
help is greatly appreciated.

-Kevin

On Thu, 2025-03-13 at 17:00 -0400, Kevin Zembower wrote:
> Hello, all,
> 
> I'm learning to do randomized distributions in my Stats 101 class*. I
> thought I could do it with a call to sample() inside a matrix(),
> like:
> 
> > matrix(sample(1:10, replace=TRUE), 5, 10, byrow=TRUE)
>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> [1,]    8    2    3    1    8    2    8    8    9     8
> [2,]    8    2    3    1    8    2    8    8    9     8
> [3,]    8    2    3    1    8    2    8    8    9     8
> [4,]    8    2    3    1    8    2    8    8    9     8
> [5,]    8    2    3    1    8    2    8    8    9     8
> > 
> 
> Imagine my surprise to learn that all the rows were the same
> permutation. I thought each time sample() was called inside the
> matrix, it would generate a different permutation.
> 
> I modeled this after the bootstrap sample techniques in
> https://pages.stat.wisc.edu/~larget/stat302/chap3.pdf. I don't
> understand why it works in bootstrap samples (with replace=TRUE), but
> not in randomized distributions (with replace=FALSE).
> 
> Thanks for any insight you can share with me, and any suggestions for
> getting rows in a matrix with different permutations.
> 
> -Kevin
> 
> *No, this isn't a homework problem. We're using Lock5 as the text in
> class, along with its StatKey web application. I'm just trying to get
> more out of the class by also solving our problems using R, for which
> I'm not receiving any class credit.




