[R] Strange behavior when sampling rows of a data frame
William Dunlap
wdunlap at tibco.com
Fri Jun 19 18:20:43 CEST 2020
The first subscript argument is getting evaluated twice.
> trace(sample)
> set.seed(2020); df[i<-sample(10,3), ]$Treated <- TRUE
trace: sample(10, 3)
trace: sample(10, 3)
> i
[1] 1 10 4
> set.seed(2020); sample(10,3)
trace: sample(10, 3)
[1] 7 6 8
> sample(10,3)
trace: sample(10, 3)
[1] 1 10 4
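
The double evaluation follows from how R handles nested replacement calls: df[i, ]$treated <- TRUE is rewritten roughly as `[<-`(df, i, , value = `$<-`(df[i, ], "treated", TRUE)), so the index expression appears once in the extraction and once in the write-back. Here is a minimal sketch that counts the evaluations instead of tracing them. The helper pick() and the names garbled, fixed and by_col are made up for illustration and are not part of the thread.

```r
# Hypothetical helper (not from the thread): draws the sample and counts
# how many times it has been called.
n_calls <- 0
pick <- function(n, k) {
  n_calls <<- n_calls + 1
  sample(n, k)
}

df <- data.frame(unit = 1:10, treated = FALSE)

# Nested form from the original question: the index expression is evaluated
# once to extract the rows and once more to write them back.
n_calls <- 0
garbled <- df
garbled[pick(10, 3), ]$treated <- TRUE
calls_nested <- n_calls

# Storing the index first: a single draw is reused for both steps.
n_calls <- 0
fixed <- df
s <- pick(10, 3)
fixed[s, ]$treated <- TRUE
calls_stored <- n_calls

# Column-first form (Rui's first suggestion): the index appears only once.
n_calls <- 0
by_col <- df
by_col$treated[pick(10, 3)] <- TRUE
calls_by_col <- n_calls

c(nested = calls_nested, stored = calls_stored, by_col = calls_by_col)
```

The counts should come out as 2, 1 and 1. In the nested form the rows that are read and the rows that are overwritten come from two different random draws, which would explain the duplicated and missing units in the original report.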
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Fri, Jun 19, 2020 at 8:46 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> I don't know why this happens, but it looks like a bug. The question is
> where: in `[<-.data.frame` or in `[<-.default`?
>
> A solution is to subset and assign the vector:
>
>
> set.seed(2020)
> df2 <- data.frame(unit = 1:10)
> df2$treated <- FALSE
>
> df2$treated[sample(nrow(df2), 3)] <- TRUE
> df2
> #    unit treated
> # 1     1   FALSE
> # 2     2   FALSE
> # 3     3   FALSE
> # 4     4   FALSE
> # 5     5   FALSE
> # 6     6    TRUE
> # 7     7    TRUE
> # 8     8    TRUE
> # 9     9   FALSE
> # 10   10   FALSE
>
>
> Or
>
>
> set.seed(2020)
> df3 <- data.frame(unit = 1:10)
> df3$treated <- FALSE
>
> df3[sample(nrow(df3), 3), "treated"] <- TRUE
> df3
> # result as expected
>
>
> Hope this helps,
>
> Rui Barradas
>
>
>
> At 13:49 on 19/06/2020, Sébastien Lahaie wrote:
> > I ran into some strange behavior in R when trying to assign a treatment
> > to rows in a data frame. I'm wondering whether any R experts can explain
> > what's going on.
> >
> > First, let's assign a treatment to 3 out of 10 rows as follows.
> >
> >> df <- data.frame(unit = 1:10)
> >> df$treated <- FALSE
> >> s <- sample(nrow(df), 3)
> >> df[s,]$treated <- TRUE
> >> df
> >    unit treated
> > 1     1   FALSE
> > 2     2    TRUE
> > 3     3   FALSE
> > 4     4   FALSE
> > 5     5    TRUE
> > 6     6   FALSE
> > 7     7    TRUE
> > 8     8   FALSE
> > 9     9   FALSE
> > 10   10   FALSE
> >
> > This is as expected. Now we'll just skip the intermediate step of saving
> > the sampled indices, and apply the treatment directly as follows.
> >
> >> df <- data.frame(unit = 1:10)
> >> df$treated <- FALSE
> >> df[sample(nrow(df), 3),]$treated <- TRUE
> >> df
> >    unit treated
> > 1     6    TRUE
> > 2     2   FALSE
> > 3     3   FALSE
> > 4     9    TRUE
> > 5     5   FALSE
> > 6     6   FALSE
> > 7     7   FALSE
> > 8     5    TRUE
> > 9     9   FALSE
> > 10   10   FALSE
> >
> > Now the data frame still has 10 rows with 3 assigned to the treatment.
> > But the units are garbled. Units 1 and 4 have disappeared, for instance,
> > and there are duplicates for 6 and 9, one assigned to treatment and the
> > other to control. Why would this happen?
> >
> > Thanks,
> > Sebastien
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.