[R] simple randomization question: How to perform "sample" in chunks

Don MacQueen macq at llnl.gov
Thu Aug 20 18:58:06 CEST 2009

I believe this will do what you want:

   tmp1 <- split(xx, xx$a)    # one data frame per distinct value of a
   do.call(rbind, tmp1[ sample(length(unique(xx$a))) ])    # re-stack the pieces in random order

The idea is to split the dataframe, and then reassemble in a random order.

Whether or not it will be faster for a large dataframe, I don't know.
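
One rough way to check is to time both approaches on a larger data frame. The sketch below is only an illustration: the test data (500 blocks of random size) and the names big, split_shuffle and loop_shuffle are made up here, not taken from the thread, and the timings will vary by machine and data shape.

```r
# Hypothetical test data: 500 blocks, each with 1 to 10 rows
set.seed(1)                       # only so the example is repeatable
n_groups <- 500
sizes <- sample(1:10, n_groups, replace = TRUE)
big <- data.frame(a = rep(seq_len(n_groups), times = sizes),
                  b = sequence(sizes))    # 1..k within each block

# split/rbind approach from this message
split_shuffle <- function(df) {
  pieces <- split(df, df$a)
  do.call(rbind, pieces[sample(length(pieces))])
}

# loop approach from the original question
loop_shuffle <- function(df) {
  out <- NULL
  for (i in sample(unique(df$a))) {
    out <- rbind(out, df[df$a %in% i, ])
  }
  out
}

t_split <- system.time(res_split <- split_shuffle(big))
t_loop  <- system.time(res_loop  <- loop_shuffle(big))

# Elapsed seconds for each approach
print(c(split = t_split[["elapsed"]], loop = t_loop[["elapsed"]]))
```

Both functions should return the same number of rows as the input, with the order of b unchanged inside each block; only the order of the blocks differs between runs.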

There's probably also an indexing solution, perhaps using rle(), but 
I thought of this first...
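
For what it's worth, here is one possible indexing version. It is a sketch that uses match() and order() rather than rle(), and the function name shuffle_by_key is made up for illustration. It relies on order() leaving tied values in their original order, which is what keeps the rows inside each block as they were.

```r
# Example data from the original question
xx <- data.frame(a = c(1,2,2,3,3,3,4,4,4,4),
                 b = c(1,1,2,1,2,3,1,2,3,4))

# Hypothetical helper, not from the original thread
shuffle_by_key <- function(df, key) {
  labels <- unique(df[[key]])
  # Permute positions rather than calling sample(labels) directly, which
  # misbehaves when labels is a single number >= 1
  shuffled <- labels[sample.int(length(labels))]
  # Position of each row's block in the shuffled label vector
  block_rank <- match(df[[key]], shuffled)
  # order() keeps ties in their original order, so rows within a block
  # stay as they were
  df[order(block_rank), , drop = FALSE]
}

set.seed(42)    # only so the example is repeatable
res <- shuffle_by_key(xx, "a")
print(res)
```

This builds a single index vector and subsets once, so it avoids the repeated rbind() calls. Unlike the do.call(rbind, ...) result, the original row names are kept.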


At 6:22 PM +0300 8/20/09, Tal Galili wrote:
>Hello dear R-help group.
>My task looks simple, but I can't seem to find a "smart" (i.e., non-loop)
>solution to it.
>Task: I wish to randomize a data.frame by one column, while keeping the
>inner-order in the second column as is.
>So for example, let's say I have the following data.frame:
>xx <-data.frame(a=  c(1,2,2,3,3,3,4,4,4,4) ,
>                         b =  c(1,1,2,1,2,3,1,2,3,4) )
>I would like to shuffle it by column "a", while keeping the order in column "b".
>Here is my "not-smart" way of doing it:
># R example
>xx <-data.frame(a=  c(1,2,2,3,3,3,4,4,4,4) ,
>                         b =  c(1,1,2,1,2,3,1,2,3,4) )
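>randomize.by.column.a <- function(xx) {
>   new.a.order <- sample(unique(xx$a))   # random order of the distinct values of a
>   new.xx <- NULL
>   for(i in new.a.order) {
>      xx.subset <- xx[ xx$a %in% i ,]    # rows of this block, in original order
>      new.xx <- rbind(new.xx ,  xx.subset)
>   }
>   new.xx
>}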
># END of - R example
>I would love a better, faster way of doing it.
>Tal Galili

Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
