[R] Sample rows in data frame by subsets

Chris Stubben stubben at lanl.gov
Mon Jan 23 21:04:06 CET 2006


Hi,

I need to resample the rows of a data frame within subsets defined by a factor:

L3 <- LETTERS[1:3]
d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE))
    x  y fac
1  1  1   A
2  1  2   A
3  1  3   A
4  1  4   A
5  1  5   C
6  1  6   C
7  1  7   B
8  1  8   A
9  1  9   C
10 1 10   A

I have seen this used to sample rows with replacement:

d[sample(nrow(d), replace=TRUE), ]

     x  y fac
7   1  7   B
2   1  2   A
1   1  1   A
3   1  3   A
2.1 1  2   A
10  1 10   A
8   1  8   A
9   1  9   C
1.1 1  1   A
8.1 1  8   A


but I would like to resample within each level of fac, keeping the original number of rows per level:

summary(d$fac)
A B C
6 1 3


rbind(subset(d, fac=="A")[sample(6, replace=TRUE), ],
       subset(d, fac=="B")[sample(1, replace=TRUE), ],
       subset(d, fac=="C")[sample(3, replace=TRUE), ] )

     x  y fac
2   1  2   A
3   1  3   A
3.1 1  3   A
1   1  1   A
10  1 10   A
1.1 1  1   A
7   1  7   B
5   1  5   C
6   1  6   C
5.1 1  5   C


Is there an easy way to do this in one step or with a short function?  I 
have lots of data frames to resample.
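
One possible short function, as a rough sketch rather than a definitive answer: split the row indices by the grouping column, resample each set of indices with replacement, and index the data frame once. The name resample_by_group and its arguments are made up for illustration. The grouping column is assumed to have no missing values, and set.seed() is there only to make the example repeatable.

```r
# Example data, built the same way as above
set.seed(1)
L3 <- LETTERS[1:3]
d <- data.frame(cbind(x = 1, y = 1:10), fac = sample(L3, 10, replace = TRUE))

# Hypothetical helper: resample rows with replacement within each level of
# the column named by `group`, keeping each level's original row count.
resample_by_group <- function(df, group) {
  # One integer vector of row indices per level of the grouping column
  idx_by_group <- split(seq_len(nrow(df)), df[[group]])
  # Draw positions 1..length(i) with replacement, then map back to row
  # indices; this sidesteps the sample(n) surprise for one-row groups
  picked <- lapply(idx_by_group, function(i) {
    i[sample.int(length(i), replace = TRUE)]
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

d2 <- resample_by_group(d, "fac")
```

For many data frames held in a list (say, a hypothetical list called dfs), something like lapply(dfs, resample_by_group, group = "fac") should apply the same resampling to each one.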

Thanks,

Chris


-- 
-----------------
Chris Stubben

Los Alamos National Lab
BioScience Division
MS M888
Los Alamos, NM 87545



