[R] create stratified splits
Ista Zahn
istazahn at gmail.com
Wed Dec 19 23:45:41 CET 2012
Hi Martin,
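Interesting question. This is not efficient, but I thought I would
post a brute-force method that might be good enough: repeatedly shuffle
the vector into n equal-sized groups until both the standard deviation
of the group means and the standard deviation of the group SDs fall
below a tolerance. Surely someone will have a better approach... well,
we'll see. Here is a dumb, inefficient (but workable) way: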
# create the vector to be split
r <- runif(100)
# write a function to split it, with various knobs and toggles
splitSimilar <- function(x, n, mean.tol=.1, sd.tol=.1, itr=500, verbose=FALSE) {
  if(length(x) %% n != 0) stop("length(x) must be a multiple of n")
  M <- mean.tol+1
  SD <- sd.tol+1
  I <- 0
  # as long as the sd of the group means or the sd of the group
  # standard deviations is greater than its tolerance (at most itr tries)...
  while((M > mean.tol || SD > sd.tol) && I < itr) {
    I <- I + 1
    ## pick another split: shuffle x and label it with n equal-sized groups
    ## (seq_len(n) rather than letters[1:n], so n > 26 also works)
    x1 <- data.frame(g = rep(seq_len(n), length(x)/n),
                     value = sample(x, length(x)))
    M <- sd(tapply(x1$value, x1$g, FUN=mean))
    SD <- sd(tapply(x1$value, x1$g, FUN=sd))
    if(verbose) {
      cat("M =", M, ", mean.tol =", mean.tol, ": SD =", SD, ", sd.tol =", sd.tol, "\n")
    }
  }
  # don't try forever: if all itr attempts failed to meet both
  # tolerances, give up
  if(M > mean.tol || SD > sd.tol) {
    stop("failed to find split matching criteria: try increasing tolerance or itr")
  }
  x1
}
# now use our function to find a set of splits within our mean and sd
# tolerance.
tst <- splitSimilar(r, 10, mean.tol = 0.05, sd.tol = 0.1)
# adjust some of the dials and switches to suit...
tst <- splitSimilar(r, 10, mean.tol = 0.03, sd.tol = 0.05, itr=5000)
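To see what the two tolerances actually measure, it can help to inspect a split directly. The helper below is a hypothetical addition, not part of the original post; it assumes a split shaped like the data frame splitSimilar() returns (columns g and value) and reports the per-group means and SDs along with the two spread statistics the loop compares against mean.tol and sd.tol. The example builds one random split by hand so the block runs on its own.

```r
# Hypothetical helper (not from the original post): summarize a split
# stored as a data frame with a group column g and a value column.
summarizeSplit <- function(split) {
  means <- tapply(split$value, split$g, FUN = mean)  # one mean per group
  sds   <- tapply(split$value, split$g, FUN = sd)    # one SD per group
  list(means = means,
       sds = sds,
       sd.of.means = sd(means),  # the quantity compared against mean.tol
       sd.of.sds = sd(sds))      # the quantity compared against sd.tol
}

# Example on a single random split of 100 values into 10 groups of 10,
# drawn the same way as one iteration of splitSimilar()
set.seed(42)
r <- runif(100)
x1 <- data.frame(g = rep(seq_len(10), 10), value = sample(r))
res <- summarizeSplit(x1)
res$sd.of.means
res$sd.of.sds
```

Applied to the tst object from the post, summarizeSplit(tst) should report sd.of.means and sd.of.sds at or below the tolerances that were passed in.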
Best,
Ista
On Wed, Dec 19, 2012 at 3:23 PM, Martin Batholdy
<batholdy at googlemail.com> wrote:
> Hi,
>
>
> I have a vector like:
>
> r <- runif(100)
>
> Now I would like to split r into 10 pieces (each with 10 elements) –
> but the 'pieces' should be roughly similar with regard to mean and sd.
>
> What is an efficient way to do this in R?
>
>
> thanks!
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
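Since Martin asked for an efficient way, here is a different technique, not the method from the post above: a one-pass "sort and deal" stratified split. It sorts x, cuts the sorted values into blocks of n consecutive values, and gives each group exactly one value from every block, in random order within the block. The function name dealSplit is hypothetical. It requires length(x) to be a multiple of n, as the original question assumes.

```r
# Hypothetical alternative (not the method from the post): stratified
# "sort and deal" split of x into n equal-sized groups.
dealSplit <- function(x, n) {
  if (length(x) %% n != 0) stop("length(x) must be a multiple of n")
  k <- length(x) / n            # number of blocks = size of each group
  ord <- order(x)               # positions of x from smallest to largest
  # n x k matrix: column j is a random permutation of the group labels
  # 1..n for block j; as.vector() flattens it column by column
  labels <- as.vector(replicate(k, sample.int(n)))
  g <- integer(length(x))
  g[ord] <- labels              # the i-th smallest value gets labels[i]
  data.frame(g = g, value = x)  # values kept in their original order
}

set.seed(1)
r <- runif(100)
tst2 <- dealSplit(r, 10)
tapply(tst2$value, tst2$g, FUN = mean)
tapply(tst2$value, tst2$g, FUN = sd)
```

Because every group draws one value from each block of the sorted data, the group means and SDs should tend to be much closer than for a purely random shuffle, and the cost is dominated by a single sort, with no rejection loop. It does not enforce an explicit tolerance, so if one is needed, the result could still be checked against the same sd-of-means and sd-of-SDs criteria used above.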