[R] sample equal number of cases per class
ollestrat
stratman1 at gmx.de
Sun Nov 4 11:47:07 CET 2012
Dear community,
I have a data frame and want to split it into a learn and a test partition.
However, the learn set should be balanced, i.e. each class should have the
same number of cases. I have tried and searched a lot, without success so far.
Maybe you can help?
Some example code:
# generate example data
df <- data.frame(class = as.factor(sample(1:3, 20, replace = TRUE)),
                 var1 = rnorm(20, 3), var2 = rnorm(20, 6))
summary(df)
# split into learn and test sets using the caret package
library(caret)
ind <- createDataPartition(df$class, p = 0.8, list = FALSE, times = 1)
# The problem is here: class sizes are not equal
learnset <- df[ind, ]
summary(learnset)
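
For illustration, here is a minimal sketch of what a balanced learn set could look like, using base R only instead of caret. createDataPartition samples within each class so that the class proportions of the data are preserved, which is why the class sizes come out unequal. The sketch draws the same number of rows from every class. The seed, the name n_per_class and the rule "80% of the smallest class" are assumptions made for this example, not part of the original code.

```r
# Sketch: draw the same number of rows from every class (base R only).
set.seed(1)  # assumption: fixed seed so the split is reproducible
df <- data.frame(class = factor(sample(1:3, 20, replace = TRUE)),
                 var1 = rnorm(20, 3), var2 = rnorm(20, 6))

# Rows to draw per class: 80% of the smallest class, but at least one row.
n_per_class <- max(1, floor(0.8 * min(table(df$class))))

# Group the row indices by class, then pick n_per_class of them per group.
rows_by_class <- split(seq_len(nrow(df)), df$class)
ind <- unlist(lapply(rows_by_class, function(rows) {
  # index via sample.int() so a class with a single row is handled correctly
  rows[sample.int(length(rows), n_per_class)]
}), use.names = FALSE)

learnset <- df[ind, ]   # balanced: n_per_class rows of each class
testset  <- df[-ind, ]  # all remaining rows (not balanced)
table(learnset$class)
```

Note that the size of the learn set is limited by the smallest class, so the test set keeps all remaining rows and is generally not balanced.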
Version info:
> R.Version()
$platform
[1] "x86_64-pc-mingw32"
$arch
[1] "x86_64"
$os
[1] "mingw32"
$system
[1] "x86_64, mingw32"
$major
[1] "2"
$minor
[1] "15.1"
--
View this message in context: http://r.789695.n4.nabble.com/sample-equal-number-of-cases-per-class-tp4648381.html
Sent from the R help mailing list archive at Nabble.com.