[R] Why use numFold in evaluate_Weka_classifier of RWeka
Hans W Borchers
hwborchers at googlemail.com
Tue Aug 10 09:00:05 CEST 2010
s0300851 <s0300851 <at> tp.edu.tw> writes:
>
> Hi everyone,
>
> I have a question about the RWeka package.
> We know that make_Weka_classifier can help us build a model,
> and evaluate_Weka_classifier can help us evaluate the performance
> of the model on new data. But I don't understand how to use the
> parameter numFolds in evaluate_Weka_classifier. Cross-validation
> means using some parts of the data for training and other parts for
> testing, but it should be used in the step of building our model,
> not in evaluation.
> I think numFolds = n in evaluate_Weka_classifier may mean this:
> randomly (but in proportion) select data from the dataset, repeat
> this n times, and then evaluate the performance. Is this correct?
No. It's best to learn about Weka directly from the Weka manual.
About the number of folds ('numFolds'), it says:
"A more elaborate method is cross-validation. Here, a number of
folds n is specified. The dataset is randomly reordered and then
split into n folds of equal size. In each iteration, one fold is
used for testing and the other n-1 folds are used for training the
classifier. The test results are collected and averaged over all
folds. This gives the cross-validation estimate of the accuracy."
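To make the manual's description concrete, here is a minimal base-R sketch of n-fold cross-validation, under stated assumptions: it does not call RWeka at all, and the helpers nearest_centroid_fit(), nearest_centroid_predict() and cross_validate() are hypothetical, written only for illustration (the nearest-centroid rule stands in for whatever Weka classifier you would use). It shows that each of the n models is trained on n-1 folds and tested on the held-out fold, and the n test results are then averaged. Unlike Weka, this sketch does not stratify the folds by class. In RWeka itself, the corresponding call would be along the lines of evaluate_Weka_classifier(J48(Species ~ ., data = iris), numFolds = 10, seed = 1).

```r
# Hypothetical classifier: one centroid (vector of column means) per class.
nearest_centroid_fit <- function(x, y) {
  centroids <- t(sapply(split(as.data.frame(x), y), colMeans))
  list(centroids = centroids, levels = levels(y))
}

# Assign each row of x to the class whose centroid is closest (squared Euclidean).
nearest_centroid_predict <- function(model, x) {
  x <- as.matrix(x)
  d <- sapply(seq_len(nrow(model$centroids)), function(k)
    rowSums(sweep(x, 2, model$centroids[k, ])^2))
  d <- matrix(d, nrow = nrow(x))  # keep a matrix even for a single test row
  factor(rownames(model$centroids)[max.col(-d, ties.method = "first")],
         levels = model$levels)
}

# n-fold cross-validation as described in the Weka manual (without stratification).
cross_validate <- function(x, y, n_folds = 10, seed = 1) {
  set.seed(seed)
  # Randomly reorder the rows and split them into n_folds folds of (nearly) equal size.
  fold <- sample(rep_len(seq_len(n_folds), nrow(x)))
  acc <- numeric(n_folds)
  for (i in seq_len(n_folds)) {
    test <- fold == i
    # Train on the other n_folds - 1 folds ...
    model <- nearest_centroid_fit(x[!test, , drop = FALSE], y[!test])
    # ... and test on the held-out fold.
    pred <- nearest_centroid_predict(model, x[test, , drop = FALSE])
    acc[i] <- mean(pred == y[test])
  }
  # Average the per-fold accuracies: the cross-validation estimate.
  mean(acc)
}

cv_acc <- cross_validate(iris[, 1:4], iris$Species, n_folds = 10, seed = 1)
print(cv_acc)
```

Note that the n models built inside the loop are discarded. They exist only to estimate how well the procedure generalizes, which is why numFolds is an argument of the evaluation function rather than of model building.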
> Thanks.
> Best regards,
>
> Hsiao
More information about the R-help mailing list