[R] Why use numFold in evaluate_Weka_classifier of RWeka

Tue Aug 10 09:00:05 CEST 2010

s0300851 <s0300851 <at> tp.edu.tw> writes:

> 
> Hi everyone,
> 
> I have a question about using the RWeka package.
> We know that make_Weka_classifier helps us build a model, and that
> evaluate_Weka_classifier helps us evaluate the performance of the
> model on new data.
> My question is how to use the parameter numFold in
> evaluate_Weka_classifier. Cross-validation means using some parts of
> the data for training and other parts for testing, but it seems this
> belongs in the step of building the model, not in evaluation.
> My guess is that numFold=n in evaluate_Weka_classifier works like this:
> randomly (but in proportion) select data from the dataset, repeat this
> n times, and then evaluate the performance. Is this correct?

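No. Cross-validation is a way to estimate a model's performance, not a
step in building the model. It's preferable to learn about Weka directly
from the Weka manual.
About the number of folds ('numFolds' in RWeka) it says: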

    "A more elaborate method is cross-validation. Here, a number of
    folds n is specified. The dataset is randomly reordered and then
    split into n folds of equal size. In each iteration, one fold is
    used for testing and the other n-1 folds are used for training the
    classifier. The test results are collected and averaged over all
    folds. This gives the cross-validation estimate of the accuracy."
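
The procedure quoted above can be sketched in a few lines. This is only an
illustration in plain Python, not RWeka or Weka code: the names
k_fold_accuracy and majority_class, the toy labels, and the trivial
majority-class "classifier" (which ignores all attributes) are made up for
the example, and Weka's own implementation may differ in details such as how
the folds are formed.

```python
import random
from collections import Counter


def majority_class(train_labels):
    """Toy 'classifier': always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def k_fold_accuracy(labels, n_folds, seed=1):
    """Estimate accuracy by n-fold cross-validation (illustrative sketch)."""
    if not 2 <= n_folds <= len(labels):
        raise ValueError("n_folds must be between 2 and len(labels)")
    # Randomly reorder the data (here: the indices), reproducibly.
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    # Split into n folds whose sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]

    fold_accuracies = []
    for test_fold in folds:
        held_out = set(test_fold)
        # Train on the other n-1 folds ...
        train_labels = [labels[i] for i in indices if i not in held_out]
        prediction = majority_class(train_labels)
        # ... and test on the single held-out fold.
        correct = sum(labels[i] == prediction for i in test_fold)
        fold_accuracies.append(correct / len(test_fold))
    # Average the per-fold results to get the cross-validation estimate.
    return sum(fold_accuracies) / n_folds


# Made-up data: 12 labels, 8 of class "a" and 4 of class "b".
toy_labels = ["a"] * 8 + ["b"] * 4
estimate = k_fold_accuracy(toy_labels, n_folds=4)
print(round(estimate, 3))  # prints 0.667
```

Every observation is used for testing exactly once, and never by a model
that was trained on it. That is why the number of folds is an argument of
the evaluation step: the n temporary models exist only to produce the
performance estimate.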

> Thanks.
> Best regards ,
> 
> Hsiao