[R] difference between createPartition and createfold functions

Steve Lianoglou mailinglist.honeypot at gmail.com
Sun Oct 2 21:21:26 CEST 2011


Hi,

On Sun, Oct 2, 2011 at 2:47 PM,  <bby2103 at columbia.edu> wrote:
> Hello,
>
> I'm trying to separate my dataset into 4 parts with the 4th one as the test
> dataset, and the other three to fit a model.
>
> I've been searching for the difference between these 2 functions in Caret
> package, but the most I can get is this--
>
> A series of test/training partitions are created using createDataPartition
> while createResample creates one or more bootstrap samples. createFolds
> splits the data into k groups.
>
> I'm missing something here? What is the difference btw createPartition and
> createFold? I guess they wouldn't be equivalent.

Well -- you could always look at the source code to find out (enter
the name of the function into your R console and hit return), but you
can also do some experimentation to find out. Using the data from the
Examples section of caret::createFolds:

R> library(caret)
R> data(oil)
R> part <- createDataPartition(oilType, times = 2)
R> fold <- createFolds(oilType, k = 2)

R> length(Reduce(intersect, part))
[1] 27

R> length(Reduce(intersect, fold))
[1] 0

Looks like `createDataPartition` draws each of its partitions
independently (here two of them, each holding roughly half of the
rows), so the same example can appear in more than one partition.
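
To make that concrete, here is a minimal sketch with the arguments
spelled out. It uses `times = 2` and `p = 0.5` (which I believe is the
default); the variable names are mine, and the exact overlap changes
from run to run because the sampling is random:

```r
library(caret)
data(oil)  # loads oilType (and fattyAcids)

# Two independent resamples; each element holds training row indices
part <- createDataPartition(oilType, times = 2, p = 0.5)

# Size of each resample (roughly half of the rows each)
sapply(part, length)

# Rows that were drawn into both resamples
overlap <- Reduce(intersect, part)
length(overlap)
```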

`createFolds`, on the other hand, splits the data into k disjoint
groups: the same example never appears in more than one fold.
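
For the original question (four parts, with the 4th held out for
testing), something along these lines should work. It's only a sketch:
`test_idx`, `train_idx` and the seed are names/values I made up for
illustration, and it uses the `oil` data again rather than your own:

```r
library(caret)
data(oil)

set.seed(1)  # arbitrary seed, only so the split is reproducible
folds <- createFolds(oilType, k = 4)  # list of 4 disjoint index vectors

test_idx  <- folds[[4]]                             # 4th part: test set
train_idx <- unlist(folds[1:3], use.names = FALSE)  # other three: model fitting

# Sanity checks: no row is in both sets, and every row is used
length(intersect(train_idx, test_idx))
setequal(c(train_idx, test_idx), seq_along(oilType))
```

If you only need a single train/test split, I'd expect
`createDataPartition(oilType, p = 0.75, list = FALSE)` to give you the
training rows directly, with the remaining rows left for testing.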

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


