[R] recursive partitioning in R
Therneau, Terry M., Ph.D.
therneau at mayo.edu
Thu Nov 12 13:44:37 CET 2015
Look at the rpart vignette "User written split functions". It shows how to add your own
splitting method to rpart (in R, no C required). This has proven to be very useful for
trying out new ideas.
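
As a rough sketch of what such a method looks like: rpart accepts a list of three R
functions (init, eval, split) as its method argument. The version below is modeled on the
anova example in that vignette and simply reproduces least-squares splitting, so treat it
as a template to modify rather than a new criterion. The function names init_fn, eval_fn
and split_fn, and the use of the mtcars data, are assumptions made for illustration.

```r
library(rpart)

# init: check the response and tell rpart what it looks like
init_fn <- function(y, offset, parms, wt) {
  if (is.matrix(y) && ncol(y) > 1) stop("matrix response not allowed")
  if (length(offset)) y <- y - offset
  # one-line node description used by summary()
  sfun <- function(yval, dev, wt, ylevel, digits) {
    paste0("  mean=", format(signif(yval, digits)),
           ", MSE=", format(signif(dev / wt, digits)))
  }
  list(y = c(y), parms = NULL, numresp = 1, numy = 1, summary = sfun)
}

# eval: node label (weighted mean) and node deviance (weighted RSS)
eval_fn <- function(y, wt, parms) {
  wmean <- sum(y * wt) / sum(wt)
  list(label = wmean, deviance = sum(wt * (y - wmean)^2))
}

# split: goodness of every candidate split of one variable within one node
split_fn <- function(y, wt, x, parms, continuous) {
  n <- length(y)
  y <- y - sum(y * wt) / sum(wt)          # center y within the node
  if (continuous) {
    # y arrives ordered by x; candidate i puts the first i observations left
    temp     <- cumsum(y * wt)[-n]
    left.wt  <- cumsum(wt)[-n]
    right.wt <- sum(wt) - left.wt
    lmean    <- temp / left.wt
    rmean    <- -temp / right.wt
    goodness <- (left.wt * lmean^2 + right.wt * rmean^2) / sum(wt * y^2)
    list(goodness = goodness, direction = sign(lmean))
  } else {
    # categorical x: order the categories by their mean, then treat as ordered
    ux    <- sort(unique(x))
    wtsum <- tapply(wt, x, sum)
    ysum  <- tapply(y * wt, x, sum)
    ord   <- order(ysum / wtsum)
    m     <- length(ord)
    temp     <- cumsum(ysum[ord])[-m]
    left.wt  <- cumsum(wtsum[ord])[-m]
    right.wt <- sum(wt) - left.wt
    lmean    <- temp / left.wt
    rmean    <- -temp / right.wt
    list(goodness = (left.wt * lmean^2 + right.wt * rmean^2) / sum(wt * y^2),
         direction = ux[ord])
  }
}

user_method <- list(init = init_fn, eval = eval_fn, split = split_fn)

# mtcars is only a stand-in data set
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = user_method,
             control = rpart.control(minsplit = 10, xval = 0))
print(fit$frame[, c("var", "n", "dev", "yval")])
```

To try a different in-sample criterion, change the goodness computation in split_fn and
the deviance in eval_fn; the rest is plumbing.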
The second piece would be to do your own cross-validation. That is, turn off the built-in
cross-validation using the xval=0 option of rpart.control, then explicitly do the
cross-validation yourself: fit a new tree to some chosen subset of the data, using your
split rule of course, and then use predict() to get predicted values for the remaining
observations. Again, this is all in R, and you can explicitly control your in-bag and
out-of-bag subsets.
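
A minimal sketch of that loop, under some assumptions: the mtcars data, five folds, the
seed, and mean absolute error as the out-of-sample criterion are all arbitrary choices for
illustration, and method = "anova" stands in for whatever split rule you are testing.

```r
library(rpart)

set.seed(1)                                  # only for reproducibility
k    <- 5                                    # number of folds (arbitrary)
fold <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

cv_pred <- numeric(nrow(mtcars))
for (i in seq_len(k)) {
  held_out <- fold == i
  # fit on the in-sample part with rpart's own cross-validation switched off
  fit_i <- rpart(mpg ~ wt + hp + disp, data = mtcars[!held_out, ],
                 method = "anova",           # replace with your own method list
                 control = rpart.control(xval = 0, minsplit = 10))
  # predict the held-out observations
  cv_pred[held_out] <- predict(fit_i, newdata = mtcars[held_out, ])
}

# out-of-sample criterion of your own choosing; mean absolute error here
cv_mae <- mean(abs(mtcars$mpg - cv_pred))
cv_mae
```

Because the fold assignment is an ordinary R vector, you decide exactly which observations
are in and out of each fit.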
The xpred.rpart function may be useful to automate some of the steps.
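
For instance, a hedged sketch: xpred.rpart takes a fitted tree and either a number of
folds or an explicit vector of fold assignments, and returns cross-validated predictions
with one column per complexity value. The data set, the fold vector xgroup, and the
absolute-error summary are assumptions for illustration.

```r
library(rpart)

fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
             control = rpart.control(xval = 0, minsplit = 10))

set.seed(1)                                  # only for reproducibility
xgroup <- sample(rep(1:5, length.out = nrow(mtcars)))   # explicit fold assignment

# rows = observations, columns = complexity (cp) values
xp <- xpred.rpart(fit, xval = xgroup)

# apply your own out-of-sample criterion to each column
cv_mae <- apply(xp, 2, function(p) mean(abs(mtcars$mpg - p)))
cv_mae
```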
If you look up rpart on CRAN, you will see a link to the package source. If you read the
C source code you will discover that 95% of it is boring bookkeeping: which observations
are in which part(s) of the tree, sorting the data, tracking missing values, etc. If you
ever do want to write your own code you are more than welcome to build off this; I
wouldn't want to write that part again.
Terry Therneau
On 11/12/2015 05:00 AM, r-help-request at r-project.org wrote:
> Dear List,
>
> I'd like to make a few modifications to the typical CART algorithm, and
> I'd rather not code the whole thing from scratch. Specifically I want
> to use different in-sample and out-of-sample fit criteria in the split
> choosing and cross-validation stages.
>
> I see however that the code for CART in both the rpart and the tree
> packages is written in C.
>
> Two questions:
>
> * Where is the C code? It might be possible to get a C-fluent
> programmer to help me with this.
> * Is there any code for CART that is written entirely in R?
>
> Thanks,
> Andrew