Terry Therneau
therneau at mayo.edu
Wed Dec 16 14:39:27 CET 2009
> Hi,
> I am trying to write my own split function for rpart. The aim is to do,
> instead of anova, a linear regression to determine the split (minimize
> some criterion like sum of rss left and right of the split). The
> regression (lm) should simply use the dependent and independent
> variables passed to rpart.
> I am aware of the example provided in the rpart source code, but
> stumbled on similar problems that I saw reported on this list (no final
> solution posted, as far as I could see). The problem is, broadly
> speaking, that I do not see a way to access the full set of x and y
> variables in the user-written split-function.

The rpart routine provides the x variables to a user-written split
function one at a time. Since the entire structure of rpart --
printing, plotting, tree representation, etc. -- is based on the premise
of a single variable driving each split, what you are asking for would
require an entirely different program.
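
To make the "one at a time" point concrete, here is a minimal sketch of a
split function for a continuous predictor. It assumes the argument list
(y, wt, x, parms, continuous) and the goodness/direction return value used
by the example in the rpart sources; the name split_sketch and the toy data
are made up for illustration, and the categorical case is left out.

```r
# Hypothetical split function, modelled on the anova example shipped with
# the rpart sources (continuous predictors only).
#   y, wt  response and case weights for the observations in the node
#   x      ONE predictor; the data are assumed to arrive sorted by x
#   parms  user-supplied parameters (unused here)
# No other predictor column is visible inside this function.
split_sketch <- function(y, wt, x, parms, continuous) {
  stopifnot(continuous)             # categorical case omitted in this sketch
  n <- length(y)
  y <- y - sum(y * wt) / sum(wt)    # center y at the weighted node mean
  left_sum <- cumsum(y * wt)[-n]    # weighted sum of y left of each cut
  left_wt  <- cumsum(wt)[-n]
  right_wt <- sum(wt) - left_wt
  lmean <- left_sum / left_wt
  rmean <- -left_sum / right_wt
  # Reduction in sum of squares for each of the n - 1 candidate cuts
  goodness <- left_wt * lmean^2 + right_wt * rmean^2
  list(goodness = goodness, direction = sign(lmean))
}

# Toy call outside rpart, with a single sorted predictor
x <- c(1, 2, 3, 4)
y <- c(1, 1, 5, 5)
res <- split_sketch(y, rep(1, 4), x, parms = NULL, continuous = TRUE)
best_cut <- which.max(res$goodness)  # index of the best cut between sorted x values
```

Within rpart, a function of this shape would be supplied together with
matching init and eval functions as method = list(init = , eval = , split = ).
Whatever criterion is computed inside it can only depend on y, wt and the
single x it was handed.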
Terry Therneau
