[R] questions on rpart (tree changes when rearrange the order of covariates)

Terry Therneau therneau at mayo.edu
Wed May 13 14:26:33 CEST 2009


 If two variables give splits with exactly the same improvement, then rpart will
use the one that was first in the model statement.  So if the call is
 	rpart(group ~ age + height + weight + sex)
and at some split point both age and weight gave a split with 20 correct and 9 
incorrect, then age would be used to split at that node.

  Even though the errors of the age and weight splits are the same, the two sets 
of 9 subjects that were incorrect may be different, i.e., the two splits don't 
send exactly the same observations to the left and the right.  Thus, the rest of 
the tree from that point on may be different, giving a different fit.
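
  A small sketch of the situation described above, using made-up data (the 
variables y, a and b and the 40 subjects are invented for illustration, not 
taken from the original question).  Both covariates split the subjects into 
groups with identical class counts, but they disagree on which subjects go 
left and right, so the order in the formula should decide which one is used 
at the root:

```r
library(rpart)

# Hypothetical toy data: 40 subjects, two classes, two binary covariates.
y <- factor(c(rep("A", 16), rep("B", 4), rep("A", 4), rep("B", 16)))
a <- c(rep(0, 20), rep(1, 20))
b <- rep(1, 40)
b[c(5:16, 21:24, 25:28)] <- 0
d <- data.frame(y, a, b)

# Both covariates give the same class counts on each side of the split ...
print(table(d$a, d$y))
print(table(d$b, d$y))

# ... but they do not send the same subjects to the same side.
n_disagree <- sum(d$a != d$b)
print(n_disagree)

# Same data, same covariates, only the order in the formula differs.
fit_ab <- rpart(y ~ a + b, data = d, method = "class")
fit_ba <- rpart(y ~ b + a, data = d, method = "class")

# Variable used at the root node of each tree (first row of the frame).
root_ab <- as.character(fit_ab$frame$var[1])
root_ba <- as.character(fit_ba$frame$var[1])
print(c(root_ab, root_ba))
```

  Printing fit_ab and fit_ba in full shows how the lower parts of the two 
trees compare once the root splits differ.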
  
  For continuous y this rarely happens -- that two splits have exactly the same 
R^2 -- but it is not uncommon in classification problems.  
  
  	Terry Therneau



