[R] help with RPART

Terry Therneau therneau at mayo.edu
Mon Jun 2 17:30:59 CEST 2008


  When using the anova method, all of the printed results are scaled by the RSS
for the top node.  The relative error measures for the trees are therefore
already 1-R^2.
  
    library(rpart)
    library(survival)   # provides the lung data set
    tfit <- rpart(time ~ ., lung)
    summary(tfit)

          CP nsplit rel error   xerror      xstd 
1 0.03665178      0 1.0000000 1.010097 0.1136942
2 0.03310179      1 0.9633482 1.079216 0.1172675
3 0.03029365      2 0.9302464 1.109587 0.1173583
4 0.01963453      3 0.8999528 1.249586 0.1327888
5 0.01627146     11 0.7396726 1.238411 0.1310952
6 0.01507635     12 0.7234012 1.260919 0.1337384
7 0.01031566     13 0.7083248 1.282740 0.1399397
8 0.01000000     14 0.6980091 1.296213 0.1396711

Node number 1: 228 observations,    complexity param=0.03665178
  mean=305.2325, MSE=44176.93 
  left son=2 (81 obs) right son=3 (147 obs)
  Primary splits:
      pat.karno < 75    to the left,  improve=0.03661157, (3 missing)
      ph.ecog   < 1.5   to the right, improve=0.03620793, (1 missing)
      status    < 1.5   to the right, improve=0.02930372, (0 missing)
      ph.karno  < 85    to the left,  improve=0.02058114, (1 missing)
      sex       < 1.5   to the left,  improve=0.01679999, (0 missing)
  Surrogate splits:
      ph.ecog  < 1.5   to the right, agree=0.787, adj=0.392, (3 split)
      ph.karno < 75    to the left,  agree=0.751, adj=0.291, (0 split)
      age      < 72.5  to the right, agree=0.680, adj=0.089, (0 split)

Node number 2: 81 observations,    complexity param=0.03310179
  mean=251.0247, MSE=34100.99 
  left son=4 (59 obs) right son=5 (22 obs)
  Primary splits:
      wt.loss < 21    to the left,  improve=0.12735970, (7 missing)
      status  < 1.5   to the right, improve=0.08060663, (0 missing)
      age     < 68.5  to the right, improve=0.04906869, (0 missing)
      inst    < 2.5   to the left,  improve=0.04148716, (0 missing)
      sex     < 1.5   to the left,  improve=0.02401074, (0 missing)
  Surrogate splits:
      ph.karno < 55    to the right, agree=0.743, adj=0.095, (6 split)

etc.
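
  The scaling can be checked directly on the fitted object.  What follows is
only a sketch: it assumes lung is the data set shipped with the survival
package, and it relies on the frame and cptable components of the rpart object
(frame$dev holds the RSS of each node for the anova method).  The variable
names root_rss, tree_rss, rel_err and r2 are mine, not part of rpart.

```r
library(rpart)
library(survival)   # assumption: supplies the lung data set used in the post

tfit <- rpart(time ~ ., data = lung)

# RSS of the top node: the quantity the printed errors are divided by
root_rss <- tfit$frame$dev[1]

# RSS of the final tree = sum of the deviances over its terminal nodes
leaves   <- tfit$frame$var == "<leaf>"
tree_rss <- sum(tfit$frame$dev[leaves])

# Relative error of the full tree, and the corresponding R^2
rel_err <- tree_rss / root_rss
r2      <- 1 - rel_err

# The relative error should agree with the last row of the CP table
last <- nrow(tfit$cptable)
all.equal(rel_err, unname(tfit$cptable[last, "rel error"]))
```

  The xerror column comes from cross-validation and will change from run to
run, so only the rel error column is compared here.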

  The first split has R^2 = .0367 = 1 - rel error after one split (.9633, in
the table at the top) = the improvement measure for the node.
   
   The second split has R^2 = .127 for the obs within that node; it improves the
R^2 for the model as a whole by .033.
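
   The arithmetic behind those two figures can be reproduced from the printed
table alone.  This is a small sketch that uses only numbers copied from the
rel error and CP columns above; the names rel_error, cp, r2 and gain are just
illustrative.

```r
# Values copied from the first three rows of the table printed by summary(tfit)
rel_error <- c(1.0000000, 0.9633482, 0.9302464)
cp        <- c(0.03665178, 0.03310179)

r2   <- 1 - rel_error       # cumulative R^2 after 0, 1 and 2 splits
gain <- -diff(rel_error)    # R^2 added by the first and by the second split

round(gain, 4)              # 0.0367 0.0331

# Each gain matches the CP value of its row, up to the rounding in the printout
all.equal(gain, cp, tolerance = 1e-5)
```

   This works row by row only where nsplit goes up by one; between rows 4 and
5 the table jumps from 3 to 11 splits, so that difference in rel error covers
eight splits at once.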
   
   	Terry T.


