weidong zhang mzhang1208 at hotmail.com
Tue Sep 30 00:00:03 CEST 2003

Hi All,

I have some questions about using the rpart library. Given my data below, 
plotcp shows 'xerror' increasing across the different cp values, with a huge 
xstd (plot attached). What causes this, or is it not a problem at all? I 
thought 'xerror' should decrease as 'cp' gets smaller. Also, what does 
'xstd' really tell us? If the error bars for the xerror values overlap across 
different cp values, does that mean splitting the tree gives no significant 
improvement in the misclassification rate?

My data have two classes, with 138 observations and 129 attributes. Here 
is what I did:
[1] 138 130
> man.dt1 <- rpart(Target ~ ., data = man.dat[, c(1, 8:136)],
+     method = 'class', cp = 1e-5, parms = list(split = 'information'))

> printcp(man.dt1)

Classification tree:
rpart(formula = Target ~ ., data = man.dat[, c(1, 8:136)], method = "class",
    parms = list(split = "information"), cp = 1e-05)

Variables actually used in tree construction:
[1] CHX.V  CYN.Cu SPF.Bi

Root node error: 25/138 = 0.18116

n= 138

       CP nsplit rel error xerror    xstd
1 0.18667      0      1.00   1.00 0.18098
2 0.00001      3      0.44   1.12 0.18897

I would appreciate your help on this,


