[R] bug in rpart?
Uwe Ligges
ligges at statistik.tu-dortmund.de
Fri May 22 19:43:57 CEST 2009
Yuanyuan wrote:
> Greetings,
>
> I checked the Pima Indians diabetes data again and got one tree for the data with
> reordered columns and another tree for the original data. I compared these
> two trees: the split points are exactly the same, but the
> fitted classes differ for some cases, and the misclassification
> errors are different too. I know how CART deals with ties: even if we are
> using the same data, the subjects sent to the left and right would not be the
> same if we just rearrange the order of the covariates.
>
> But the problem is that the fitted trees have exactly the same split
> points. Shouldn't we get the same fitted values if the decisions are the
> same at each step? Why do trees with the same structure have different
> observations in their nodes?
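Because they may use different surrogate variables. Note that your data
contain missing values that are handled by surrogates: an observation
missing the variable used in a primary split is sent left or right by a
surrogate split, so two trees with identical primary splits can still
place such observations in different nodes.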
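A minimal sketch of how to see this effect, assuming only the rpart package (shipped with R) and hypothetical synthetic data rather than the Pima data. It fits the same formula twice: once with the default surrogate handling, and once with `rpart.control(usesurrogate = 0)`, under which a row missing the primary split variable is not sent further down the tree. It then counts rows whose fitted class changes. The variable names and the data-generating process are made up for illustration; this does not reproduce the column-order effect itself, only the fact that surrogate routing of incomplete rows changes fitted classes.

```r
library(rpart)

## Hypothetical synthetic data (NOT the Pima data): x1 determines the class,
## x2 is a noisy copy of x1 that can serve as a surrogate, x3 is pure noise.
set.seed(1)
n  <- 400
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)
x3 <- rnorm(n)
y  <- factor(ifelse(x1 > 0, "pos", "neg"))
x1[sample(n, 40)] <- NA          # 40 rows lose the primary split variable
d  <- data.frame(y, x1, x2, x3)

## Default control: surrogates route rows that are missing x1
fit_sur <- rpart(y ~ ., data = d, method = "class")

## Surrogates are still computed and displayed, but rows missing x1
## stay in the node where the primary variable is missing
fit_nosur <- rpart(y ~ ., data = d, method = "class",
                   control = rpart.control(usesurrogate = 0))

summary(fit_sur)                             # shows "Surrogate splits:" per node
fit_sur$frame[, c("var", "n", "nsurrogate")] # surrogates kept at each node

## Rows whose fitted class depends on how missing values were routed
cls_sur   <- predict(fit_sur,   type = "class")
cls_nosur <- predict(fit_nosur, type = "class")
differ    <- cls_sur != cls_nosur
table(differ, incomplete = !complete.cases(d))
```

For the two fits in the original post, something like `table(fit2$where != fit3$where, complete.cases(mydata))` is one way to check whether the rows that land in different nodes are the incomplete ones. This assumes, as reported, that both trees have identical primary splits, so the rows of their `frame` components line up.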
Best,
Uwe Ligges
> The source code for running the diabetes data example and the output of
> trees are attached. Your professional opinion is very much appreciated.
>
> library(mlbench)
> data(PimaIndiansDiabetes2)
> mydata<-PimaIndiansDiabetes2
> library(rpart)
> fit2<-rpart(diabetes~., data=mydata,method="class")
> plot(fit2,uniform=TRUE,main="CART for original data")
> text(fit2,use.n=TRUE,cex=0.6)
> printcp(fit2)
> table(predict(fit2,type="class"),mydata$diabetes)
> ## misclassification table: rows are fitted class, columns are observed class
>       neg pos
>   neg 437  68
>   pos  63 200
>
>
> pmydata<-data.frame(mydata[,c(1,6,3,4,5,2,7,8,9)])
> fit3<-rpart(diabetes~., data=pmydata,method="class")
> plot(fit3,uniform=TRUE,main="CART after exchanging mass & glucose")
> text(fit3,use.n=TRUE,cex=0.6)
> printcp(fit3)
> table(predict(fit3,type="class"),pmydata$diabetes)
> ## after exchanging the order of body mass and plasma glucose
>       neg pos
>   neg 436  64
>   pos  64 204
>
>
> Best,
>
>
>
> ------------------------------------------------------------------------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.