[R] randomForest out of bag prediction
Michael Mayer
m@yermich@el79 @ending from gm@il@com
Sat Jan 12 19:16:18 CET 2019
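predict(diachp.rf, dataX) returns the in-sample predictions, not the OOB predictions: because newdata is supplied, every row is passed down all 50 trees, including the trees that were grown on that row. Only predict(diachp.rf), called without newdata, returns the out-of-bag (OOB) predictions. The response variable "quality" is only used during model fit, not during prediction, so dropping it from dataX does not change this.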
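A minimal sketch of the two call forms, in case it helps. The iris data and the names rf, pred_oob and pred_in are stand-ins I chose for illustration, since the original data is not available; the only package assumed is randomForest.

```r
library(randomForest)

set.seed(1)  # make the bootstrap samples reproducible

# iris stands in for the poster's data; Species plays the role of quality
rf <- randomForest(Species ~ ., data = iris, ntree = 50)

# Without newdata, predict() returns the OOB predictions stored in the fit
pred_oob <- predict(rf)
oob_matches_stored <- all(pred_oob == rf$predicted)

# With newdata, every row is passed down all 50 trees,
# including the trees that saw that row during training
pred_in <- predict(rf, iris[, names(iris) != "Species"])

# Accuracy on the training rows under each kind of prediction
acc_oob <- mean(pred_oob == iris$Species)
acc_in  <- mean(pred_in == iris$Species)
```

Comparing acc_oob with acc_in should show the same pattern as in the question, only less pronounced on such a small, easy data set.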
Since in-sample predictions of random forests are typically grossly overfitted by construction, extremely high accuracies are not unexpected.
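To put numbers on this, the accuracies can be recomputed from the two confusion tables in the quoted message below. This is only a sketch of the arithmetic; the object names and the helper accuracy() are mine.

```r
# Confusion tables copied from the quoted message
# (rows: predicted class, columns: observed quality)
tab_oob <- matrix(c(1324, 493, 346, 2837), nrow = 2,
                  dimnames = list(pred = c("0", "1"), quality = c("0", "1")))
tab_in  <- matrix(c(1817, 0, 0, 3183), nrow = 2,
                  dimnames = list(pred = c("0", "1"), quality = c("0", "1")))

# Accuracy = correctly classified (diagonal) / all observations
accuracy <- function(tab) sum(diag(tab)) / sum(tab)

acc_oob <- accuracy(tab_oob)  # (1324 + 2837) / 5000 = 0.8322
acc_in  <- accuracy(tab_in)   # (1817 + 3183) / 5000 = 1
```

The OOB figure of about 83% is the one to treat as an estimate of performance on new data.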
From: Witold E Wolski
Sent: Saturday, 12 January 2019 18:56
To: r-help using r-project.org
Subject: [R] randomForest out of bag prediction
Hello,
I am just not sure what the predict.randomForest function is doing...
I am confused.
I would expect these 2 function calls to return the same predictions:
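```{r}
library(randomForest)
library(dplyr)  # provides %>% and select()
diachp.rf <- randomForest(quality ~ ., data = data, ntree = 50, importance = TRUE)
# predictions without newdata
ypred_oob <- predict(diachp.rf)
# predictions with the training predictors passed as newdata
dataX <- data %>% select(-quality)  # remove response
ypred <- predict(diachp.rf, dataX)
ypred_oob == ypred
```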
As far as I understand, both are out-of-bag predictions, but ypred and ypred_oob are
actually very different.
> table(ypred_oob, data$quality)

ypred_oob    0    1
        0 1324  346
        1  493 2837

> table(ypred, data$quality)

ypred    0    1
    0 1817    0
    1    0 3183
What I find even more disturbing is the 100% accuracy for ypred.
Would you agree that this is rather unexpected?
regards
Witek
--
Witold Eryk Wolski