[R] NA and NaN randomForest

Liaw, Andy andy_liaw at merck.com
Wed Apr 25 15:58:17 CEST 2007

Hi Clayton,

If you use the formula interface, then it should do what you want:

R> library(randomForest)
randomForest 4.5-18 
Type rfNews() to see new features/changes/bug fixes.
R> iris1 <- iris[-(1:5),]
R> iris2 <- iris[1:5,]
R> iris2[1, 3] <- NA
R> iris2[3, 1] <- NA
R> iris.rf <- randomForest(Species ~ ., iris1)
R> predict(iris.rf, iris2[-5])
[1] <NA>   setosa <NA>   setosa setosa
Levels: setosa versicolor virginica

The problem, of course, is that the formula interface is not suitable
for data with a large number of variables.  I'll look into doing the
same thing in the default (x/y) method.
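
In the meantime, a wrapper along the lines Clayton asks about could be
sketched as below.  This is only a rough sketch, not part of the
randomForest package: the name predict_with_na is made up for
illustration, and it assumes the default type = "response" and at least
one complete row in the new data.  It predicts on the complete rows only
and then pads the result with NA so that it lines up with the input rows.

```r
library(randomForest)

# Hypothetical helper (NOT part of randomForest): predict on the complete
# rows only, then put NA back in the positions of the incomplete rows.
# Assumes type = "response" and at least one complete row in newdata.
predict_with_na <- function(object, newdata, ...) {
  ok <- complete.cases(newdata)   # FALSE for any row holding an NA or NaN
  pred <- predict(object, newdata[ok, , drop = FALSE], ...)
  if (is.factor(pred)) {
    # Classification: all-NA factor carrying the same levels as pred
    out <- factor(rep(NA_character_, nrow(newdata)), levels = levels(pred))
  } else {
    # Regression: all-NA numeric vector
    out <- rep(NA_real_, nrow(newdata))
  }
  out[ok] <- pred                 # fill in the rows that could be predicted
  names(out) <- rownames(newdata)
  out
}

# Same data as in the session above, but fitted through the default
# (x/y) interface rather than the formula interface.
iris1 <- iris[-(1:5), ]
iris2 <- iris[1:5, ]
iris2[1, 3] <- NA
iris2[3, 1] <- NA
set.seed(1)
iris.rf <- randomForest(x = iris1[, -5], y = iris1$Species)
p <- predict_with_na(iris.rf, iris2[-5])
print(p)
```

Since the forest never sees the incomplete rows, this should work the
same way whether the underlying predict method skips such rows or
refuses them outright.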


From: clayton.springer at novartis.com
> Dear R-help,
> This is about randomForest's handling of NAs and NaNs in test set data.
> Currently, if the test set data contains an NA or NaN, then 
> predict.randomForest will skip that row in the output.
> I would like to change that behavior so that it outputs an NA instead.
> Can this be done with flags to randomForest?
> If not, can some sort of wrapper be built to put the NAs back in?
> Thanks,
> Clayton
