[R] randomForest: proximity for new objects using an existing rf
Liaw, Andy
andy_liaw at merck.com
Wed Feb 1 16:39:38 CET 2012
There's an alternative, but it may not be any more efficient in time or memory...
You can run predict() on the training set once, setting nodes=TRUE. That will give you an n by ntree matrix (stored in the "nodes" attribute of the returned object) recording which terminal node of each tree every data point falls in. For any new data, you would run predict() with nodes=TRUE again, then compute the proximity "by hand": for each pair of one new and one training point, count the trees in which both land in the same terminal node and divide by ntree.
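A minimal sketch of that by-hand computation follows. It assumes the proximity is taken as the fraction of trees in which two points share a terminal node. `prox_from_nodes` is a hypothetical helper name, not a function in randomForest, and the iris split is only there for illustration. It is worth checking the result against predict(..., proximity=TRUE) on a small example before relying on it.

```r
# Hypothetical helper (not part of randomForest): proximity between new and
# training cases, given their terminal-node matrices (rows = cases,
# columns = trees) as returned in the "nodes" attribute of predict().
prox_from_nodes <- function(nodes_new, nodes_train) {
  stopifnot(ncol(nodes_new) == ncol(nodes_train))
  prox <- matrix(0, nrow(nodes_new), nrow(nodes_train))
  for (i in seq_len(nrow(nodes_new))) {
    # TRUE where a training case sits in the same terminal node as new case i
    same <- sweep(nodes_train, 2, nodes_new[i, ], FUN = "==")
    # Fraction of trees in which the pair shares a terminal node
    prox[i, ] <- rowMeans(same)
  }
  prox
}

# Illustration only: an arbitrary iris split stands in for Xtrain / Xnew.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  train_idx <- 1:140
  Xtrain <- iris[train_idx, 1:4]
  Xnew   <- iris[-train_idx, 1:4]
  model  <- randomForest::randomForest(Xtrain, iris$Species[train_idx], ntree = 50)

  # Compute once and keep: terminal nodes of the training cases
  nodes_train <- attr(predict(model, Xtrain, nodes = TRUE), "nodes")
  # Compute for each batch of new cases
  nodes_new   <- attr(predict(model, Xnew, nodes = TRUE), "nodes")

  proxnew <- prox_from_nodes(nodes_new, nodes_train)  # nrow(Xnew) by nrow(Xtrain)
}
```

The training node matrix only has to be computed once and can be stored with the model. Each batch of new cases then costs one predict() call plus nnew * ntrain * ntree comparisons, and the ntrain by ntrain block is never formed.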
Andy
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Kilian
> Sent: Wednesday, February 01, 2012 5:39 AM
> To: r-help at r-project.org
> Subject: [R] randomForest: proximity for new objects using an
> existing rf
>
> Dear all,
>
> using an existing random forest, I would like to calculate the proximity
> for a new test object, i.e. the similarity between the new object and the
> old training objects which were used for building the random forest. I do
> not want to build a new random forest based on both old and new objects.
>
> Currently, my workaround is to calculate the proximities of a combined
> data set consisting of training and new objects like this:
>
> model <- randomForest(Xtrain, Ytrain)  # build random forest
> nnew <- nrow(Xnew)  # number of new objects
> Xcombi <- rbind(Xnew, Xtrain)  # combine new objects and training objects
> predcombi <- predict(model, Xcombi, proximity=TRUE)  # calculate proximities
> proxcombi <- predcombi$proximity  # get proximities of combined dataset
> proxnew <- proxcombi[(1:nnew),-(1:nnew)]  # get proximities of new objects only
>
> But this approach causes a lot of wasted computation time as I am not
> interested in the proximities among the training objects themselves but
> only among the training objects and the new objects. With 1000 training
> objects and 5 new objects, I have to calculate a 1005x1005 proximity matrix
> to get the essential 5x1000 matrix of the new objects only.
>
> Am I doing something wrong? I read through the documentation but could not
> find another solution. Any advice would be highly appreciated.
>
> Thanks in advance!
> Kilian