[R] randomForest: proximity for new objects using an existing rf

Liaw, Andy andy_liaw at merck.com
Wed Feb 1 16:39:38 CET 2012


There's an alternative, but it may not be any more efficient in time or memory...

You can run predict() on the training set once, setting nodes=TRUE.  That will give you an n by ntree matrix (the "nodes" attribute of the result) recording which terminal node of each tree every data point falls in.  For any new data, you would run predict() with nodes=TRUE in the same way, then compute the proximity "by hand" by counting, for each pair of a new and a training point, the fraction of trees in which both landed in the same terminal node.
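
A minimal sketch of that counting step, in base R only. The helper name proximity_from_nodes and the toy node matrices are made up for illustration and are not part of randomForest; with a real forest the two matrices would come from the "nodes" attribute of predict(..., nodes=TRUE), as noted in the comments.

```r
# Hypothetical helper (not part of randomForest): proximity between new and
# training cases, given their terminal-node matrices (one column per tree).
proximity_from_nodes <- function(nodes_new, nodes_train) {
  stopifnot(ncol(nodes_new) == ncol(nodes_train))
  ntree <- ncol(nodes_train)
  prox <- matrix(0, nrow = nrow(nodes_new), ncol = nrow(nodes_train))
  for (k in seq_len(ntree)) {
    # For tree k, compare every new case with every training case.
    prox <- prox + outer(nodes_new[, k], nodes_train[, k], "==")
  }
  prox / ntree  # fraction of trees in which a pair shares a terminal node
}

# Toy data: 3 training cases and 1 new case, 2 trees (node ids are invented).
nodes_train <- matrix(c(3, 3, 5,    # tree 1
                        2, 4, 4),   # tree 2
                      nrow = 3)
nodes_new <- matrix(c(3, 4), nrow = 1)
proxnew <- proximity_from_nodes(nodes_new, nodes_train)

# With a fitted forest (assumed objects model, Xtrain, Xnew):
# nodes_train <- attr(predict(model, Xtrain, nodes = TRUE), "nodes")
# nodes_new   <- attr(predict(model, Xnew,   nodes = TRUE), "nodes")
```

This only ever builds an nnew by ntrain matrix, so the 1005x1005 computation is avoided, and the training-set node matrix can be stored and reused for later batches of new data. Whether it is actually faster than predict(..., proximity=TRUE) on the combined data would need to be timed.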

Andy 

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Kilian
> Sent: Wednesday, February 01, 2012 5:39 AM
> To: r-help at r-project.org
> Subject: [R] randomForest: proximity for new objects using an existing rf
> 
> Dear all,
> 
> using an existing random forest, I would like to calculate the proximity
> for a new test object, i.e. the similarity between the new object and the
> old training objects which were used for building the random forest. I do
> not want to build a new random forest based on both old and new objects.
> 
> Currently, my workaround is to calculate the proximities of a combined data
> set consisting of training and new objects like this:
> 
> model <- randomForest(Xtrain, Ytrain) # build random forest
> nnew <- nrow(Xnew) # number of new objects
> Xcombi <- rbind(Xnew, Xtrain) # combine new objects and training objects
> predcombi <- predict(model, Xcombi, proximity=TRUE) # calculate proximities
> proxcombi <- predcombi$proximity # get proximities of combined dataset
> proxnew <- proxcombi[(1:nnew),-(1:nnew)] # get proximities of new objects only
> 
> But this approach causes a lot of wasted computation time as I am not
> interested in the proximities among the training objects themselves but
> only among the training objects and the new objects. With 1000 training
> objects and 5 new objects, I have to calculate a 1005x1005 proximity matrix
> to get the essential 5x1000 matrix of the new objects only.
> 
> Am I doing something wrong? I read through the documentation but could not
> find another solution. Any advice would be highly appreciated.
> 
> Thanks in advance!
> Kilian