[R] mboost vs gbm

thebennjammin benn.ackley at gmail.com
Mon Jul 23 23:55:25 CEST 2012


I'm attempting to fit boosted regression trees to a censored response using
IPCW (inverse probability of censoring) weighting.  I've implemented this
with two packages, mboost and gbm, which I believe should yield comparably
performing models.  That is not the case - mboost performs much better,
which seems odd.  The difference matters because the output of this
regression needs to go into a production system, and mboost doesn't even
expose the individual trees in the ensemble.
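
For context, the weights come from the standard IPCW construction: an
uncensored observation gets weight 1/G(T_i), where G is the Kaplan-Meier
estimate of the censoring survival function, and censored observations get
weight 0.  A minimal sketch with the survival package (the toy data,
variable names, and the small floor on G are illustrative, not the code I
actually ran):

```r
library(survival)

# Toy censored sample: observed time and event indicator (1 = event, 0 = censored)
set.seed(1)
n <- 200
event.time <- rexp(n, rate = 0.5)
cens.time  <- rexp(n, rate = 0.25)
time   <- pmin(event.time, cens.time)
status <- as.numeric(event.time <= cens.time)

# Kaplan-Meier estimate of the censoring distribution G(t):
# treat censoring (1 - status) as the "event"
km.cens <- survfit(Surv(time, 1 - status) ~ 1)

# Evaluate G at each subject's observed time via a right-continuous step function
G.hat <- stepfun(km.cens$time, c(1, km.cens$surv))(time)

# IPCW weights: status_i / G(T_i), with a floor to avoid division by ~0
t.ipcw <- status / pmax(G.hat, 1e-8)
```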

# blackboost defaults: Gaussian loss, maxdepth = 2
m.mboost <- blackboost(Y ~ X1 + X2, data = tdata, weights = t.ipcw,
                       control = boost_control(mstop = 100))
m.gbm <- gbm(Y ~ X1 + X2, data = tdata, weights = t.ipcw,
             distribution = "gaussian", interaction.depth = 2,
             bag.fraction = 1, n.trees = 2500)

# compare IPCW-weighted squared loss
sum((predict(m.mboost, newdata = tdata) - tdata$Y)^2 * t.ipcw) <
  sum((predict(m.gbm, newdata = tdata, n.trees = 2500) - tdata$Y)^2 * t.ipcw)
# TRUE: with 100 trees, mboost's loss is about 20% below gbm with 100 trees,
# and about 5% below gbm with 2500 trees

The documentation says blackboost essentially does the same thing as gbm,
so any ideas on what could be driving this large difference in performance?
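
For reference, here is the comparison with the step length and iteration
count pinned to the same values in both packages, in case the gap is a
tuning-parameter mismatch rather than an algorithmic difference.  This is a
sketch on stand-in data (the real tdata and t.ipcw come from my censored
sample); nu and shrinkage names are from the boost_control() and gbm() help
pages, and note blackboost's default tree_controls already uses maxdepth = 2:

```r
library(mboost)
library(gbm)

# Stand-in data so the snippet runs on its own
set.seed(42)
tdata <- data.frame(X1 = rnorm(300), X2 = rnorm(300))
tdata$Y <- tdata$X1 - 0.5 * tdata$X2 + rnorm(300, sd = 0.3)
t.ipcw <- runif(300, 0.5, 1.5)

# Pin the step length and number of trees explicitly: boost_control()
# defaults to nu = 0.1, while gbm's shrinkage default has changed across
# versions, so relying on defaults can compare very different models.
m.mboost <- blackboost(Y ~ X1 + X2, data = tdata, weights = t.ipcw,
                       control = boost_control(mstop = 100, nu = 0.1))
m.gbm <- gbm(Y ~ X1 + X2, data = tdata, weights = t.ipcw,
             distribution = "gaussian", shrinkage = 0.1,
             interaction.depth = 2, bag.fraction = 1, n.trees = 100)
```

Even with matched settings the base learners differ (blackboost grows
conditional inference trees, gbm grows CART-style trees, and ctree's
maxdepth is not on the same scale as gbm's interaction.depth), so some gap
may remain.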






