[R] Bootstrap tree selection in rpart
Fiona Callaghan
fmc2+ at pitt.edu
Thu Sep 13 16:30:37 CEST 2007
Thanks very much for replying -- just one final question: does this still
hold when the outcome is continuous (and not discrete), e.g., when instead
of a multinomial outcome we have a continuous outcome such as residuals?
Thanks again
Fiona
> Fiona Callaghan asked about using the bootstrap instead of
> cross-validation in the tree pruning step.
> It turns out that cross-validation works better than the bootstrap for
> trees.
> The issue is a subtle one. The bootstrap can be thought of as 2 steps.
>
> 1. Deduction: Evaluate the behavior of some statistic "zed" under
> repeated sampling from the discrete distribution F-hat, i.e., the
> original data. This gives a direct evaluation of how zed behaves under
> F-hat.
>
> 2. Induction: Assume that (behavior of zed under sampling from F) =
> (behavior under sampling from F-hat).
>
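The two steps above can be sketched in a few lines. This is only a minimal illustration in Python (standard library only), not anything taken from rpart: the function name bootstrap_distribution, the toy data, and the choice of the median as "zed" are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_distribution(data, statistic, n_boot=1000, seed=0):
    """Step 1 (deduction): resample from F-hat and record the statistic."""
    rng = random.Random(seed)
    n = len(data)
    results = []
    for _ in range(n_boot):
        # Drawing n values with replacement from the observed data
        # is exactly sampling from the discrete distribution F-hat.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        results.append(statistic(resample))
    return results

# Toy data, made up for illustration only.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
boot = bootstrap_distribution(data, statistics.median)

# Step 2 (induction) is not computed anywhere: it is the assumption that
# the spread seen in `boot` also describes how the statistic would vary
# under repeated sampling from the unknown F.
print(statistics.mean(boot), statistics.stdev(boot))
```

Note that step 2 never appears in the code. It is purely an assumption about how to read the output of step 1.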
> It turns out that trees behave differently under discrete distributions
> than they do under continuous ones, so step 2 fails. Essentially, there
> are fewer places to split in the discrete case, tree creation is less
> noisy, and the bootstrap gives an overoptimistic view. I remember Brad
> Efron giving a talk on this long ago (I was still a student!), so the
> details are fuzzy; I think that he solved it by sampling from a smoothed
> version of the empirical CDF.
>
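To make the "fewer places to split" point concrete, here is a rough sketch of one common form of smoothed bootstrap: resample as usual, then add a small amount of Gaussian noise to each draw. This is not necessarily the method from the talk mentioned above. The function names, the simulated data, and the bandwidth of 0.2 are all assumptions chosen for illustration.

```python
import random

def plain_bootstrap(data, rng):
    """Draw len(data) values with replacement from the observed data."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def smoothed_bootstrap(data, rng, bandwidth):
    """Resample, then jitter each draw with Gaussian kernel noise.

    This amounts to sampling from a kernel-smoothed version of the
    empirical distribution rather than from F-hat itself.
    """
    return [x + rng.gauss(0.0, bandwidth) for x in plain_bootstrap(data, rng)]

rng = random.Random(1)
# Simulated continuous predictor, for illustration only.
data = [rng.gauss(0.0, 1.0) for _ in range(200)]

plain = plain_bootstrap(data, rng)
smooth = smoothed_bootstrap(data, rng, bandwidth=0.2)  # bandwidth is arbitrary

# Distinct values bound the number of candidate split points on a variable.
n_plain = len(set(plain))
n_smooth = len(set(smooth))
print(n_plain, n_smooth)
```

A plain resample repeats observations, so it contains noticeably fewer distinct values than the original sample (about 63% of them on average for large samples). The jittered resample has essentially no ties. How much smoothing is appropriate is a separate question that this sketch does not address.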
> Terry Therneau
>
--
Fiona Callaghan, MA MS
A432 Crabtree Hall
Department of Biostatistics
Graduate School of Public Health
University of Pittsburgh
Phone 412 624 3063