[R] bootstrap
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Nov 12 16:33:32 CET 2007
On Mon, 12 Nov 2007, Stefano Ghirlanda wrote:
> i am using the boot package for some bootstrap calculations in place
> of anovas. one reason is that my dependent variable is distributed
> bimodally, but i would also like to learn about bootstrapping in
> general (i have ordered books but they have not yet arrived).
>
> i get the general idea of bootstrapping but sometimes i do not know
> how to define suitable statistics to test specific hypotheses.
That's a basic issue in statistics. Bootstrapping is only another way to
assess the variability of a pre-determined statistic (and incidentally it
is not much used for testing; it is more often used for confidence
intervals).
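For example, a basic bootstrap confidence interval with the 'boot' package
might look like this (a minimal sketch: the data frame 'dat' and its
variable 'y' are made-up names, not from your analysis):

  library(boot)
  ## the statistic must accept the data and a vector of resampled indices
  mean_stat <- function(d, i) mean(d$y[i])
  b <- boot(dat, mean_stat, R = 999)
  boot.ci(b, type = c("perc", "bca"))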
> two examples follow.
>
> 1) comparing the means of more than two groups. a suitable statistic
> could be the sum of squared deviations of group means from the
> grand mean. does this sound reasonable?
No. That means nothing by itself; it needs to be compared to the
residual variation (e.g. by an F statistic).
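As a minimal illustration (again with made-up names: response 'y',
grouping factor 'g', data frame 'dat'), the classical F statistic is
exactly that comparison:

  fit <- aov(y ~ g, data = dat)
  summary(fit)  ## F = between-group mean square / residual mean square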
> 2) testing for interactions. e.g., i want to see whether an
> independent variable has the same effect in two different
> samples. in an anova this would be expressed as the significance,
> or lack thereof, of the interaction between a "sample" factor and
> another factor for the independent variable. how would i do this
> with a bootstrap calculation?
>
> my problem with 2) is that when one fits a linear model to the data,
> from which sums of squares for the anova are calculated, the
> interaction between the two factors corresponds to many regression
> coefficients in the linear model (e.g., i actually have three samples
> and an independent variable with four levels). i do not know how to
> summarize these in a single statistic.
Any good book on statistics with R (e.g. MASS) would point you at the
anova() function to compare models.
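For example (illustrative names: 'grp' for your sample factor, 'trt' for
the four-level factor), anova() collapses all the interaction
coefficients into a single F test:

  fit0 <- lm(y ~ grp + trt, data = dat)  ## no interaction
  fit1 <- lm(y ~ grp * trt, data = dat)  ## with interaction
  anova(fit0, fit1)                      ## one F test for the whole interaction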
> i have seen somewhere that some people calculate F ratios
> nevertheless, but then test them against a bootstrapped distribution
> rather than against the F distribution. is this a sensible approach?
> could one also use sums of squares directly as the bootstrapped
> statistics?
It is not totally off the wall. The problems are:
- How you bootstrap, given that you don't have a single homogeneous group.
You seem to want to test, so you need to emulate the null-hypothesis
distribution. The most common way to do that is to fit a model, find
some sort of residuals, bootstrap those and use them as errors to
simulate from the null hypothesis (a sketch is given below, after these
points). At that point you will have to work
hard to convince many statisticians that you have improved over the
standard theory or a simulation from a long-tailed error distribution.
- That if you don't believe in a normal distribution for your errors
(the errors, that is, not the response), you probably should not be using
least-squares-based statistical methodology. And remember that the classical ANOVA
tests are supported by permutation arguments which are very similar to
the bootstrap (just sampling without replacement instead of with).
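A sketch of that null-hypothesis residual bootstrap, continuing the
illustrative names above (this is only one of several reasonable
resampling schemes):

  fit0 <- lm(y ~ grp + trt, data = dat)   ## null model: no interaction
  fit1 <- lm(y ~ grp * trt, data = dat)
  Fobs <- anova(fit0, fit1)$F[2]          ## observed interaction F
  e  <- residuals(fit0)
  mu <- fitted(fit0)
  Fstar <- replicate(999, {
      ystar <- mu + sample(e, replace = TRUE)   ## errors resampled under H0
      anova(lm(ystar ~ grp + trt, data = dat),
            lm(ystar ~ grp * trt, data = dat))$F[2]
  })
  (1 + sum(Fstar >= Fobs)) / (1 + length(Fstar))   ## bootstrap p-value

The permutation analogue is the same sketch with sample(e) (that is,
without replacement) in place of sample(e, replace = TRUE).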
These points are discussed with examples in the linear models chapter of
MASS (the book) and also in the Davison-Hinkley book which the 'boot'
package supports.
[Shame about the broken shift key, although it seems to work with F:
keyboards are really cheap to replace these days.]
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595