[R] example to demonstrate benefits of poly in regression?

Mon Apr 1 19:42:12 CEST 2013

On Mon, Apr 1, 2013 at 1:20 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
> Here's my little discussion example for a quadratic regression:
>
> http://pj.freefaculty.org/R/WorkingExamples/regression-quadratic-1.R
>
> Students press me to know the benefits of poly() over the more obvious
> regression formulas.
>
> I think I understand the theory on why poly() should be more numerically
> stable, but I'm having trouble writing down an example that proves the
> benefit of this.
>
> I am beginning to suspect that because R's lm uses a QR decomposition under
> the hood, the instability that we used to see when we inverted (X'X) is no
> longer so serious as it used to be. When the columns in X are
>
> x1 x2 x3 x3squared
>
> then the QR decomposition is doing about the same thing as we would do if
> we replaced x3 and x3squared by the poly(x3, 2) results.
>
> Or not. I'm not arguing that, I'm thinking out loud, hoping you'll correct
> me.
>

# 1. One benefit is if you fit a higher degree polynomial using
poly(x, n) the lower degree coefficients will agree with the fit using
a lower n.
# For example, compare these two:
set.seed(123)
y <- rnorm(10); x <- 1:10
lm(y ~ poly(x, 2))
lm(y ~ poly(x, 3))

# however, that is not true if you don't use orthogonal polynomials.
Compare these two:
lm(y ~ poly(x, 2, raw = TRUE))
lm(y ~ poly(x, 3, raw = TRUE))

# 2. A second advantage is that the t statistics are invariant under
shift if you use orthogonal polynomials
# compare these two:
summary(lm(y ~ poly(x, 2)))
summary(lm(y ~ poly(x+1, 2)))

# however, that is not true if you don;t use orthogonal polynomials,
Compare these two:
summary(lm(y ~ poly(x, 2, raw = TRUE)))
summary(lm(y ~ poly(x+1, 2, raw = TRUE)))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com