[R] hatvalues?

Thu Mar 5 18:10:23 CET 2009

Dear Kevin,

If you do the same regression as in the text then you'll get the same
hat-values; the regression is the one on the top of p. 244:

> mod <- lm(repwt ~ weight*sex, data=Davis)
> max(hatvalues(mod))
[1] 0.7141856

As to making sense of the computations:

> X <- model.matrix(mod)
> head(X)
  (Intercept) weight sexM weight:sexM
1           1     77    1          77
2           1     58    0           0
3           1     53    0           0
4           1     68    1          68
5           1     59    0           0
6           1     76    1          76
> H <- X %*% solve( t(X) %*% X ) %*% t(X)
> h <- diag(H)
> max(h)
[1] 0.7141856

I hope this helps,
 John

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On
> Behalf Of rkevinburton at charter.net
> Sent: March-05-09 11:40 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] hatvalues?
> 
> I am struiggling a bit with this function 'hatvalues'.  I would like a
little
> more undrestanding than taking the black-box and using the values. I
looked
> at the Fortran source and it is quite opaque to me. So I am asking for
some
> help in understanding the theory. First, I take the simplest case of a
single
> variant. For this I turn o John Fox's book, "Applied Regression Analysis
and
> Generalized Linear Models, p 245 and generate this 'R' code:
> 
> > library(car)
> > attach(Davis)
> # remove the NA's
> > narepwt <- repwt[!is.na(repwt)]
> > meanrw <- mean(narepwt)
> > drw <- narepwt - meanrw
> > ssrw <- sum(drw * drw)
> > h <- 1/length(narepwt) + (drw * drw)/ssrw
> > h
> 
> This gives me a array of values the largest of which is
> 
> > order(h, decreasing=TRUE)
>   [1]  21  52  17  93  30  62 158 113 175 131 182  29 106 125 123 146  91
99
> 
> So the largest "hatvalue" is
> 
> > h[21]
> [1] 0.1041207
> 
> Which doesn't match the 0.714 value that is reported in the book but I
will
> probably take that up with the author later.
> 
> Then I use more of 'R' and I get
> 
> fit <- lm(weight ~ repwt)
> hr <- hatvalues(fit)
> hr[21]
>        21
> 0.1041207
> 
> So this matches which is reasusing. My question is this, given the QR
> transformation and the residuals derived from that transformation what is
a
> simple matrix formula for the hatvalues?
> 
> >From http://en.wikipedia.org/wiki/Linear_regression I get
> 
> residuals = y - Hy = y(I - H)
> or
> H = -(residuals/y - I)
> 
> > fit <- lm(weight ~ repwt)
> > h <- -(residuals(fit)/weight[as.numeric(names(residuals(fit)))] -
> diag(1,length(residuals(fit)), length(residuals(fit))))
> 
> This generates a matrix but I cannot see any coerrelation between this
"hat-
> matrix" and the return from "hatvalues".
> 
> Comments?
> 
> Thank you.
> 
> Kevin
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.