[R] hatvalues?
rkevinburton at charter.net
Thu Mar 5 17:39:30 CET 2009
I am struggling a bit with the function 'hatvalues'. I would like a little more understanding than treating it as a black box and using the values. I looked at the Fortran source and it is quite opaque to me, so I am asking for some help in understanding the theory. First, I take the simplest case of a single explanatory variable. For this I turn to John Fox's book, "Applied Regression Analysis and Generalized Linear Models", p. 245, and generate this R code:
> library(car)
> attach(Davis)
# remove the NA's
> narepwt <- repwt[!is.na(repwt)]
> meanrw <- mean(narepwt)
> drw <- narepwt - meanrw
> ssrw <- sum(drw * drw)
> h <- 1/length(narepwt) + (drw * drw)/ssrw
> h
This gives me an array of values. Ordering the indices from the largest value to the smallest gives
> order(h, decreasing=TRUE)
[1] 21 52 17 93 30 62 158 113 175 131 182 29 106 125 123 146 91 99
So the largest "hatvalue" is
> h[21]
[1] 0.1041207
This doesn't match the 0.714 value that is reported in the book, but I will probably take that up with the author later.
Then I use more of 'R' and I get
> fit <- lm(weight ~ repwt)
> hr <- hatvalues(fit)
> hr[21]
       21
0.1041207
So this matches, which is reassuring. My question is this: given the QR decomposition and the residuals derived from it, what is a simple matrix formula for the hat values?
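For reference, here is a minimal sketch of one relation between the QR decomposition and the hat values. It assumes an unweighted, full-rank fit: if X = QR is the thin QR decomposition of the model matrix, then H = Q Q', so each hat value is the sum of squares of the corresponding row of Q. The built-in 'cars' data is used in place of Davis only so that the example runs without the car package. That substitution, and the names fit_cars, Q and h_qr, are assumptions made for illustration.

```r
# Sketch: hat values from the thin Q factor of the QR decomposition.
# The built-in 'cars' data stands in for Davis (an assumption, so the
# example is self-contained and needs no extra packages).
fit_cars <- lm(dist ~ speed, data = cars)

Q    <- qr.Q(fit_cars$qr)   # n x p matrix with orthonormal columns
h_qr <- rowSums(Q^2)        # diagonal of Q %*% t(Q), without forming it

# Compare with the values returned by hatvalues()
all.equal(unname(hatvalues(fit_cars)), h_qr)

# The hat values sum to the number of coefficients (here 2)
sum(h_qr)
```

Working from the rows of Q avoids building the full n x n matrix, which matters only when n is large.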
From http://en.wikipedia.org/wiki/Linear_regression I get
residuals = y - Hy = (I - H)y
or
H = -(residuals/y - I)
> fit <- lm(weight ~ repwt)
> h <- -(residuals(fit)/weight[as.numeric(names(residuals(fit)))] - diag(1,length(residuals(fit)), length(residuals(fit))))
This generates a matrix, but I cannot see any correlation between this "hat-matrix" and the values returned by "hatvalues".
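As a point of comparison, here is a minimal sketch that builds the hat matrix from the usual definition H = X (X'X)^-1 X' and checks its diagonal against hatvalues(). It assumes an unweighted, full-rank fit and uses the built-in 'cars' data as a stand-in for Davis, with the names fit_cars, X, H, y and e chosen for illustration. Note that '/' in R divides element by element, so dividing the residuals by y cannot undo the matrix product (I - H) y. The n residuals alone do not determine the n x n matrix H.

```r
# Sketch: build the hat matrix explicitly and compare with hatvalues().
# The built-in 'cars' data stands in for Davis (an assumption).
fit_cars <- lm(dist ~ speed, data = cars)

X <- model.matrix(fit_cars)               # n x 2 design matrix (intercept, speed)
H <- X %*% solve(crossprod(X)) %*% t(X)   # H = X (X'X)^-1 X'
y <- cars$dist

# The diagonal of H should agree with hatvalues()
all.equal(unname(diag(H)), unname(hatvalues(fit_cars)))

# The residuals are the matrix product (I - H) %*% y
e <- unname(drop((diag(nrow(H)) - H) %*% y))
all.equal(e, unname(residuals(fit_cars)))
```

Forming H explicitly is fine for a small data set like this one, but it needs n^2 storage, so it is meant here only as a check on the definition.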
Comments?
Thank you.
Kevin