[R] R.squared in summary.lm with weights

Murray Efford murray.efford at otago.ac.nz
Fri Apr 8 20:45:33 CEST 2016

Thanks for these perfectly consistent replies - I didn't understand the purpose of m = sum(w * f/sum(w)) and saw it merely as a weighted average of the fitted values.

My ultimate concern is how to compute an appropriate weighted TSS (or equivalently, MSS) for PRESS-R^2 = 1 - PRESS/TSS = 1 - PRESS/(MSS + PRESS). Do you think it then makes sense to substitute the vector of leave-one-out fitted values for f here?

m <- sum(w * f/sum(w))
mss <- sum(w * (f - m)^2)
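
To make the question concrete, here is a minimal sketch of the two candidate denominators on simulated data. It assumes strictly positive weights and a model with an intercept, and it uses the standard shortcut e/(1 - h) for the leave-one-out residual rather than refitting. The data and the names loo_res, f_loo, r2_press_tss and r2_press_mss are made up for illustration. It shows both variants side by side without claiming that either is the right one.

```r
set.seed(1)
n <- 30
x <- runif(n)
w <- runif(n, 0.5, 2)                  # hypothetical positive weights
y <- 1 + 2 * x + rnorm(n, sd = 1 / sqrt(w))
fit <- lm(y ~ x, weights = w)

e <- residuals(fit)                    # unweighted residuals y - f
h <- hatvalues(fit)                    # hat values of the weighted fit
loo_res <- e / (1 - h)                 # leave-one-out prediction residuals
f_loo <- y - loo_res                   # leave-one-out fitted values
press <- sum(w * loo_res^2)            # weighted PRESS

# Variant 1: weighted TSS about the weighted mean of y
tss <- sum(w * (y - weighted.mean(y, w))^2)
r2_press_tss <- 1 - press / tss

# Variant 2: the substitution asked about, with f replaced by f_loo
m_loo <- sum(w * f_loo / sum(w))
mss_loo <- sum(w * (f_loo - m_loo)^2)
r2_press_mss <- 1 - press / (mss_loo + press)

c(tss = r2_press_tss, mss_loo = r2_press_mss)
```

The two values generally differ, because mss_loo + press is not the same quantity as the weighted TSS.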

From: peter dalgaard <pdalgd at gmail.com>
Sent: Friday, 8 April 2016 11:28 p.m.
To: Duncan Murdoch
Cc: Murray Efford; r-help at r-project.org
Subject: Re: [R] R.squared in summary.lm with weights

On 08 Apr 2016, at 12:57 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote:

> On 07/04/2016 5:21 PM, Murray Efford wrote:
>> Following some old advice on this list, I have been reading the code for summary.lm to understand the computation of R-squared from a weighted regression. Usually weights in lm are applied to squared residuals, but I see that the weighted mean of the observations is calculated as if the weights are on the original scale:
>> [...]
>>     f <- z$fitted.values
>>     w <- z$weights
>> [...]
>>             m <- sum(w * f/sum(w))
>>             [mss <-]  sum(w * (f - m)^2)
>> [...]
>> This seems inconsistent to me. What am I missing?
> I think you are expecting consistency where there needn't be any.  Why do you see an inconsistency here?  Those are different calculations. You get expressions like these if you assume observations have variance sigma^2/w, and you're trying to estimate sigma^2.

It's also perfectly consistent that m is the minimizer of mss:

d/dm sum(w*(f-m)^2) = -2 sum(w*(f-m)) = 0 => m = sum(w*f) / sum(w)
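
A quick numerical check of this, as a sketch on simulated data (the data and the name m_opt are made up for illustration; an intercept and positive weights are assumed):

```r
set.seed(42)
n <- 25
x <- rnorm(n)
w <- runif(n, 0.5, 3)                  # hypothetical positive weights
y <- 0.5 + 1.5 * x + rnorm(n)
fit <- lm(y ~ x, weights = w)

f <- fitted(fit)
r <- residuals(fit)
m <- sum(w * f / sum(w))               # weighted mean of the fitted values
mss <- sum(w * (f - m)^2)
rss <- sum(w * r^2)

# Minimize the weighted sum of squares over m numerically
m_opt <- optimize(function(m) sum(w * (f - m)^2), interval = range(f))$minimum

c(m = m, m_opt = m_opt)
c(by_hand = mss / (mss + rss), summary_lm = summary(fit)$r.squared)
```

The numerical minimizer should agree with m up to the tolerance of optimize(), and mss / (mss + rss) should reproduce the R-squared reported by summary.lm.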

However, beware the distinction between inverse variance weights, replication weights, and sampling weights.
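
One way to see part of that distinction is the following sketch, which contrasts integer replication counts passed as weights with actually repeating the rows. The data and the names k, fit_w and fit_rep are made up for illustration, and sampling weights are not covered here.

```r
set.seed(7)
n <- 12
x <- rnorm(n)
y <- 1 + x + rnorm(n)
k <- sample(1:4, n, replace = TRUE)    # hypothetical replication counts

# Counts passed as weights: lm() treats them as inverse-variance weights
fit_w <- lm(y ~ x, weights = k)

# The same counts used to repeat each row k[i] times
idx <- rep(seq_len(n), times = k)
fit_rep <- lm(y ~ x, data = data.frame(x = x[idx], y = y[idx]))

rbind(weights = coef(fit_w), replicated = coef(fit_rep))
c(df_weights = df.residual(fit_w), df_replicated = df.residual(fit_rep))
c(sigma_weights = summary(fit_w)$sigma, sigma_replicated = summary(fit_rep)$sigma)
```

The coefficients and R-squared agree between the two fits, but the residual degrees of freedom are n - 2 in one case and sum(k) - 2 in the other, so the residual standard error and the standard errors of the coefficients differ.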

> Duncan Murdoch

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
