[R] Regression Identity

peter dalgaard pdalgd at gmail.com
Wed Jul 18 10:26:09 CEST 2012


On Jul 18, 2012, at 05:11, darnold wrote:

> Hi,
> 
> I see a lot of folks verify the regression identity SST = SSE + SSR
> numerically, but I cannot seem to find a proof. I wonder if any folks on
> this list could guide me to a mathematical proof of this fact.
> 

Wrong list, isn't it? 

http://stats.stackexchange.com/ is -----> _that_ way...

Anyway: any math stats book should have it somewhere. There are two basic approaches, depending on what level of abstraction one expects from students.

First principles: Write out SST = sum(((y - yhat) + (yhat - ybar))^2), expand the square, and use the normal equations to show that the sum of cross-product terms is zero. This is a bit tedious, but straightforward in principle; the expansion is spelled out below.
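
Roughly, the expansion looks like this (assuming the model contains an intercept, so the residuals e_i = y_i - yhat_i satisfy sum(e_i) = 0 and sum(e_i * yhat_i) = 0 by the normal equations):

\[
\sum_i (y_i - \bar y)^2
  = \sum_i \bigl[(y_i - \hat y_i) + (\hat y_i - \bar y)\bigr]^2
  = \sum_i (y_i - \hat y_i)^2 + \sum_i (\hat y_i - \bar y)^2
    + 2 \sum_i (y_i - \hat y_i)(\hat y_i - \bar y)
\]

and the cross term equals 2 ( \sum_i e_i \hat y_i - \bar y \sum_i e_i ) = 0.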

Linear algebra: The vector of least squares fitted values is the orthogonal projection of y onto a subspace of R^N (N = number of observations). The residual vector y - yhat is therefore orthogonal to that subspace, and yhat - ybar lies in it (the constant vector is in the subspace whenever the model contains an intercept), so the N-dimensional version of the Pythagorean theorem gives

||yhat - ybar||^2 + ||y - yhat||^2 == ||y - ybar||^2

since the three vectors involved form a right-angled triangle. (http://en.wikipedia.org/wiki/Pythagorean_theorem, scroll down to "Inner product spaces".) 
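
If you want to see it happen numerically rather than on paper, here is a quick sketch in R using the built-in cars data (any lm() fit with an intercept would do just as well):

fit  <- lm(dist ~ speed, data = cars)   # simple linear regression
y    <- cars$dist
yhat <- fitted(fit)
ybar <- mean(y)

SST <- sum((y - ybar)^2)      # total sum of squares
SSE <- sum((y - yhat)^2)      # residual (error) sum of squares
SSR <- sum((yhat - ybar)^2)   # regression sum of squares

all.equal(SST, SSE + SSR)               # TRUE, up to rounding
sum((y - yhat) * (yhat - ybar))         # cross term: numerically ~ 0

The last line is exactly the orthogonality that makes the identity work.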


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


