[R] Multiple regression (with interactions) by hand

Christoph Scherber Christoph.Scherber at agr.uni-goettingen.de
Tue Sep 3 15:22:02 CEST 2013


Dear all,

But why are there such huge differences betwen solve() and ginv()? (see code below)?

##
m1=lm(Ozone~Solar.R*Wind,airquality)

# remove NA´s:
airquality2=airquality[complete.cases(airquality$Ozone)&
complete.cases(airquality$Solar.R)&
complete.cases(airquality$Wind),]

# create the model matrix by hand:
X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind)
# is the same as:
model.matrix(m1)
# create the response vector by hand:
Y=airquality2$Ozone
# is the same as:
m1$model$Ozone
# Now solve for the parameter estimates:

solve(crossprod(X)) %*% crossprod(X,Y) #gives the correct answer

library(MASS)
ginv(t(X)%*%X)%*%t(X)%*%Y #gives a wrong answer





Am 03/09/2013 12:29, schrieb Joshua Wiley:
> Hi Christoph,
> 
> Use this matrix expression instead:
> 
> solve(crossprod(X)) %*% t(X) %*% Y
> 
> Note that:
> 
> all.equal(crossprod(X), t(X) %*% X)
> 
> Cheers,
> 
> Joshua
> 
> 
> 
> On Tue, Sep 3, 2013 at 2:51 AM, Christoph Scherber
> <Christoph.Scherber at agr.uni-goettingen.de> wrote:
>> Dear all,
>>
>> I´ve played around with the "airquality" dataset, trying to solve the matrix equations of a simple
>> multiple regression by hand; however, my matrix multiplications don´t lead to the estimates returned
>> by coef(). What have I done wrong here?
>>
>> ##
>> m1=lm(Ozone~Solar.R*Wind,airquality)
>>
>> # remove NA´s:
>> airquality2=airquality[complete.cases(airquality$Ozone)&
>> complete.cases(airquality$Solar.R)&
>> complete.cases(airquality$Wind),]
>>
>> # create the model matrix by hand:
>> X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind)
>> # is the same as:
>> model.matrix(m1)
>>
>> # create the response vector by hand:
>> Y=airquality2$Ozone
>> # is the same as:
>> m1$model$Ozone
>>
>> # Now solve for the parameter estimates:
>> library(MASS)
>> ginv(t(X)%*%X)%*%t(X)%*%Y
>>
>> # is not the same as:
>> coef(m1)
>>
>> ##
>> Now why is my result (line ginv(...)) not the same as the one returned by coef(m1)?
>>
>> Thanks very much for your help!
>>
>> Best regards,
>> Christoph
>>
>> [using R 3.0.1 on Windows 7 32-Bit]
>>
>>
>>
>>
>>
>> --
>> PD Dr Christoph Scherber
>> Georg-August University Goettingen
>> Department of Crop Science
>> Agroecology
>> Grisebachstrasse 6
>> D-37077 Goettingen
>> Germany
>> phone 0049 (0)551 39 8807
>> fax 0049 (0)551 39 8806
>> http://www.gwdg.de/~cscherb1
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 



More information about the R-help mailing list