[R] linear model coefficients by year and industry, fitted values, residuals, panel data
Cecilia Carmo
cecilia.carmo at ua.pt
Thu Apr 4 17:43:57 CEST 2013
The problem was my R version. After installing a more recent version, the problem is solved.
mutate() is much easier.
Thank you.
Cecília Carmo
________________________________________
From: Peter Ehlers [ehlers at ucalgary.ca]
Sent: Thursday, 4 April 2013 16:29
To: Cecilia Carmo
Cc: r-help at r-project.org; Adams, Jean
Subject: Re: [R] linear model coefficients by year and industry, fitted values, residuals, panel data
On 2013-04-04 02:11, Cecilia Carmo wrote:
> Thank you all. I'm very happy with this solution. Just two questions:
> I used mutate() from the plyr package and it gave me an error message. Is it a new function, and might my package be too old?
> Is there any extractor for the R-squared?
>
> Thanks again,
>
> Cecília Carmo
According to the plyr NEWS file, mutate was introduced in
Version 1.3 (2010-12-28). I would hope that your version is
newer than that. You should tell us what the error message is.
Anyway, you can always use R's within() function instead;
or use transform() as Jean suggested.
Peter Ehlers
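
On the R-squared question above: as far as I know, base R has no dedicated extractor function for it, but the object returned by summary() on an lm fit carries r.squared and adj.r.squared components. A minimal sketch, assuming simulated data and a hypothetical helper named fit_stats (neither is from the thread):

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# Simulated stand-in for one industry-year group (not the poster's data)
dat <- data.frame(X = rnorm(20), Z = rnorm(20))
dat$Y <- 1 + 2 * dat$X - dat$Z + rnorm(20)

# Hypothetical helper: coefficients plus R-squared for one group
fit_stats <- function(d) {
  fit <- lm(Y ~ X + Z, data = d)
  c(coef(fit), r2 = summary(fit)$r.squared)
}

stats <- fit_stats(dat)
print(stats)
```

A function of this shape could probably be passed to ddply() in place of the anonymous function used elsewhere in this thread; the names() assignment would then need one more name for the extra column.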
>
> ________________________________________
> From: Peter Ehlers [ehlers at ucalgary.ca]
> Sent: Wednesday, 3 April 2013 19:01
> To: Adams, Jean
> Cc: Cecilia Carmo; r-help at r-project.org
> Subject: Re: [R] linear model coefficients by year and industry, fitted values, residuals, panel data
>
> A few minor improvements to Jean's post suggested inline below.
>
> On 2013-04-03 05:41, Adams, Jean wrote:
>> Cecilia,
>>
>> Thanks for providing a reproducible example. Excellent.
>>
>> You could use the ddply() function in the plyr package to fit the model for
>> each industry and year, keep the coefficients, and then estimate the fitted
>> and residual values.
>>
>> Jean
>>
>> library(plyr)
>> coef <- ddply(final3, .(industry, year), function(dat) lm(Y ~ X + Z,
>> data=dat)$coef)
>> names(coef) <- c("industry", "year", "b0", "b1", "b2")
>> final4 <- merge(final3, coef)
>> newdata1 <- transform(final4, Yhat = b0 + b1*X + b2*Z)
>> newdata2 <- transform(newdata1, residual = Y-Yhat)
>> plot(as.factor(newdata2$firm), newdata2$residual)
>
> Suggestion 1:
> Use the extractor function coef() and also avoid using the name
> of an R function as a variable name:
>
> Coef <- ddply(...., function(dat) coef(lm(....)))
>
> Suggestion 2:
> Use plyr's mutate() to do both transforms at once:
>
> newdata <- mutate(final4,
> Yhat = b0 + b1*X + b2*Z,
> residual = Y-Yhat)
>
> [Or you could use within(), but I now find mutate handier, mainly
> because it doesn't 'reverse' the order of the new variables.]
>
> Suggestion 3:
> Use the 'data=' argument in the plot:
>
> boxplot(residual ~ firm, data = newdata)
>
> Peter Ehlers
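
Putting the three suggestions together gives something like the sketch below. It assumes plyr 1.3 or later is installed and uses a small simulated stand-in for final3 (the seed and the data are made up for illustration). The elided calls in Suggestion 1 are filled in from Jean's code, so treat this as one plausible reading rather than the exact code Peter had in mind.

```r
library(plyr)  # assumed installed; mutate() needs plyr >= 1.3

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Simulated stand-in for final3: 2 industries, 2 years, 6 firms per industry
final3 <- data.frame(
  firm     = rep(1:12, each = 2),
  year     = rep(2001:2002, times = 12),
  industry = rep(c(20, 30), each = 12),
  X = rnorm(24), Y = rnorm(24), Z = rnorm(24)
)

# Suggestion 1: use the coef() extractor and avoid 'coef' as a variable name
Coef <- ddply(final3, .(industry, year),
              function(dat) coef(lm(Y ~ X + Z, data = dat)))
names(Coef) <- c("industry", "year", "b0", "b1", "b2")

# Merge on the shared columns (industry, year)
final4 <- merge(final3, Coef)

# Suggestion 2: mutate() can use Yhat in the same call that creates it
newdata <- mutate(final4,
                  Yhat     = b0 + b1 * X + b2 * Z,
                  residual = Y - Yhat)

# Suggestion 3: formula interface with 'data='
boxplot(residual ~ firm, data = newdata)
```

Renaming the coefficient columns before merge() matters here: left as "X" and "Z", they would be treated as join keys alongside industry and year.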
>
>>
>> On Wed, Apr 3, 2013 at 3:38 AM, Cecilia Carmo <cecilia.carmo at ua.pt> wrote:
>>
>>> Hi R-helpers,
>>>
>>>
>>>
>>> My real data is a panel (unbalanced and with gaps in years) of thousands
>>> of firms, by year and industry, with financial information (variables
>>> X, Y, Z, for example). The number of firms by year and industry is not
>>> always equal, and neither is the number of years by industry.
>>>
>>>
>>>
>>> #reproducible example
>>> firm1<-sort(rep(1:10,5),decreasing=F)
>>> year1<-rep(2000:2004,10)
>>> industry1<-rep(20,50)
>>> X<-rnorm(50)
>>> Y<-rnorm(50)
>>> Z<-rnorm(50)
>>> data1<-data.frame(firm1,year1,industry1,X,Y,Z)
>>> data1
>>> colnames(data1)<-c("firm","year","industry","X","Y","Z")
>>>
>>>
>>>
>>> firm2<-sort(rep(11:15,3),decreasing=F)
>>> year2<-rep(2001:2003,5)
>>> industry2<-rep(30,15)
>>> X<-rnorm(15)
>>> Y<-rnorm(15)
>>> Z<-rnorm(15)
>>> data2<-data.frame(firm2,year2,industry2,X,Y,Z)
>>> data2
>>> colnames(data2)<-c("firm","year","industry","X","Y","Z")
>>>
>>> firm3<-sort(rep(16:20,4),decreasing=F)
>>> year3<-rep(2001:2004,5)
>>> industry3<-rep(40,20)
>>> X<-rnorm(20)
>>> Y<-rnorm(20)
>>> Z<-rnorm(20)
>>> data3<-data.frame(firm3,year3,industry3,X,Y,Z)
>>> data3
>>> colnames(data3)<-c("firm","year","industry","X","Y","Z")
>>>
>>>
>>>
>>> final1<-rbind(data1,data2)
>>> final2<-rbind(final1,data3)
>>> final2
>>> final3<-final2[order(final2$industry,final2$year),]
>>> final3
>>>
>>>
>>>
>>> I need to estimate a linear model Y = b0 + b1*X + b2*Z by industry and year,
>>> to obtain the estimates of b0, b1 and b2 by industry and year (for example,
>>> I need to have the b0 for industry 20 and year 2000, for industry 20 and
>>> year 2001...). Then I need to calculate the fitted values and the residuals
>>> by firm, so I need to keep b0, b1 and b2 in a way that lets me do something
>>> like
>>> newdata1<-transform(final3,Yhat=b0+b1*X+b2*Z)
>>> newdata2<-transform(newdata1,residual=Y-Yhat)
>>> or find another way to keep Yhat and the residuals in a data frame with the
>>> columns firm and year.
>>>
>>>
>>>
>>> Until now I have been doing this the hard way, and because I need to do
>>> it several times, I would appreciate your help finding an easier way.
>>>
>>>
>>>
>>> Thank you,
>>>
>>>
>>>
>>> Cecília Carmo
>>>
>>> Universidade de Aveiro
>>>
>>> Portugal
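
For reference, the request in the original post can also be sketched in base R without keeping b0, b1 and b2 at all, by fitting each industry-year group separately and attaching fitted() and residuals() to its rows. This is an alternative to the plyr approach discussed in the replies, not something proposed in the thread. The data below are a simulated unbalanced stand-in, and the seed is arbitrary.

```r
set.seed(7)  # arbitrary seed, only so the sketch is repeatable

# Simulated unbalanced panel: industry 20 has 3 years, industry 30 has 2
final3 <- rbind(
  data.frame(firm = rep(1:6,  each = 3), year = rep(2000:2002, 6), industry = 20),
  data.frame(firm = rep(7:12, each = 2), year = rep(2001:2002, 6), industry = 30)
)
final3$X <- rnorm(nrow(final3))
final3$Y <- rnorm(nrow(final3))
final3$Z <- rnorm(nrow(final3))

# One data frame per industry-year; drop = TRUE skips combinations
# that never occur (here industry 30 in year 2000)
groups <- split(final3, list(final3$industry, final3$year), drop = TRUE)

# Fit the model within each group and attach fitted values and residuals
pieces <- lapply(groups, function(dat) {
  fit <- lm(Y ~ X + Z, data = dat)
  cbind(dat, Yhat = fitted(fit), residual = residuals(fit))
})

# Stack the groups back into one data frame with firm and year columns
newdata <- do.call(rbind, pieces)
head(newdata)
```

Each group needs more observations than coefficients for the residuals to be meaningful, which may not hold for every industry-year in a real unbalanced panel.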