[R] Linear Regression with 2 grouping variables
Dennis Murphy
djmuser at gmail.com
Tue Aug 23 07:12:10 CEST 2011
Hi:
You're kind of on the right track, but there is no conditioning
formula in lm(); it's not lattice :) This is relatively easy to do
with the plyr package, though:
library('plyr')
# Generate a list of models - the subsetting variables (Site, Vial) are
# used to generate the data splits and the function is run on a generic
# data split d, assumed to be a data frame.
mlist <- dlply(feed1, .(Site, Vial), function(d) lm(lnRFU ~ Time, data = d))
# To get the set of model coefficients, take the list mlist as the input
# data (argument 1) and then for each generic component 'm', extract its
# coefficients:
ldply(mlist, function(m) coef(m))
For your test data, only Vial varies - the example code below reflects that:
> mlist <- dlply(feed1, .(Vial), function(d) lm(lnRFU ~ Time, data = d))
> length(mlist) # three component model objects
[1] 3
> ldply(mlist, function(m) coef(m))
Vial (Intercept) Time
1 1 10.75440 -0.001508621
2 2 10.83171 -0.005095100
3 3 10.81087 -0.004897600
This idea can be generalized: if you want to pull out similar pieces
of output from each model, run ldply() on the list of models and
create a utility function that outputs, for a generic model object m,
what you want to have returned. Common choices include R^2 values,
tables of coefficients (as lists instead of data frames) or residuals
and predicted values. The game is to write the function so that it
takes a [list] model object (here, m) as input and a data frame as
output. You can also extract output from summary(m) in a similar way,
using m as the input object.
HTH,
Dennis
On Mon, Aug 22, 2011 at 6:15 PM, Nathan Miller <natemiller77 at gmail.com> wrote:
> Hi all,
>
> I have a data set that looks a bit like this.
>
> feed1
> RFU Site Vial Time lnRFU
> 1 44448 1 1 10 10.702075
> 2 47521 1 1 20 10.768927
> 3 42905 1 1 30 10.66674
> 4 46867 1 1 40 10.755069
> 5 42995 1 1 50 10.668839
> 6 43074 1 1 60 10.670675
> 7 41195 1 1 70 10.626072
> 8 47090 1 2 10 10.759816
> 9 48100 1 2 20 10.781037
> 10 43215 1 2 30 10.673943
> 11 39656 1 2 40 10.587998
> 12 38799 1 2 50 10.566150
> 13 38424 1 2 60 10.556438
> 14 35240 1 2 70 10.469937
> 15 46427 1 3 10 10.745636
> 16 46418 1 3 20 10.745443
> 17 42095 1 3 30 10.647684
> ......
> There are 5 columns of data, three levels of "Site", 10 "Vials" per site,
> and measurements were taken at 10 min intervals from 10-70.. I am primarily
> interested in the relationship between "Time" and "lnRFU" to calculate the
> rate at which lnRFU declines over time. I have a nice plot using a ggplot2
> code that looks like this
>
> p<-ggplot(data=feed1,aes(x=Time,y=lnRFU))
> p+geom_point(size=4)+facet_grid(Site~Vial)+geom_smooth(method="lm")
>
> The graph is useful to visualize the changes over time and grouped by both
> Site and Vial, but I also need the slopes of the linear regressions for each
> Vial, within a Site. This is where I run into a problem. I want to run a
> linear regression of lnRFU as a function of Time grouped by both Site and
> Vial. Its easy to visualize this comparison in ggplot using facet_grid(),
> but I'm not sure how to do a similar comparison/analysis within lm()
>
> I imagine something like
>
> fit<-lm(lnRFU~Time | Vial * Site, data=feed1)
>
> in which I group by both Vial and Site, but obviously this code doesn't
> work. Does anyone have an idea for how to do a linear regression with two
> grouping variables? Do I have to go back and combine Vial and Site into a
> single grouping variable or can I leave the dataframe the way it is? I'm
> trying to imagine a means of accomplishing the same type of thing that
> facet_grid does when it allows you to plot the data as a function of two
> "grouping" variables.
>
> Thanks for you time. I greatly appreciate it.
>
> Nate Miller
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list