[R] regression on data subsets in datafile
Dennis Murphy
djmuser at gmail.com
Mon Sep 12 11:15:08 CEST 2011
Hi:
Here's one approach:
# date typo fixed in record 5 - changed 35 to 5
tC <- textConnection("
Subject Date parameter1
bob 3/2/99 10
bob 4/2/99 10
bob 5/5/99 10
bob 6/27/99 NA
bob 8/5/01 10
bob 3/2/02 10
steve 1/2/99 4
steve 2/2/00 7
steve 3/2/01 10
steve 4/2/02 NA
steve 5/2/03 16
kevin 6/5/04 24
")
dat <- read.table(tC, header=TRUE, stringsAsFactors = FALSE)
close.connection(tC)
rm(tC)
# Convert Date to an object of class Date
dat <- transform(dat, date = as.Date(Date, format = '%m/%d/%y'))
# You could do this with transform() and the by() function, but
# here is another way to use the min date per person as time 0
# using package plyr; mutate is a faster alternative to transform
# and can be used for groupwise operations inside of ddply():
library('plyr')
dat <- ddply(dat, .(Subject), mutate, days = as.numeric(date - min(date)))
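For reference, a base R route to the same days column might look like the sketch below. It is only an illustration: it uses ave() rather than transform() with by(), rebuilds the example data with read.table(text = ...) so that it runs on its own, and works on the numeric form of the dates.

```r
# Rebuild the example data so this sketch is self-contained
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Subject Date parameter1
bob 3/2/99 10
bob 4/2/99 10
bob 5/5/99 10
bob 6/27/99 NA
bob 8/5/01 10
bob 3/2/02 10
steve 1/2/99 4
steve 2/2/00 7
steve 3/2/01 10
steve 4/2/02 NA
steve 5/2/03 16
kevin 6/5/04 24
")
dat$date <- as.Date(dat$Date, format = "%m/%d/%y")

# ave() returns, for every row, the minimum over that row's Subject group,
# so subtracting it gives days since each person's first visit
d <- as.numeric(dat$date)
dat$days <- d - ave(d, dat$Subject, FUN = min)
```

Each person's first record gets days = 0, which is what the ddply() call above produces as well.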
# Since Kevin has only one record, we want to return NAs for his coefficients.
# The function f returns NA if there are fewer than three observations
# per subgroup; you can change 3 to 2 if you like. Otherwise, it returns
# the coefficients of the least squares line as a data frame.
f <- function(d) {
    if (nrow(d) < 3) {
        return(data.frame(intercept = NA, slope = NA))
    } else {
        p <- coef(lm(parameter1 ~ days, data = d))
        data.frame(intercept = p[1], slope = p[2])
    }
}
# Apply the function to each person's sub-data frame
ddply(dat, .(Subject), f)
  Subject intercept       slope
1     bob 10.000000 0.000000000
2   kevin        NA          NA
3   steve  3.998485 0.007591638
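The original question also asked how to add the slopes to the data set as a new column. One possible way, sketched here in base R only so that it runs without plyr, is to build the per-subject coefficient table and merge() it back by Subject. The names fit_one, coefs and dat_with_slopes are made up for this illustration.

```r
# Rebuild the example data and the days column so this runs on its own
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Subject Date parameter1
bob 3/2/99 10
bob 4/2/99 10
bob 5/5/99 10
bob 6/27/99 NA
bob 8/5/01 10
bob 3/2/02 10
steve 1/2/99 4
steve 2/2/00 7
steve 3/2/01 10
steve 4/2/02 NA
steve 5/2/03 16
kevin 6/5/04 24
")
dat$date <- as.Date(dat$Date, format = "%m/%d/%y")
d <- as.numeric(dat$date)
dat$days <- d - ave(d, dat$Subject, FUN = min)

# Hypothetical helper: same rule as f above (NA when fewer than 3 rows)
fit_one <- function(d) {
    if (nrow(d) < 3) {
        return(data.frame(intercept = NA_real_, slope = NA_real_))
    }
    p <- coef(lm(parameter1 ~ days, data = d))
    data.frame(intercept = p[[1]], slope = p[[2]])
}

# One row of coefficients per subject
pieces <- lapply(split(dat, dat$Subject), fit_one)
coefs <- data.frame(Subject = names(pieces), do.call(rbind, pieces),
                    stringsAsFactors = FALSE)
rownames(coefs) <- NULL

# Attach intercept and slope to every record of the matching subject
dat_with_slopes <- merge(dat, coefs, by = "Subject")
```

With the plyr approach above, the same idea would be merge(dat, ddply(dat, .(Subject), f), by = "Subject").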
Another option is to use the lmList() function in the nlme package.
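A rough sketch of the lmList() route follows; treat it as an illustration rather than a tested recipe. It assumes the nlme package is installed, rebuilds the example data so that it runs on its own, and drops subjects with fewer than three non-missing values before fitting (that filtering step and the names ok, n_obs, keep and fits are my own additions, not part of the suggestion above).

```r
library(nlme)

# Rebuild the example data and the days column so this runs on its own
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Subject Date parameter1
bob 3/2/99 10
bob 4/2/99 10
bob 5/5/99 10
bob 6/27/99 NA
bob 8/5/01 10
bob 3/2/02 10
steve 1/2/99 4
steve 2/2/00 7
steve 3/2/01 10
steve 4/2/02 NA
steve 5/2/03 16
kevin 6/5/04 24
")
dat$date <- as.Date(dat$Date, format = "%m/%d/%y")
d <- as.numeric(dat$date)
dat$days <- d - ave(d, dat$Subject, FUN = min)

# Keep complete records from subjects with at least 3 of them;
# lmList() uses na.action = na.fail by default, so NAs are removed first
ok <- !is.na(dat$parameter1)
n_obs <- table(dat$Subject[ok])
keep <- dat[ok & dat$Subject %in% names(n_obs)[n_obs >= 3], ]
keep$Subject <- factor(keep$Subject)

# One lm() fit per level of Subject
fits <- lmList(parameter1 ~ days | Subject, data = keep)
coef(fits)
```

coef() on the fitted list returns one row per subject, with the intercept and the days slope as columns.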
HTH,
Dennis
On Mon, Sep 12, 2011 at 12:42 AM, marcel <marcelcurlin at gmail.com> wrote:
> I have data of the form
>
> tC <- textConnection("
> Subject Date parameter1
> bob 3/2/99 10
> bob 4/2/99 10
> bob 5/5/99 10
> bob 6/27/99 NA
> bob 8/35/01 10
> bob 3/2/02 10
> steve 1/2/99 4
> steve 2/2/00 7
> steve 3/2/01 10
> steve 4/2/02 NA
> steve 5/2/03 16
> kevin 6/5/04 24
> ")
> data <- read.table(header=TRUE, tC)
> close.connection(tC)
> rm(tC)
>
> I am trying to calculate rate of change of parameter1 in units/day for each
> person. I think I need something like:
> "lapply(split(mydata, mydata$ppt), function(x) lm(parameter1 ~ day,
> data=x))"
>
> I am not sure how to handle the dates in order to have the first day for
> each person be time = 0, and the remaining dates to be handled as days since
> time 0. Also, is there a way to add the resulting slopes to the data set as
> a new column?
>
> Thanks,
> Marcel