[R] long format - find age when another variable is first 'high'
Marc Schwartz
marc_schwartz at me.com
Mon May 25 15:52:15 CEST 2009
On May 25, 2009, at 7:45 AM, David Freedman wrote:
>
> Dear R,
>
> I've got a data frame with children examined multiple times and at various
> ages. I'm trying to find the first age at which another variable
> (LDL-Cholesterol) is >= 130 mg/dL; for some children, this may never happen.
> I can do this with transformBy and ddply, but with 10,000 different
> children, these functions take some time on my PCs - is there a faster way
> to do this in R? My code on a small dataset follows.
>
> Thanks very much, David Freedman
>
> d<-data.frame(id=c(rep(1,3),rep(2,2),3),
>               age=c(5,10,15,4,7,12),ldlc=c(132,120,125,105,142,160))
> d$high.ldlc<-ifelse(d$ldlc>=130,1,0)
> d
> library(plyr)
> d2<-ddply(d,~id,transform,plyr.minage=min(age[high.ldlc==1]));
> library(doBy)
> d2<-transformBy(~id,da=d2,doby.minage=min(age[high.ldlc==1]));
> d2
The first thing that I would do is to get rid of records that are not
relevant to your question:
> d
id age ldlc high.ldlc
1 1 5 132 1
2 1 10 120 0
3 1 15 125 0
4 2 4 105 0
5 2 7 142 1
6 3 12 160 1
# Get records with high ldl
d.new <- subset(d, ldlc >= 130)
> d.new
id age ldlc high.ldlc
1 1 5 132 1
5 2 7 142 1
6 3 12 160 1
That will reduce the total size of the dataset, perhaps
substantially. It will also drop entire subjects who are not
relevant (i.e., those who never have LDL >= 130).
Then get the minimum age for each of the remaining subjects:
> aggregate(d.new$age, list(id = d.new$id), min)
id x
1 1 5
2 2 7
3 3 12
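If you need the result as a column on the full data frame, as in your ddply and transformBy code, one option is to merge the summary back in. The following is only a sketch: the child with id 4 is a hypothetical addition to your example (a child who never reaches 130), and the names first.high and minage are mine. With all.x = TRUE, such children get NA, whereas min() on an empty vector returns Inf with a warning.

```r
# Same toy data as above, plus a hypothetical child (id 4) who never reaches 130
d <- data.frame(id   = c(1, 1, 1, 2, 2, 3, 4),
                age  = c(5, 10, 15, 4, 7, 12, 9),
                ldlc = c(132, 120, 125, 105, 142, 160, 110))

# Keep only the high-LDL records, then take the minimum age per child
d.new <- subset(d, ldlc >= 130)
first.high <- aggregate(d.new$age, list(id = d.new$id), min)
names(first.high)[2] <- "minage"   # rename the default column "x"

# Left join back onto the full data; children with no high record get NA
d2 <- merge(d, first.high, by = "id", all.x = TRUE)
d2
```

Note that merge() may reorder the rows, so sort by id and age afterwards if the order matters.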
Try that to see what sort of time reduction you observe.
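To get a rough idea of the timing before running it on your real data, you could simulate something of similar size. This is a sketch under assumptions of my own: 10,000 children with three visits each, and made-up age and LDL distributions that are not meant to resemble your data. It also times tapply() on the same subset as another base R option.

```r
set.seed(1)      # arbitrary seed, only for reproducibility
n.id <- 10000    # number of children, as in the question

# Simulated long-format data: three visits per child (an assumption)
big <- data.frame(id   = rep(seq_len(n.id), each = 3),
                  age  = sample(2:18, 3 * n.id, replace = TRUE),
                  ldlc = round(rnorm(3 * n.id, mean = 110, sd = 25)))

# Time the subset + aggregate approach
system.time({
  big.new <- subset(big, ldlc >= 130)
  res <- aggregate(big.new$age, list(id = big.new$id), min)
})

# tapply() on the same subset returns a named vector instead of a data frame
system.time(res2 <- tapply(big.new$age, big.new$id, min))
```

The timings will depend on your machine and on how many records survive the subset, so treat them as indicative only.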
HTH,
Marc Schwartz