[R] using ddply but preserving some of the outside data
Jarrett Byrnes
byrnes at msi.ucsb.edu
Wed Aug 5 21:00:40 CEST 2009
I have a bit of a quandary. I'm working with a data set in which I
have sampled sites on a variety of dates. I want to use these data
to get a running average of the sampled values for the current and
previous date.
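To make the target concrete, here is a minimal sketch of that running
average for a single site, assuming "current and previous date" means
the mean of each value and the one sampled just before it. The helper
name running_mean2 and the three values are made up for illustration.

```r
# Hypothetical helper (an assumption, not existing code): mean of each
# value and the one before it. The first value has no predecessor, so
# it is averaged on its own.
running_mean2 <- function(x) {
  prev <- c(NA, x[-length(x)])            # values shifted down one position
  rowMeans(cbind(x, prev), na.rm = TRUE)  # NA predecessor is ignored
}

vals <- c(10, 20, 40)   # made-up values for one site at dates 1, 2, 3
running_mean2(vals)     # 10 15 30
```

The same calculation would then need to be repeated within each site.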
I originally thought something like ddply would be ideal for this.
However, I cannot break up my data by date and then apply a function
that requires information about the previous dates.
I had thought to use a for loop and merge, but that doesn't quite seem
to be working.
So, my questions are twofold:
1) Is there a way to use something like the plyr library to do this
efficiently?
1a) Indeed, is there a way to use ddply or its ilk to have a function
that returns a vector of values, and then assign the variables you are
sorting by to the whole vector? Or maybe making each value its own
column in the new data frame, and then using reshape, is the answer.
Hrm. Seems clunky.
2) Or, can a for loop around a plyr-kind of statement do the trick?
(If so, pointers on why the code below won't work would be welcome.
It, too, seems clunkier than I would like.)
sites<-c("a", "b", "c")
dates<-1:5
a.df<-expand.grid(sites=sites, dates=dates)
a.df$value<-runif(15,0,100)
a.df<-as.data.frame(a.df)
#now, I want to get the average of the values for the current and previous date
mean2<-function(df, date){
  sub.df<-subset(df, df$dates-date<1 & df$dates-date>-1)
  return(mean(df$value))
}
my.df<-data.frame(sites=NA, dates=NA, V1=NA)
for(a.date in a.df$dates){
  new.df<-ddply(a.df, "sites", function(df) mean2(df, a.date))
  my.df<-merge(my.df, new.df) #doesn't seem to work
}
my.df
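For what it's worth, a few things in the code above would explain the
empty result. What follows is only a sketch of how it might be
repaired, not a definitive answer.

- mean2() builds sub.df but then averages df$value, so the subset is
  never used.
- The condition keeps only rows where dates equals date, so the
  previous date is excluded.
- The loop runs over a.df$dates, which repeats each date three times.
- merge() joins on the columns the two data frames share, so merging
  with a frame of NAs returns no rows.

The sketch also shows an alternative for questions 1 and 1a: split by
site only and return one row per date from the function, since ddply
adds the splitting variable to every returned row. The names
run.by.site, by.site and pieces, the fixed values, and the reading of
"previous date" as the date one step earlier are assumptions made for
illustration.

```r
library(plyr)

# Same layout as the example above, but with fixed made-up values
a.df <- expand.grid(sites = c("a", "b", "c"), dates = 1:5)
a.df$value <- seq(10, 150, by = 10)

# Approach 1: split by site only. The function returns one row per date
# and ddply attaches the site label to every row it returns.
run.by.site <- function(df) {
  df <- df[order(df$dates), ]
  prev <- c(NA, df$value[-nrow(df)])   # value at the preceding date
  data.frame(dates = df$dates,
             V1 = rowMeans(cbind(df$value, prev), na.rm = TRUE))
}
by.site <- ddply(a.df, "sites", run.by.site)

# Approach 2: the loop from the post, with the changes described above
mean2 <- function(df, date) {
  keep <- df$dates - date <= 0 & df$dates - date >= -1  # current and previous date
  mean(df$value[keep])                                  # average the subset only
}
pieces <- list()
for (a.date in unique(a.df$dates)) {                    # visit each date once
  new.df <- ddply(a.df, "sites", function(df) mean2(df, a.date))
  new.df$dates <- a.date                                # label this pass
  pieces[[length(pieces) + 1]] <- new.df
}
my.df <- do.call(rbind, pieces)                         # stack rows instead of merge()

# Both approaches should agree once sorted the same way
by.site <- by.site[order(by.site$dates, by.site$sites), ]
all.equal(by.site$V1, my.df$V1)
```

The first approach avoids the loop entirely, which may be closer to
what plyr is meant for. The second is kept mainly to show where the
original loop goes wrong.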