[R] sum specific rows in a data frame
Chuck
vijay.nori at gmail.com
Thu Apr 15 03:16:25 CEST 2010
Depending on the size of the data frame and the operations you are
trying to perform, either aggregate or ddply may be the better choice.
In the function below, df has the same structure as your data frame.
The code runs both aggregate and ddply on data frames of increasing
size and reports the time each one takes.
============================
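library(plyr)  # provides ddply

# Time aggregate() and ddply() on a data frame with 45*n rows in three groups.
CompareAggregation <- function(n) {
  df <- data.frame(id = c(rep("A", 15 * n), rep("B", 10 * n), rep("C", 20 * n)))
  df$fltval <- rnorm(nrow(df))
  df$intval <- rbinom(nrow(df), 1000, 0.8)
  # Sum both value columns within each id using base aggregate()
  t1 <- system.time(zz1 <- aggregate(list(fltsum = df$fltval, intsum = df$intval),
                                     list(id = df$id), sum))
  # Same per-id sums using plyr's ddply()
  t2 <- system.time(zz2 <- ddply(df, .(id), function(x)
    c(fltsum = sum(x$fltval), intsum = sum(x$intval))))
  # The first element of system.time() is the user CPU time in seconds
  c(agg = t1[[1]], ddply = t2[[1]])
}

# Values of n to test: 10, 100, ..., 1e5
z <- 10^seq(1, 5)
names(z) <- as.character(z)
res.df <- t(data.frame(lapply(z, CompareAggregation)))
print(res.df)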
============================
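
If you just want to see what the per-id sums look like before timing
anything, here is a minimal sketch on a toy data frame. The values are
made up for illustration, and it uses the formula interface of aggregate
rather than the list form above, so treat it as one possible way to do
it, not the only one.

```r
# Toy data frame with the same columns as in the benchmark
# (values are invented for illustration only).
df <- data.frame(id     = c("A", "A", "B", "B", "C"),
                 fltval = c(0.5, 1.5, 2.0, 3.0, 4.0),
                 intval = c(1L, 2L, 3L, 4L, 5L))

# Per-id sums of both value columns via aggregate's formula interface
sums <- aggregate(cbind(fltval, intval) ~ id, data = df, FUN = sum)
print(sums)

# Cross-check one column against tapply(), which also groups by id
check <- tapply(df$fltval, df$id, sum)
stopifnot(isTRUE(all.equal(sums$fltval, as.vector(check))))
```

The tapply cross-check is only there as a sanity test; on real data you
would pick whichever of aggregate or ddply turns out faster for your
sizes.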
On Apr 14, 11:43 am, "arnaud Gaboury" <arnaud.gabo... at gmail.com>
wrote:
> Thank you for your help. The best I have found is to use the ddply function.
>
> > pose