[R] Aggregate with non-scalar function
Mike Nielsen
mr.blacksheep at gmail.com
Wed Nov 7 17:46:41 CET 2007
R-Helpers,
I'm sorry to have to ask this -- I've not used R very much in the last
8 or 10 months, and I've gotten rusty.
I have the following (ff2 is a subset of a much, much larger dataset):
> ff2
hostName user sys idle obsTime
10142 fred 0.4 0.5 98.0 2007-11-01 02:02:18
16886 barney 0.5 0.2 94.6 2007-10-25 19:12:12
8795 fred 0.0 0.1 99.8 2007-10-30 05:08:22
5261 fred 0.1 0.2 99.7 2007-10-25 07:20:32
12427 barney 0.1 0.2 93.2 2007-10-19 14:34:10
18067 barney 0.1 0.2 99.4 2007-10-27 10:34:08
973 fred 0.0 0.2 99.8 2007-10-19 08:24:22
5426 fred 0.2 0.3 99.5 2007-10-25 12:50:33
7067 fred 0.1 0.2 99.4 2007-10-27 19:32:27
13159 barney 0.1 0.4 84.3 2007-10-20 14:58:11
17481 barney 1.2 2.0 92.6 2007-10-26 15:02:11
21632 barney 0.1 0.1 99.6 2007-11-01 09:24:09
206 fred 19.4 4.8 53.7 2007-10-18 06:50:34
18151 barney 0.1 0.2 94.9 2007-10-27 13:22:09
10662 fred 0.1 0.2 99.6 2007-11-01 19:22:27
10376 fred 0.0 0.2 99.7 2007-11-01 09:50:24
3630 fred 43.7 7.0 33.0 2007-10-23 00:58:27
1118 fred 0.6 0.4 98.9 2007-10-19 13:14:23
5122 fred 0.1 0.2 99.6 2007-10-25 02:42:21
22117 barney 0.0 0.2 99.4 2007-11-02 01:34:04
> doit(ff2)
hostName hour user.mean sys.mean idle.mean user.max sys.max idle.max
1 barney 01 0.00 0.20 99.40 0.0 0.2 99.4
2 barney 09 0.10 0.10 99.60 0.1 0.1 99.6
3 barney 10 0.10 0.20 99.40 0.1 0.2 99.4
4 barney 13 0.10 0.20 94.90 0.1 0.2 94.9
5 barney 14 0.10 0.30 88.75 0.1 0.4 93.2
6 barney 15 1.20 2.00 92.60 1.2 2.0 92.6
7 barney 19 0.50 0.20 94.60 0.5 0.2 94.6
8 fred 00 43.70 7.00 33.00 43.7 7.0 33.0
9 fred 02 0.25 0.35 98.80 0.4 0.5 99.6
10 fred 05 0.00 0.10 99.80 0.0 0.1 99.8
11 fred 06 19.40 4.80 53.70 19.4 4.8 53.7
12 fred 07 0.10 0.20 99.70 0.1 0.2 99.7
13 fred 08 0.00 0.20 99.80 0.0 0.2 99.8
14 fred 09 0.00 0.20 99.70 0.0 0.2 99.7
15 fred 12 0.20 0.30 99.50 0.2 0.3 99.5
16 fred 13 0.60 0.40 98.90 0.6 0.4 98.9
17 fred 19 0.10 0.20 99.50 0.1 0.2 99.6
> doit
function(x){
x.mean<-aggregate(x[,c("user","sys","idle")],
by=list(hostName=x$hostName,
hour=strftime(as.POSIXlt(x$obsTime),"%H")),
mean)
x.max<-aggregate(x[,c("user","sys","idle")],
by=list(hostName=x$hostName,
hour=strftime(as.POSIXlt(x$obsTime),"%H")),
max)
t1<-merge(x.mean,x.max,by=c("hostName","hour"),suffixes=c(".mean",".max"))
return(t1)
}
The point of the "doit" function is to make a new dataframe in which
the columns are summary statistics of certain columns in the argument.
Is there a function similar to:
magic.function(ff2[,c("user","system","idle")],
by=list(hostName=ff2$hostName,hour=strftime(as.POSIXlt(ff2$obsTime),"%H")),
function(x){c(mean.user=mean(x$user),
mean.system=mean(x$system),
mean.idle=mean(x$idle),
max.user=max(x$user),
max.system=max(x$system),
max.idle=max(x$idle))})
ie. an "aggregate" that can cope with a non-scalar function and "do
what I mean"? My doit function gets more and more ugly the more
summary statistics I add, and I worry about the "merge" with hundreds
of thousands of rows.
I'm almost sure I've seen a solution to what I know is a simple
problem, but I guess my search skills are as bad as my "R": I've
rummaged around the r-help archives and came up with nothing to show
for it.
Pointers would be gratefully received.
Many thanks.
--
Regards,
Mike Nielsen
More information about the R-help
mailing list