[R] taking daily means from hourly data

Thu Jul 15 07:33:47 CEST 2010

On Thu, Jul 15, 2010 at 12:52 AM, Meissner, Tony (DFW)
<Tony.Meissner at sa.gov.au> wrote:
> I have a data frame (morgan) of hourly river flow, river levels and wind direction and speed thus:
>         Time           hour lev.morgan lev.lock2 lev.lock1 flow   direction  velocity
> 1  2009-07-06 15:00:00   15      3.266     3.274     3.240 1710.6   180.282    4.352
> 2  2009-07-06 16:00:00   16      3.268     3.272     3.240 1441.8   192.338    5.496
> 3  2009-07-06 17:00:00   17      3.268     3.271     3.240 1300.1   202.294    2.695
> 4  2009-07-06 18:00:00   18      3.267     3.274     3.241 1099.1   237.161    2.035
> 5  2009-07-06 19:00:00   19      3.265     3.277     3.243  986.6   237.576    0.896
> 6  2009-07-06 20:00:00   20      3.266     3.281     3.242 1237.6   205.686    1.257
> 7  2009-07-06 21:00:00   21      3.267     3.280     3.242 1513.3    26.080    0.664
> 8  2009-07-06 22:00:00   22      3.267     3.281     3.242 1819.5   264.280    0.646
> 9  2009-07-06 23:00:00   23      3.267     3.281     3.242 1954.4   337.137    0.952
> 10 2009-07-07 00:00:00    0      3.267     3.281     3.242 1518.9   260.006    0.562
> 11 2009-07-07 01:00:00    1      3.267     3.281     3.242 1082.6   252.172    0.673
> 12 2009-07-07 02:00:00    2      3.267     3.280     3.243 1215.9   190.007    1.286
> 13 2009-07-07 03:00:00    3      3.267     3.279     3.244 1093.5   260.415    1.206
> :         :               :       :               :          :     :        :         :
> :         :               :       :               :          :     :        :         :
>

There are many possibilities.  Here are three.

#1 uses only base R, with no add-on packages.

#2 produces a zoo series, which seems the logical representation: the
data are in fact a time series, so the result is already in the right
form for further series operations.  See the 3 vignettes that come with zoo.

With #3 it's easy to apply different functions (avg, count, etc.) to
different columns, and if you already know SQL it's particularly
convenient.  See http://sqldf.googlecode.com
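The code below refers to the data frame as DF (the question calls it morgan). To try it without the original file, the 13 rows posted in the question can be typed in by hand. This is only a sketch: it assumes Time is stored as POSIXct in the "GMT" time zone, which the question does not state.

```r
# Rebuild the 13 rows shown in the question as a data frame named DF.
# Assumption: Time is POSIXct in "GMT" (the question does not say).
Time <- seq(as.POSIXct("2009-07-06 15:00:00", tz = "GMT"),
            by = "hour", length.out = 13)
DF <- data.frame(
  Time       = Time,
  hour       = as.integer(format(Time, "%H")),
  lev.morgan = c(3.266, 3.268, 3.268, 3.267, 3.265, 3.266, 3.267,
                 3.267, 3.267, 3.267, 3.267, 3.267, 3.267),
  lev.lock2  = c(3.274, 3.272, 3.271, 3.274, 3.277, 3.281, 3.280,
                 3.281, 3.281, 3.281, 3.281, 3.280, 3.279),
  lev.lock1  = c(3.240, 3.240, 3.240, 3.241, 3.243, 3.242, 3.242,
                 3.242, 3.242, 3.242, 3.242, 3.243, 3.244),
  flow       = c(1710.6, 1441.8, 1300.1, 1099.1, 986.6, 1237.6, 1513.3,
                 1819.5, 1954.4, 1518.9, 1082.6, 1215.9, 1093.5),
  direction  = c(180.282, 192.338, 202.294, 237.161, 237.576, 205.686,
                 26.080, 264.280, 337.137, 260.006, 252.172, 190.007,
                 260.415),
  velocity   = c(4.352, 5.496, 2.695, 2.035, 0.896, 1.257, 0.664,
                 0.646, 0.952, 0.562, 0.673, 1.286, 1.206)
)
str(DF)
```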

# DF is the hourly data frame (called morgan in the question).
# DF2 adds a Day column and is used in #1 and #3
DF2 <- data.frame(DF, Day = as.Date(format(DF$Time)))
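Why format() before as.Date()? format() writes each time in the zone attached to the series, so Day follows the local calendar day, whereas converting the POSIXct in a different zone (UTC below) can move readings near midnight into the adjacent day. The zone "Australia/Adelaide" and the timestamp are assumptions chosen only for illustration.

```r
# Hypothetical reading just after local midnight; the zone is an assumption.
x <- as.POSIXct("2009-07-07 00:30:00", tz = "Australia/Adelaide")

# format() prints in the series' own zone, so the local day is kept.
as.Date(format(x))        # "2009-07-07"

# Converting in UTC shifts the time back 9.5 hours, into the previous day.
as.Date(x, tz = "UTC")    # "2009-07-06"
```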

# 1 - aggregate (base R): mean of each listed column for each Day
aggregate(cbind(flow, direction, velocity) ~ Day, DF2, mean)
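Hourly river and wind records often have gaps. The formula method used in #1 has na.action = na.omit by default, so an hour with a missing value in any of the cbind() columns is dropped from every column's mean. The data frame method of aggregate() instead passes extra arguments on to FUN, so na.rm = TRUE averages whatever values each column has. A sketch on four of the posted rows (22:00 to 01:00, spanning two days), with one flow value set to NA purely for illustration; that NA is hypothetical.

```r
# Four of the posted rows; the NA in flow is hypothetical.
DF2 <- data.frame(
  Day       = as.Date(c("2009-07-06", "2009-07-06",
                        "2009-07-07", "2009-07-07")),
  flow      = c(1819.5, 1954.4, 1518.9, NA),
  direction = c(264.280, 337.137, 260.006, 252.172),
  velocity  = c(0.646, 0.952, 0.562, 0.673)
)

# Data frame method: group by a named list; na.rm = TRUE is passed to mean.
daily <- aggregate(DF2[c("flow", "direction", "velocity")],
                   by = list(Day = DF2$Day), FUN = mean, na.rm = TRUE)
daily

# Formula method for comparison: na.omit drops the whole 01:00 row,
# so direction and velocity on 2009-07-07 come from one hour only.
byformula <- aggregate(cbind(flow, direction, velocity) ~ Day, DF2, mean)
byformula
```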

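# 2 - zoo
library(zoo)
# Time (the first column) becomes the index; every other column,
# including hour and the three levels, becomes part of the series.
z <- read.zoo(DF, tz = "GMT")
# as.Date maps each hourly time to its day; mean is applied within each day.
aggregate(z, as.Date, mean)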

# 3 - sqldf: one avg() per column, grouped by Day
library(sqldf)
sqldf("select
   Day, avg(flow) Flow, avg(direction) Direction, avg(velocity) Velocity
   from DF2
   group by Day")
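All three methods take the arithmetic mean of direction, which can mislead for an angle: 350 and 10 degrees average to 180 although both point close to north. If a daily mean wind direction is wanted, one common alternative is the vector (circular) mean sketched below. The name circ_mean is hypothetical, and direction is assumed to be in degrees. Weighting the unit vectors by velocity is a frequent variant for wind data but is not shown here.

```r
# Circular (vector) mean of angles in degrees.
# Undefined when the vectors cancel exactly (e.g. 0 and 180).
circ_mean <- function(deg) {
  rad <- deg * pi / 180
  # Average the unit vectors, then turn the resultant back into an angle.
  ang <- atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi
  ang %% 360  # map onto the 0 to 360 degree range
}

circ_mean(c(350, 10))  # about 0 (possibly printed as 360), not 180

# Daily use on four of the posted rows (hypothetical DF2 layout).
DF2 <- data.frame(
  Day       = as.Date(c("2009-07-06", "2009-07-06",
                        "2009-07-07", "2009-07-07")),
  direction = c(264.280, 337.137, 260.006, 252.172)
)
res <- aggregate(direction ~ Day, DF2, circ_mean)
res
```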