[R] Calculate daily means from 5-minute interval data

Richard O'Keefe r@oknz @end|ng |rom gm@||@com
Wed Sep 1 06:05:30 CEST 2021

I wrote:
> > By the time you get the data from the USGS, you are already far past the point
> > where what the instruments can write is important.
Rich Shepard replied:
> The data are important because they show what's happened in that period of
> record. Don't physicians take a medical history from patients even though
> those data are far past the point they occurred?

You have missed the point.  The issue is not the temporal distance, but the
fact that the data you get from the USGS are NOT the raw instrumental data and
are NOT subject to the limitations of the recording instruments, so there is
no longer any good reason for there to be any gaps in them.  Indeed, the Rogue
River data I looked at explicitly include some flows labelled "Ae", meaning
that they are NOT instrumental data at all, but estimated.

> And I use emacs to replace the space between columns with commas so the date
> and the time are separate.

There does not seem to be any good reason for this.
As I demonstrated, it is easy to convert these timestamps to
POSIXct form, which is good for calculating with.
If you want to extract year, month, day, &c, by far the easiest
way is to convert to POSIXlt form (so keeping the timestamp as a
single field) and then use $<whatever> to extract the field.
> n <- as.POSIXlt("2003.04.05 06:07", format="%Y.%m.%d %H:%M", tz="UTC")
> n
[1] "2003-04-05 06:07:00 UTC"
> c(n$year+1900, n$mon+1, n$mday, n$hour, n$min)
[1] 2003    4    5    6    7
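
The same idea applies to a whole column.  A minimal sketch, assuming the
timestamps arrive as text in "YYYY-MM-DD HH:MM" form (the sample values and
the names stamp, when, gaps and lt are made up for illustration):

```r
# Made-up timestamps in an assumed "YYYY-MM-DD HH:MM" layout.
stamp <- c("2020-02-28 23:55", "2020-02-29 00:00", "2020-03-01 00:05")

# POSIXct: one number per timestamp, convenient for arithmetic.
when <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M", tz = "UTC")
gaps <- diff(when)                  # time between successive readings
print(gaps)

# POSIXlt: the same instants broken into fields, with no need to split columns.
lt <- as.POSIXlt(when)
print(data.frame(year = lt$year + 1900, month = lt$mon + 1, day = lt$mday,
                 hour = lt$hour, minute = lt$min))
```

The timestamp stays a single field throughout; the date parts are derived
only when they are wanted.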

> > The flow is dominated by a series of "bursts" with a fast onset to a peak
> > and a slow decay, coming in a range of sizes from quite small to rather
> > large, separated by gaps of 4 to 45 days.
> And when discharge is controlled by flows through a hydroelectric dam there
> is a lot of variability. The pattern is important to fish as well as
> regulators.

And what is important to fish is NOT captured by daily means and standard
deviations.  For what it's worth, my understanding is that most of the dams on
the Rogue River have been removed, leaving only the Lost Creek Lake one,
and that this has been good for the fish.

Suppose you have a day when there are 16 hours with no water at all flowing,
then 8 hours with 12 cumecs because a dam upstream is discharging.  Then
the daily mean is 4 cumecs, which might look good for fish, but it wasn't.
"Number of minutes below minimum safe level" might be more interesting
for the fish.
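
To make the arithmetic concrete, here is a small sketch of that hypothetical
day at 5-minute resolution.  The 1 cumec threshold (min_safe) and all the
variable names are assumptions for illustration, not anything taken from the
USGS data.

```r
# Hypothetical day sampled every 5 minutes: 16 hours dry, then 8 hours at 12 cumecs.
step_min <- 5
flow <- c(rep(0, 16 * 60 / step_min), rep(12, 8 * 60 / step_min))

# Assumed threshold, for illustration only; a real value would have to come
# from someone who knows the fish.
min_safe <- 1

daily_mean    <- mean(flow)                        # 4 cumecs
minutes_below <- sum(flow < min_safe) * step_min   # 960 minutes, i.e. 16 hours

print(c(daily_mean = daily_mean, minutes_below = minutes_below))
```

The mean of 4 cumecs hides the fact that the river was below the assumed
threshold for two thirds of the day.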

From the data we have alone, we cannot tell which bursts are due to
releases from dams and which have other causes.  Dam releases are under
human control, storms are not.

Looking at the Rogue River data, plotting daily means
- lowers the peaks
- moves them right
- changes the overall shape
Not severely, mind you, but enough to avoid if you don't have to.

By the way, by far the easiest way to do day-wise summaries,
if you really feel you must, is to start with a POSIXct or POSIXlt
column, let's call it r$when, then
  d <- trunc(difftime(r$when, min(r$when), units="days")) + 1
  m <- aggregate(r$flow, by=list(d), FUN=mean)
  plot(m, type="l")
You can plug in other summary functions, not just mean.

  for all calculations involving dates and times,
  prefer using the built in date and time classes to
  hacking around the problem

  aggregate() is a good way to compute oddball summaries.
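
As a sketch of what an "oddball summary" might look like, the following uses
synthetic data (not the Rogue River record) and an assumed threshold min_safe;
the data frame r with columns when and flow mirrors the names used above, and
everything else is made up for illustration.

```r
# Synthetic example data (an assumption, not real measurements):
# three days of 5-minute readings starting at midnight UTC.
when <- seq(as.POSIXct("2021-01-01 00:00", tz = "UTC"),
            by = "5 min", length.out = 3 * 288)
flow <- rep(c(2, 10, 4), each = 288)    # one constant flow per day, for easy checking
flow[300] <- 0.5                        # a single low reading on day 2
r <- data.frame(when = when, flow = flow)

# Day number counted from the first timestamp.
d <- as.numeric(trunc(difftime(r$when, min(r$when), units = "days"))) + 1

# Assumed threshold, for illustration only.
min_safe <- 1

# The summary function may return several values at once;
# aggregate() then stores them as a matrix column named x.
s <- aggregate(r$flow, by = list(day = d),
               FUN = function(x) c(mean = mean(x), max = max(x),
                                   minutes_below = 5 * sum(x < min_safe)))
print(s)
```

Note that days are counted in 24-hour blocks from the first timestamp, so they
coincide with calendar days only if the record starts at midnight.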

> > - how do I *detect* these bursts? (detecting a peak isn't too hard,
> >   but the peak is not the onset)
> > - how do I *characterise* these bursts?
> >   (and is the onset rate related to the peak size?)
> > - what's left after taking the bursts out?
> > - can I relate these bursts to something going on upstream?
> Well, those questions could be appropriate depending on what questions you
> need the data to answer.
> Environmental data are quite different from experimental, economic,
> financial, and public data (e.g., unemployment, housing costs).
> There are always multiple ways to address an analytical need. Thank you for
> your contributions.
> Stay well,
> Rich
