[R] Correctly applying aggregate.ts()

Bert Gunter bgunter.4567 at gmail.com
Sat Sep 8 00:34:30 CEST 2018


Clarification: When using the formula interface, no subscripting is needed.
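
As a minimal sketch of what that means: variables named in the formula are looked up in the `data` argument, so neither `dp$...` nor `dp[...]` has to appear in the call. The four rows below are copied from the sample data; the `month` column created with transform() is an illustrative addition, used only to get a tidier column name in the result.

```r
# Four rows copied from the posted sample data.
dp <- data.frame(
  sampdate = c("2005-01-01", "2005-01-02", "2005-02-04", "2005-02-05"),
  prcp     = c(0.59, 0.08, 0.35, 0.18),
  stringsAsFactors = FALSE
)

# transform() evaluates its arguments inside dp, and the formula's variables
# are looked up in `data`, so bare column names are enough throughout.
monthly.rain <- aggregate(prcp ~ month,
                          data = transform(dp, month = substr(sampdate, 1, 7)),
                          FUN = sum)
monthly.rain
```

The formula method drops rows with missing values by default (na.action = na.omit), so na.rm = TRUE is usually not required for a single response column.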

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Sep 7, 2018 at 3:25 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>
> Well, let's see:
> "monthly.rain <- aggregate.ts(x = dp['sampdate','prcp'], by = list(month = \
> substr(dp$sampdate, 1, 7)), FUN = sum, na.rm = TRUE)"
>
> 1. x is a data frame, so why are you using the time series method?
> Perhaps you need to study S3 method usage in R.
>
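> 2. You have improperly subscripted the data frame: dp['sampdate','prcp']
> asks for a row named 'sampdate' in column 'prcp', and no such row exists.
> It should be dp[, c('sampdate','prcp')] . Perhaps you need to read about
> how subscripting works in R. However, in this case, the two-column subset
> is not needed (see 3.)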
>
> 3. As you should be using the data frame method, and the month is
> obtained as a substring of sampdate, you should pass only the prcp
> column, dp[,'prcp'], as x so that sum() is not applied to the sampdate
> column.
>
> 4. I assume the "\" indicates <Return> ?
>
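
Points 1 and 2 can be checked directly at the console. This is only a sketch on three rows copied from the sample data; the object names bad, both, vec, and one are made up for illustration.

```r
# Three rows copied from the posted sample data.
dp <- data.frame(
  sampdate = c("2005-01-01", "2005-01-02", "2005-02-04"),
  prcp     = c(0.59, 0.08, 0.35),
  stringsAsFactors = FALSE
)

# dp[i, j] selects rows i and columns j. No row is named "sampdate",
# so this asks for a non-existent row and the result is NA.
bad <- dp["sampdate", "prcp"]

# Leaving the row index empty selects all rows.
both <- dp[, c("sampdate", "prcp")]  # data frame with two columns
vec  <- dp[, "prcp"]                 # one column, dropped to a numeric vector
one  <- dp["prcp"]                   # one column, still a data frame

# dp is a data frame, not a time series, so the generic aggregate()
# would not dispatch to the ts method for it.
class(dp)
is.ts(dp)
```

Calling the generic aggregate() and letting R pick the method from the class of x avoids having to name a method such as aggregate.ts() by hand.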
> Anyway, once you have corrected all that, here's the call:
>
> > monthly.rain <- aggregate(dp[, 'prcp'],
> +                           list(substr(dp$sampdate,1,7)),
> +                           FUN = sum, na.rm = TRUE)
> > ## yielding
> > monthly.rain
>   Group.1    x
> 1 2005-01 4.88
> 2 2005-02 2.27
> 3 2005-03 0.06
>
> It's perhaps also worth noting that the formula method (for data
> frames) is somewhat more convenient, especially with several grouping
> factors in the list:
>
> > monthly.rain <- aggregate(prcp ~ substr(sampdate,1,7), data = dp, FUN = sum, na.rm = TRUE)
> > ##yielding
> > monthly.rain
>   substr(sampdate, 1, 7) prcp
> 1                2005-01 4.88
> 2                2005-02 2.27
> 3                2005-03 0.06
>
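
The original question also asked for the monthly median and maximum. One possible way to get all three statistics from a single formula call is sketched here, using six rows copied from the sample data. The month column and the do.call() flattening step are assumptions added for illustration; they are not part of the calls shown above.

```r
# Six rows copied from the posted sample data.
dp <- data.frame(
  sampdate = c("2005-01-01", "2005-01-02", "2005-01-03",
               "2005-02-04", "2005-02-05", "2005-02-06"),
  prcp     = c(0.59, 0.08, 0.10, 0.35, 0.18, 0.65),
  stringsAsFactors = FALSE
)
dp$month <- substr(dp$sampdate, 1, 7)  # illustrative grouping column

# FUN may return a named vector; aggregate() then stores the statistics
# as a matrix in the prcp column, one matrix column per statistic.
monthly <- aggregate(prcp ~ month, data = dp,
                     FUN = function(x) c(sum = sum(x),
                                         median = median(x),
                                         max = max(x)))

# Flatten the matrix column into ordinary columns named
# prcp.sum, prcp.median, and prcp.max.
monthly <- do.call(data.frame, monthly)
monthly
```

Running three separate aggregate() calls, one per statistic, and merging the results is an equally reasonable alternative if the matrix column is unwelcome.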
> Cheers,
>
> Bert Gunter
>
> On Fri, Sep 7, 2018 at 2:19 PM Rich Shepard <rshepard using appl-ecosys.com> wrote:
> >
> >    I've read ?aggregate and several blog posts on using aggregate() yet I
> > still haven't applied it correctly to my dataframe. The sample data are:
> >
> > structure(list(sampdate = c("2005-01-01", "2005-01-02", "2005-01-03",
> > "2005-01-04", "2005-01-05", "2005-01-06", "2005-01-07", "2005-01-08",
> > "2005-01-09", "2005-01-10", "2005-01-11", "2005-01-12", "2005-01-13",
> > "2005-01-14", "2005-01-15", "2005-01-16", "2005-01-17", "2005-01-18",
> > "2005-01-19", "2005-01-20", "2005-01-21", "2005-01-22", "2005-01-23",
> > "2005-01-24", "2005-01-25", "2005-01-26", "2005-01-27", "2005-01-28",
> > "2005-01-29", "2005-01-30", "2005-01-31", "2005-02-01", "2005-02-02",
> > "2005-02-03", "2005-02-04", "2005-02-05", "2005-02-06", "2005-02-07",
> > "2005-02-08", "2005-02-09", "2005-02-10", "2005-02-11", "2005-02-12",
> > "2005-02-13", "2005-02-14", "2005-02-15", "2005-02-16", "2005-02-17",
> > "2005-02-18", "2005-02-19", "2005-02-20", "2005-02-21", "2005-02-22",
> > "2005-02-23", "2005-02-24", "2005-02-25", "2005-02-26", "2005-02-27",
> > "2005-02-28", "2005-03-01", "2005-03-02", "2005-03-03"), prcp = c(0.59,
> > 0.08, 0.1, 0, 0, 0.02, 0.05, 0.1, 0, 0.02, 0, 0.05, 0.2, 0, 0,
> > 0.5, 0.41, 0.84, 0.01, 0.1, 0.01, 0, 0, 0, 0, 0.21, 0.24, 0.13,
> > 1.12, 0.01, 0.09, 0, 0, 0, 0.35, 0.18, 0.65, 0.16, 0, 0, 0, 0,
> > 0.55, 0.21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.17, 0.05,
> > 0.01, 0)), row.names = c(NA, 62L), class = "data.frame")
> >
> >    What I need to learn how to do is to calculate monthly sum, median, and
> > maximum rainfall amounts from the full data set which has daily rainfall
> > amounts. My most current effort to calculate monthly sums uses this syntax:
> >
> > monthly.rain <- aggregate.ts(x = dp['sampdate','prcp'], by = list(month = \
> > substr(dp$sampdate, 1, 7)), FUN = sum, na.rm = TRUE)
> >
> > (entered on a single line) which produces this result:
> >
> > head(monthly.rain)
> > [1] NA
> >
> >    The sample data has 62 of the 113K rows in the dataframe. A larger set can
> > be provided if needed.
> >
> >    An explanation of what I've missed is needed.
> >
> > Regards,
> >
> > Rich
> >