[R] How to select max data according to week?
Eric Berger
er|cjberger @end|ng |rom gm@||@com
Thu Jun 20 09:21:14 CEST 2019
Marc - very nice.
On Wed, Jun 19, 2019 at 6:30 PM Marc Schwartz <marc_schwartz using me.com> wrote:
> Hi All,
>
> Is there a reason that the following was not considered?
>
> > DF
> Date O3_Conc
> 1 2000-01-01 0.033
> 2 2000-01-02 0.023
> 3 2000-01-03 0.025
> 4 2000-01-04 0.041
> 5 2000-01-05 0.063
> 6 2000-01-06 0.028
> 7 2000-01-07 0.068
> 8 2000-01-08 0.048
> 9 2000-01-09 0.037
> 10 2000-01-10 0.042
> 11 2000-01-11 0.027
> 12 2000-01-12 0.035
> 13 2000-01-13 0.063
> 14 2000-01-14 0.035
> 15 2000-01-15 0.042
> 16 2000-01-16 0.028
>
> > str(DF)
> 'data.frame': 16 obs. of 2 variables:
> $ Date : Date, format: "2000-01-01" ...
> $ O3_Conc: num 0.033 0.023 0.025 0.041 0.063 0.028 0.068 0.048 0.037
> 0.042 ...
>
>
> > aggregate(O3_Conc ~ cut(Date, breaks = "14 days"), data = DF, max)
> cut(Date, breaks = "14 days") O3_Conc
> 1 2000-01-01 0.068
> 2 2000-01-15 0.042
>
>
> See ?cut.Date
>
> Note that there is an error in Aisyah's original post, in that the value
> for the max concentration in the second 14 day window is 0.042, not 0.061.
>
> Also, if one wanted to use actual calendar weeks, with a week starting on
> Sunday, as opposed to 14 day windows starting from the minimum date, you
> could use:
>
> > aggregate(O3_Conc ~ cut(Date, breaks = "2 weeks", start.on.monday =
> FALSE), data = DF, max)
> cut(Date, breaks = "2 weeks", start.on.monday = FALSE) O3_Conc
> 1 1999-12-26 0.068
> 2 2000-01-09 0.063
>
>
> If desired, you could pre-create a column in the DF with the result of
> cut():
>
> DF$TimeStartDays <- cut(DF$Date, breaks = "14 days")
> DF$TimeStartWks <- cut(DF$Date, breaks = "2 weeks", start.on.monday =
> FALSE)
>
> > DF
> Date O3_Conc TimeStartDays TimeStartWks
> 1 2000-01-01 0.033 2000-01-01 1999-12-26
> 2 2000-01-02 0.023 2000-01-01 1999-12-26
> 3 2000-01-03 0.025 2000-01-01 1999-12-26
> 4 2000-01-04 0.041 2000-01-01 1999-12-26
> 5 2000-01-05 0.063 2000-01-01 1999-12-26
> 6 2000-01-06 0.028 2000-01-01 1999-12-26
> 7 2000-01-07 0.068 2000-01-01 1999-12-26
> 8 2000-01-08 0.048 2000-01-01 1999-12-26
> 9 2000-01-09 0.037 2000-01-01 2000-01-09
> 10 2000-01-10 0.042 2000-01-01 2000-01-09
> 11 2000-01-11 0.027 2000-01-01 2000-01-09
> 12 2000-01-12 0.035 2000-01-01 2000-01-09
> 13 2000-01-13 0.063 2000-01-01 2000-01-09
> 14 2000-01-14 0.035 2000-01-01 2000-01-09
> 15 2000-01-15 0.042 2000-01-15 2000-01-09
> 16 2000-01-16 0.028 2000-01-15 2000-01-09
>
> and then use:
>
> > aggregate(O3_Conc ~ TimeStartDays, data = DF, max)
> TimeStartDays O3_Conc
> 1 2000-01-01 0.068
> 2 2000-01-15 0.042
>
> > aggregate(O3_Conc ~ TimeStartWks, data = DF, max)
> TimeStartWks O3_Conc
> 1 1999-12-26 0.068
> 2 2000-01-09 0.063
>
>
> Regards,
>
> Marc Schwartz
>
>
>
> > On Jun 19, 2019, at 10:45 AM, Eric Berger <ericjberger using gmail.com> wrote:
> >
> > I just realized that my method as written only works if the entire date
> > range is within a single calendar year.
> > Here is a revised version that should handle the general case.
> >
> > In this version I assume that the conversion from the original numbers to
> > dates has already been done
> > e.g. it can be done using ymd() as I showed before
> >
> > library(dplyr)
> > library(lubridate)
> >
> > ## create some example data for 36 months from 2010
> > d <- seq(from=as.Date("2010-01-01"),to=as.Date("2012-12-31"),by="1 day")
> > conc <- runif(length(d)) # concentrations
> > df <- data.frame( dt=d, conc=conc)
> > df2 <- dplyr::mutate( df, wkpair=as.integer(floor((0:(nrow(df)-1))/14)) )
> > df3 <- dplyr::group_by(df2,wkpair) %>%
> > dplyr::summarise(from=min(dt), to=max(dt), maxconc=max(conc)) %>%
> > dplyr::select(from,to,maxconc)
> > df3
> >
> > # A tibble: 79 x 3
> > from to maxconc
> > <date> <date> <dbl>
> > 1 2010-01-01 2010-01-14 0.928
> > 2 2010-01-15 2010-01-28 0.834
> > 3 2010-01-29 2010-02-11 0.989
> > 4 2010-02-12 2010-02-25 0.983
> > 5 2010-02-26 2010-03-11 0.898
> > 6 2010-03-12 2010-03-25 0.864
> > 7 2010-03-26 2010-04-08 0.920
> > 8 2010-04-09 2010-04-22 0.925
> > 9 2010-04-23 2010-05-06 0.998
> > 10 2010-05-07 2010-05-20 0.761
> > # ... with 69 more rows
> >
> > On Wed, Jun 19, 2019 at 5:08 PM Eric Berger <ericjberger using gmail.com>
> wrote:
> >
> >> Hi Bert,
> >> I agree that our interpretation is slightly different, which is why I
> >> wrote:
> >> "If one can take the actual week of the year as an acceptable definition
> >> of week, then here's my approach."
> >>
> >> Regards,
> >> Eric
> >>
> >> On Wed, Jun 19, 2019 at 5:04 PM Bert Gunter <bgunter.4567 using gmail.com>
> >> wrote:
> >>
> >>> Eric:
> >>>
> >>> I believe you're doing something different than I did. I broke up each
> >>> month into biweekly periods, 2+ per month. You seem to be grouping the
> >>> overall entire period into biweekly intervals -- apologies if I'm
> wrong,
> >>> but if I understood correctly, that's not the same thing. I do not know
> >>> which of us -- if either -- has interpreted her query correctly.
> >>>
> >>> Cheers,
> >>> Bert
> >>>
> >>> On Wed, Jun 19, 2019 at 2:35 AM Eric Berger <ericjberger using gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Siti,
> >>>> I didn't test Bert's code but I assume it's fine. :-)
> >>>> I would take a different approach than Bert. I prefer to use a package
> >>>> such as lubridate to handle the date wrangling, and a package such as
> dplyr
> >>>> to handle the grouping and max extraction.
> >>>> It may be overkill for this problem, but these are great packages to
> >>>> become familiar with.
> >>>> If one can take the actual week of the year as an acceptable
> definition
> >>>> of week, then here's my approach.
> >>>>
> >>>> library(lubridate)
> >>>> library(dplyr)
> >>>>
> >>>> # Step 1: start with Bert's code to create sample data
> >>>> ## create some example data for 3 months in 2000
> >>>> d<- 2e7 +c(113:131,201:228, 301:330) ## dates
> >>>> conc <- runif(length(d)) # concentrations
> >>>>
> >>>> # Step 2: collect the data into a data frame
> >>>> df <- data.frame( dt=d, conc=conc)
> >>>>
> >>>> # Step 3: use lubridate's ymd() function to parse the dates, its
> week()
> >>>> function to identify the week of the year, and define the new column
> >>>> 'wkpair' that groups the weeks 2-at-a-time
> >>>> df2 <- dplyr::mutate( df,
> >>>> wkpair=as.integer(floor(lubridate::week(lubridate::ymd(dt) )/2)) )
> >>>>
> >>>> # Step 4: group by the wkpair and use dplyr's summarise to get the
> info
> >>>> you wanted
> >>>> df3 <- dplyr::group_by(df2,wkpair) %>%
> >>>> dplyr::summarise( from=min(dt), to=max(dt),
> maxconc=max(conc))
> >>>> %>%
> >>>> dplyr::select(from,to,maxconc)
> >>>>
> >>>> df3
> >>>>
> >>>> # A tibble: 6 x 3
> >>>> from to maxconc
> >>>> <dbl> <dbl> <dbl>
> >>>> 1 20000113 20000121 0.963
> >>>> 2 20000122 20000204 0.988
> >>>> 3 20000205 20000218 0.939
> >>>> 4 20000219 20000303 0.883
> >>>> 5 20000304 20000317 0.863
> >>>> 6 20000318 20000330 0.765
> >>>>
> >>>> HTH,
> >>>> Eric
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Jun 18, 2019 at 9:39 PM Bert Gunter <bgunter.4567 using gmail.com>
> >>>> wrote:
> >>>>
> >>>>> My apologies. I negected to cc r-help. -- Bert
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Jun 18, 2019 at 11:21 AM Bert Gunter <bgunter.4567 using gmail.com
> >
> >>>>> wrote:
> >>>>>
> >>>>>>
> >>>>>> I assume that 20000215 means year 2000, month 2, day 15.
> >>>>>> I also assume that you want maxes for the first 2 weeks of a month,
> >>>>> the
> >>>>>> second 2 weeks, and any remaining days.
> >>>>>> I also assume that this might be desired for arbitrary numbers of
> >>>>> years,
> >>>>>> months, and days.
> >>>>>>
> >>>>>> The following is one way to do this. As it's a slight pain to cut
> and
> >>>>>> paste email data as text into R (use ?dput or R code to run to
> provide
> >>>>>> example data instead), I just made up my own. You'll have to do the
> >>>>>> following within a data frame through extraction or by using with()
> of
> >>>>>> course.
> >>>>>>
> >>>>>> ## create some example data for 3 months in 2000
> >>>>>> d<- 2e7 +c(113:131,201:228, 301:330) ## dates
> >>>>>> conc <- runif(length(d)) # concentrations
> >>>>>>
> >>>>>> ## convert the date to character to extract year, month, and day
> >>>>>> cdate <- as.character(d)
> >>>>>> ## use substr to to the extraction
> >>>>>> year <- substr(cdate,1,4)
> >>>>>> mon <- substr(cdate,5,6)
> >>>>>> day <- substr(cdate, 7,8)
> >>>>>>
> >>>>>> ## convert day to numeric and use cut() to group into the biweekly
> >>>>> periods.
> >>>>>> d14 <- cut(as.numeric(day), c(0,14.5,28.5, 32))
> >>>>>>
> >>>>>> ## Use tapply() to create your desired table of results.
> >>>>>> tapply(conc, list(year, d14, mon), max, na.rm = TRUE)
> >>>>>>
> >>>>>> ## Results
> >>>>>>
> >>>>>> , , 01
> >>>>>>
> >>>>>> (0,14.5] (14.5,28.5] (28.5,32]
> >>>>>> 2000 0.7357389 0.9655391 0.7962965
> >>>>>>
> >>>>>> , , 02
> >>>>>>
> >>>>>> (0,14.5] (14.5,28.5] (28.5,32]
> >>>>>> 2000 0.8193979 0.9487207 NA
> >>>>>>
> >>>>>> , , 03
> >>>>>>
> >>>>>> (0,14.5] (14.5,28.5] (28.5,32]
> >>>>>> 2000 0.9718919 0.9997093 0.168659
> >>>>>>
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Bert
> >>>>>>
> >>>>>> Bert Gunter
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Jun 18, 2019 at 8:53 AM SITI AISYAH BINTI ZAKARIA <
> >>>>>> aisyahzakaria using unimap.edu.my> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I'm Aisyah..I have a problem to run my R coding. I want to select
> >>>>> maximum
> >>>>>>> value according to week.
> >>>>>>>
> >>>>>>> here is my data
> >>>>>>>
> >>>>>>> Date O3_Conc
> >>>>>>> 20000101 0.033
> >>>>>>> 20000102 0.023
> >>>>>>> 20000103 0.025
> >>>>>>> 20000104 0.041
> >>>>>>> 20000105 0.063
> >>>>>>> 20000106 0.028
> >>>>>>> 20000107 0.068
> >>>>>>> 20000108 0.048
> >>>>>>> 20000109 0.037
> >>>>>>> 20000110 0.042
> >>>>>>> 20000111 0.027
> >>>>>>> 20000112 0.035
> >>>>>>> 20000113 0.063
> >>>>>>> 20000114 0.035
> >>>>>>> 20000115 0.042
> >>>>>>> 20000116 0.028
> >>>>>>>
> >>>>>>> I want to find the max value from column O3_Conc for only 14 days
> >>>>> that
> >>>>>>> refer to biweekly in month. And the next 14 days for the max value.
> >>>>>>>
> >>>>>>> I hope that I can get the result like this:
> >>>>>>>
> >>>>>>> Date Max O3_Conc
> >>>>>>> 20000101 - 20000114 0.068
> >>>>>>> 20000115 - 20000129 0.061
> >>>>>>>
> >>>>>>> I try many coding but still unavailable.
> >>>>>>>
> >>>>>>> this example my coding
> >>>>>>>
> >>>>>>> library(plyr)
> >>>>>>> data.frame(CA0003)
> >>>>>>>
> >>>>>>> # format weeks as per requirement (replace "00" with "52" and
> >>>>>>> adjust corresponding year)
> >>>>>>> tmp <- list()
> >>>>>>> tmp$y <- format(df$Date, format="%Y")
> >>>>>>> tmp$w <- format(df$Date, format="%U")
> >>>>>>> tmp$y[tmp$w=="00"] <-
> >>>>> as.character(as.numeric(tmp$y[tmp$w=="00"]) -
> >>>>>>> 14)
> >>>>>>> tmp$w[tmp$w=="00"] <- "884"
> >>>>>>> df$week <- paste(tmp$y, tmp$w, sep = "-")
> >>>>>>>
> >>>>>>> # get summary
> >>>>>>> df2 <- ddply(df, .(week),transform, O3_Conc=max(O3_Conc))
> >>>>>>>
> >>>>>>> # include week ending date
> >>>>>>> tmp$week.ending <- lapply(df2$week, function(x) rev(df[df$week
> >>>>> ==x,
> >>>>>>> "O3_Conc"])[[1]])
> >>>>>>> df2$week.ending <- sapply(tmp$week.ending, max(O3_Conc, TRUE)
> >>>>>>>
> >>>>>>> output
> >>>>>>> Site_Id Site_Location
> >>>>> Date
> >>>>>>> Year O3_Conc Month Day week
> >>>>>>> 1 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000101
> >>>>>>> 2000 0.033 1 1 NULL-NULL
> >>>>>>> 2 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000102
> >>>>>>> 2000 0.023 1 2 NULL-NULL
> >>>>>>> 3 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000103
> >>>>>>> 2000 0.025 1 3 NULL-NULL
> >>>>>>> 4 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000104
> >>>>>>> 2000 0.041 1 4 NULL-NULL
> >>>>>>> 5 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000105
> >>>>>>> 2000 0.063 1 5 NULL-NULL
> >>>>>>> 6 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000106
> >>>>>>> 2000 0.028 1 6 NULL-NULL
> >>>>>>> 7 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000107
> >>>>>>> 2000 0.068 1 7 NULL-NULL
> >>>>>>> 8 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000108
> >>>>>>> 2000 0.048 1 8 NULL-NULL
> >>>>>>> 9 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000109
> >>>>>>> 2000 0.037 1 9 NULL-NULL
> >>>>>>> 10 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000110
> >>>>>>> 2000 0.042 1 10 NULL-NULL
> >>>>>>> 11 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000111
> >>>>>>> 2000 0.027 1 11 NULL-NULL
> >>>>>>>
>
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list