[R] Sum data according to date in sequence

Jeff Newmiller jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@
Fri Nov 3 16:15:59 CET 2023


Cbind is not a very good tool for adding columns to a data frame. Either use explicit column referencing like

dt1$x <- new_data_vector

but you do have to make sure the new data vector has the same number of values and in the same order as the other data in dt1. The transition from multiple records per day to one record per day would make the starting data and ending data incompatible.

library(dplyr)

dt1 <- structure(list(StationName = c("PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1"), date = c("1/14/2016", "1/14/2016",
"1/14/2016", "1/15/2016", "1/15/2016", "1/15/2016", "1/15/2016",
"1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016",
"1/16/2016", "1/16/2016", "1/17/2016", "1/17/2016", "1/17/2016",
"1/17/2016", "1/17/2016", "1/18/2016"), time = c("12:09", "19:50",
"20:22", "8:25", "14:23", "18:17", "21:46", "10:19", "12:12",
"14:12", "16:22", "19:16", "19:19", "20:24", "9:54", "12:16",
"13:53", "19:03", "22:00", "8:58"), EnergykWh = c(4.680496, 6.272414,
1.032782, 11.004884, 10.096824, 6.658797, 4.808874, 1.469384,
2.996239, 0.303222, 4.988339, 8.131804, 0.117156, 3.285669, 1.175608,
3.677487, 1.068393, 8.820755, 8.138583, 9.0575)), row.names = c(NA,
20L), class = "data.frame")

ans <- (
  dt1
  %>% mutate(
    Dt = as.Date( date, format = "%m/%d/%Y" )
  )
  %>% group_by( StationName, Dt )
  %>% summarize( DailyEnergykWh = sum( EnergykWh ) )
)


On November 3, 2023 2:51:20 AM PDT, roslinazairimah zakaria <roslinaump using gmail.com> wrote:
>Hi,
>I tried this:
># extract date from the time stamp
>dt1 <- cbind(as.Date(dt$EndDate, format="%m/%d/%Y"), dt$EnergykWh)
>head(dt1)
>colnames(dt1) <- c("date", "EnergykWh")
>and
>my dt1 becomes these, the dates are replace by numbers.
>
>dt1 <- cbind(as.Date(dt$EndDate, format="%m/%d/%Y"), dt$EnergykWh)
>dput(head(dt1))
>colnames(dt1) <- c("date", "EnergykWh")
>dput(head(dt1))
>
>
>> dput(head(dt1))structure(c(16814, 16814, 16814, 16815, 16815, 16815, 4.680496,
>6.272414, 1.032782, 11.004884, 10.096824, 6.658797), dim = c(6L,
>2L), dimnames = list(NULL, c("date", "EnergykWh")))
>
>Then I tried this:
>library(dplyr)
>dt1 %>%
>  group_by(date) %>%
>  summarise(EnergykWh.sum = sum(EnergykWh))
>and got this errors
>
>dt1 %>%+   group_by(date) %>%+   summarise(EnergykWh.sum =
>sum(EnergykWh))Error in UseMethod("group_by") :
>  no applicable method for 'group_by' applied to an object of class
>"c('matrix', 'array', 'double', 'numeric')"
>
>
>
>On Fri, Nov 3, 2023 at 7:23 AM roslinazairimah zakaria <roslinaump using gmail.com>
>wrote:
>
>> Dear all,
>>
>> I have this set of data. I would like to sum the EnergykWh according date
>> sequences.
>>
>> > head(dt1,20)                   StationName      date  time EnergykWh
>> 1  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 12:09  4.680496
>> 2  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 19:50  6.272414
>> 3  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 20:22  1.032782
>> 4  PALO ALTO CA / CAMBRIDGE #1 1/15/2016  8:25 11.004884
>> 5  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 14:23 10.096824
>> 6  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 18:17  6.658797
>> 7  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 21:46  4.808874
>> 8  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 10:19  1.469384
>> 9  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 12:12  2.996239
>> 10 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 14:12  0.303222
>> 11 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 16:22  4.988339
>> 12 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:16  8.131804
>> 13 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:19  0.117156
>> 14 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 20:24  3.285669
>> 15 PALO ALTO CA / CAMBRIDGE #1 1/17/2016  9:54  1.175608
>> 16 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 12:16  3.677487
>> 17 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 13:53  1.068393
>> 18 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 19:03  8.820755
>> 19 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 22:00  8.138583
>> 20 PALO ALTO CA / CAMBRIDGE #1 1/18/2016  8:58  9.057500
>>
>> I have tried this:
>> library(dplyr)
>> sums <- dt1 %>%
>>   group_by(date) %>%
>>   summarise(EnergykWh = sum(EnergykWh))
>>
>> head(sums,20)
>>
>> The date is not by daily sequence but by year sequence.
>>
>> > head(sums,20)# A tibble: 20 × 2
>>    date      EnergykWh
>>    <chr>         <dbl> 1 1/1/2017     25.3   2 1/1/2018     61.0   3 1/1/2019      0.627 4 1/1/2020     10.7   5 1/10/2017    69.4   6 1/10/2018    54.5   7 1/10/2019    49.1   8 1/10/2020    45.9   9 1/11/2017    73.9  10 1/11/2018    53.3  11 1/11/2019    93.5  12 1/11/2020    66.7  13 1/12/2017    78.6  14 1/12/2018    42.2  15 1/12/2019    22.7  16 1/12/2020    80.9  17 1/13/2017    85.6  18 1/13/2018    46.4  19 1/13/2019    40.0  20 1/13/2020   121.
>>
>>
>>
>> Thank you very much for any help given.
>>
>>
>> --
>> *Roslinazairimah Zakaria*
>> *Tel: +609-5492370; Fax. No.+609-5492766*
>>
>> *Email: roslinazairimah using ump.edu.my <roslinazairimah using ump.edu.my>;
>> roslinaump using gmail.com <roslinaump using gmail.com>*
>> Faculty of Industrial Sciences & Technology
>> University Malaysia Pahang
>> Lebuhraya Tun Razak, 26300 Gambang, Pahang, Malaysia
>>
>
>

-- 
Sent from my phone. Please excuse my brevity.



More information about the R-help mailing list