[R] Sum data according to date in sequence
Jeff Newmiller
jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@
Fri Nov 3 16:15:59 CET 2023
Cbind is not a very good tool for adding columns to a data frame. Either use explicit column referencing like
dt1$x <- new_data_vector
but you do have to make sure the new data vector has the same number of values and in the same order as the other data in dt1. The transition from multiple records per day to one record per day would make the starting data and ending data incompatible.
library(dplyr)
dt1 <- structure(list(StationName = c("PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
"PALO ALTO CA / CAMBRIDGE #1"), date = c("1/14/2016", "1/14/2016",
"1/14/2016", "1/15/2016", "1/15/2016", "1/15/2016", "1/15/2016",
"1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016",
"1/16/2016", "1/16/2016", "1/17/2016", "1/17/2016", "1/17/2016",
"1/17/2016", "1/17/2016", "1/18/2016"), time = c("12:09", "19:50",
"20:22", "8:25", "14:23", "18:17", "21:46", "10:19", "12:12",
"14:12", "16:22", "19:16", "19:19", "20:24", "9:54", "12:16",
"13:53", "19:03", "22:00", "8:58"), EnergykWh = c(4.680496, 6.272414,
1.032782, 11.004884, 10.096824, 6.658797, 4.808874, 1.469384,
2.996239, 0.303222, 4.988339, 8.131804, 0.117156, 3.285669, 1.175608,
3.677487, 1.068393, 8.820755, 8.138583, 9.0575)), row.names = c(NA,
20L), class = "data.frame")
ans <- (
dt1
%>% mutate(
Dt = as.Date( date, format = "%m/%d/%Y" )
)
%>% group_by( StationName, Dt )
%>% summarize( DailyEnergykWh = sum( EnergykWh ) )
)
On November 3, 2023 2:51:20 AM PDT, roslinazairimah zakaria <roslinaump using gmail.com> wrote:
>Hi,
>I tried this:
># extract date from the time stamp
>dt1 <- cbind(as.Date(dt$EndDate, format="%m/%d/%Y"), dt$EnergykWh)
>head(dt1)
>colnames(dt1) <- c("date", "EnergykWh")
>and
>my dt1 becomes these, the dates are replace by numbers.
>
>dt1 <- cbind(as.Date(dt$EndDate, format="%m/%d/%Y"), dt$EnergykWh)
>dput(head(dt1))
>colnames(dt1) <- c("date", "EnergykWh")
>dput(head(dt1))
>
>
>> dput(head(dt1))structure(c(16814, 16814, 16814, 16815, 16815, 16815, 4.680496,
>6.272414, 1.032782, 11.004884, 10.096824, 6.658797), dim = c(6L,
>2L), dimnames = list(NULL, c("date", "EnergykWh")))
>
>Then I tried this:
>library(dplyr)
>dt1 %>%
> group_by(date) %>%
> summarise(EnergykWh.sum = sum(EnergykWh))
>and got this errors
>
>dt1 %>%+ group_by(date) %>%+ summarise(EnergykWh.sum =
>sum(EnergykWh))Error in UseMethod("group_by") :
> no applicable method for 'group_by' applied to an object of class
>"c('matrix', 'array', 'double', 'numeric')"
>
>
>
>On Fri, Nov 3, 2023 at 7:23 AM roslinazairimah zakaria <roslinaump using gmail.com>
>wrote:
>
>> Dear all,
>>
>> I have this set of data. I would like to sum the EnergykWh according date
>> sequences.
>>
>> > head(dt1,20) StationName date time EnergykWh
>> 1 PALO ALTO CA / CAMBRIDGE #1 1/14/2016 12:09 4.680496
>> 2 PALO ALTO CA / CAMBRIDGE #1 1/14/2016 19:50 6.272414
>> 3 PALO ALTO CA / CAMBRIDGE #1 1/14/2016 20:22 1.032782
>> 4 PALO ALTO CA / CAMBRIDGE #1 1/15/2016 8:25 11.004884
>> 5 PALO ALTO CA / CAMBRIDGE #1 1/15/2016 14:23 10.096824
>> 6 PALO ALTO CA / CAMBRIDGE #1 1/15/2016 18:17 6.658797
>> 7 PALO ALTO CA / CAMBRIDGE #1 1/15/2016 21:46 4.808874
>> 8 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 10:19 1.469384
>> 9 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 12:12 2.996239
>> 10 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 14:12 0.303222
>> 11 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 16:22 4.988339
>> 12 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:16 8.131804
>> 13 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:19 0.117156
>> 14 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 20:24 3.285669
>> 15 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 9:54 1.175608
>> 16 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 12:16 3.677487
>> 17 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 13:53 1.068393
>> 18 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 19:03 8.820755
>> 19 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 22:00 8.138583
>> 20 PALO ALTO CA / CAMBRIDGE #1 1/18/2016 8:58 9.057500
>>
>> I have tried this:
>> library(dplyr)
>> sums <- dt1 %>%
>> group_by(date) %>%
>> summarise(EnergykWh = sum(EnergykWh))
>>
>> head(sums,20)
>>
>> The date is not by daily sequence but by year sequence.
>>
>> > head(sums,20)# A tibble: 20 × 2
>> date EnergykWh
>> <chr> <dbl> 1 1/1/2017 25.3 2 1/1/2018 61.0 3 1/1/2019 0.627 4 1/1/2020 10.7 5 1/10/2017 69.4 6 1/10/2018 54.5 7 1/10/2019 49.1 8 1/10/2020 45.9 9 1/11/2017 73.9 10 1/11/2018 53.3 11 1/11/2019 93.5 12 1/11/2020 66.7 13 1/12/2017 78.6 14 1/12/2018 42.2 15 1/12/2019 22.7 16 1/12/2020 80.9 17 1/13/2017 85.6 18 1/13/2018 46.4 19 1/13/2019 40.0 20 1/13/2020 121.
>>
>>
>>
>> Thank you very much for any help given.
>>
>>
>> --
>> *Roslinazairimah Zakaria*
>> *Tel: +609-5492370; Fax. No.+609-5492766*
>>
>> *Email: roslinazairimah using ump.edu.my <roslinazairimah using ump.edu.my>;
>> roslinaump using gmail.com <roslinaump using gmail.com>*
>> Faculty of Industrial Sciences & Technology
>> University Malaysia Pahang
>> Lebuhraya Tun Razak, 26300 Gambang, Pahang, Malaysia
>>
>
>
--
Sent from my phone. Please excuse my brevity.
More information about the R-help
mailing list