[R] Data transformation problem
phii m@iii@g oii phiiipsmith@c@
phii m@iii@g oii phiiipsmith@c@
Thu Nov 12 04:25:27 CET 2020
I am stuck on a data transformation problem. I have a data frame, df1 in
my example, with some original "levels" data. The data pertain to some
variable, such as GDP, in various reference periods, REF, as estimated
and released in various release periods, REL. The release periods follow
after the reference periods by two months or more, sometimes by several
years. I want to build a second data frame, called df2 in my example,
with the month-to-month growth rates that existed in each reference
period, revealing the revisions to those growth rates in subsequent
periods.
REF1 <-
c("2017-01-01","2017-01-01","2017-01-01","2017-01-01","2017-01-01",
"2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
"2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
REL1 <-
c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
"2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
"2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
VAL1 <-
c(17974,14567,13425,NA,12900,17974,14000,14000,12999,13245,17197,11500,
19900,18765,13467)
df1 <- data.frame(REF1,REL1,VAL1)
REF2 <-
c("2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
"2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
REL2 <-
c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
"2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
VAL2 <- c(0.0,-3.9,4.3,NA,2.3,-4.3,-17.9,42.1,44.4,1.7)
df2 <- data.frame(REF2,REL2,VAL2)
In my example I have provided some sample data pertaining to three
reference months, 2017-01-01 through 2017-03-01, and five release
periods, "2020-09-01","2020-08-01","2020-07-01","2020-06-01" and
"2019-05-01". In my actual problem I have millions of REF-REL
combinations, so my data frame is quite large. I am using data.table for
faster processing, though I am more familiar with the tidyverse. I am
providing df2 as the target data frame for my example, so you can see
what I am trying to achieve.
I have not been able to find an efficient way to do these calculations.
I have tried "for" loops with "if" statements, without success so far,
and anyway this approach would be too slow, I fear. Suggestions as to
how I might proceed would be much appreciated.
Philip
More information about the R-help
mailing list