[R] Data transformation problem
phii m@iii@g oii phiiipsmith@c@
phii m@iii@g oii phiiipsmith@c@
Thu Nov 12 13:23:34 CET 2020
Thank you so much for this elegant solution, Jeff.
Philip
On 2020-11-12 02:20, Jeff Newmiller wrote:
> I am not a data.table afficiando, but here is how I would do it with
> dplyr/tidyr:
>
> library(dplyr)
> library(tidyr)
>
> do_per_REL <- function( DF ) {
> rng <- range( DF$REF1 ) # watch out for missing months?
> DF <- ( data.frame( REF1 = seq( rng[ 1 ], rng[ 2 ], by = "month" )
> )
> %>% left_join( DF, by = "REF1" )
> %>% arrange( REF1 )
> )
> with( DF
> , data.frame( REF2 = REF1[ -1 ]
> , VAL2 = 100 * diff( VAL1 ) / VAL1[ -length( VAL1 ) ]
> )
> )
> }
>
> df2a <- ( df1
> %>% mutate( REF1 = as.Date( REF1 )
> , REL1 = as.Date( REL1 )
> )
> %>% nest( data = -REL1 )
> %>% rename( REL2 = REL1 )
> %>% rowwise()
> %>% mutate( data = list( do_per_REL( data ) ) )
> %>% ungroup()
> %>% unnest( cols = "data" )
> %>% select( REF2, REL2, VAL2 )
> %>% arrange( REF2, desc( REL2 ), VAL2 )
> )
> df2a
>
> On Wed, 11 Nov 2020, phil using philipsmith.ca wrote:
>
>> I am stuck on a data transformation problem. I have a data frame, df1
>> in my example, with some original "levels" data. The data pertain to
>> some variable, such as GDP, in various reference periods, REF, as
>> estimated and released in various release periods, REL. The release
>> periods follow after the reference periods by two months or more,
>> sometimes by several years. I want to build a second data frame,
>> called df2 in my example, with the month-to-month growth rates that
>> existed in each reference period, revealing the revisions to those
>> growth rates in subsequent periods.
>>
>> REF1 <-
>> c("2017-01-01","2017-01-01","2017-01-01","2017-01-01","2017-01-01",
>> "2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
>> "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
>> REL1 <-
>> c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>> "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>> "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
>> VAL1 <-
>> c(17974,14567,13425,NA,12900,17974,14000,14000,12999,13245,17197,11500,
>> 19900,18765,13467)
>> df1 <- data.frame(REF1,REL1,VAL1)
>> REF2 <-
>> c("2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
>> "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
>> REL2 <-
>> c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>> "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
>> VAL2 <- c(0.0,-3.9,4.3,NA,2.3,-4.3,-17.9,42.1,44.4,1.7)
>> df2 <- data.frame(REF2,REL2,VAL2)
>>
>> In my example I have provided some sample data pertaining to three
>> reference months, 2017-01-01 through 2017-03-01, and five release
>> periods, "2020-09-01","2020-08-01","2020-07-01","2020-06-01" and
>> "2019-05-01". In my actual problem I have millions of REF-REL
>> combinations, so my data frame is quite large. I am using data.table
>> for faster processing, though I am more familiar with the tidyverse. I
>> am providing df2 as the target data frame for my example, so you can
>> see what I am trying to achieve.
>>
>> I have not been able to find an efficient way to do these
>> calculations. I have tried "for" loops with "if" statements, without
>> success so far, and anyway this approach would be too slow, I fear.
>> Suggestions as to how I might proceed would be much appreciated.
>>
>> Philip
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller The ..... ..... Go
> Live...
> DCN:<jdnewmil using dcn.davis.ca.us> Basics: ##.#. ##.#. Live
> Go...
> Live: OO#.. Dead: OO#..
> Playing
> Research Engineer (Solar/Batteries O.O#. #.O#. with
> /Software/Embedded Controllers) .OO#. .OO#.
> rocks...1k
> ---------------------------------------------------------------------------
More information about the R-help
mailing list