[R] Duplicate names in the pivot column
Jeff Newmiller
jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@
Sun Mar 29 09:24:38 CEST 2020
Does this help?
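df4 <- ( df
       %>% group_by( time, y )
       # number the rows within each time/y combination: 1, 2, ...
       %>% mutate( lvl = seq.int( n() ) )
       %>% ungroup()
       # keep the first occurrence as is, tag every later one
       %>% mutate( y = ifelse( 1==lvl
                             , y
                             , paste( y, "dup" )
                             )
                 )
       )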
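
If a name can occur more than twice within one time period, every repeat gets the same "dup" tag and the names still collide. One possible variation, offered only as a sketch and not tested on the real data, appends the within-group counter to the tag and then pivots. The seed, the "_dup" plus number suffix, and the names df5 and wide are assumptions made for illustration; they are not part of the original reprex.

```r
library(dplyr)
library(tidyr)

# Same toy data as the reprex; the seed is added here only for reproducibility.
set.seed(42)
df <- data.frame(
  time = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  y    = c("A", "B", "C", "B", "D", "C", "A", "B", "C", "B", "D", "C"),
  z    = sample(1:100, 12, replace = TRUE),
  stringsAsFactors = FALSE
)

df5 <- df %>%
  group_by(time, y) %>%
  mutate(lvl = row_number()) %>%   # 1 for the first occurrence, 2, 3, ... for repeats
  ungroup() %>%
  # first occurrence keeps its name; repeats become e.g. "B_dup1", "B_dup2"
  mutate(y = ifelse(lvl == 1, y, paste0(y, "_dup", lvl - 1))) %>%
  select(-lvl)

# Each time/y pair is now unique, so no list-columns are produced.
wide <- pivot_wider(df5, id_cols = time, names_from = y, values_from = z)
```

Because the counter is computed per group rather than by comparing rows pairwise, this avoids the nested for loops, and it does not require every time period to have the same number of rows.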
On March 28, 2020 6:18:51 PM PDT, phil using philipsmith.ca wrote:
>I have a problem involving inefficient coding. My code works, but in my
>actual application it takes a very long time to execute. I have included
>a reprex here that uses the same code, but with a much smaller-scale
>application.
>
>The data frame I am working with (df in my reprex) is in long form and I
>want to change it to wide form. My problem is that the pivot column,
>column 2 in my reprex, has some duplicate strings, so the pivot doesn't
>work well (df1 in my reprex). I want to find all the duplicates and tag
>them so they are no longer duplicates. My code succeeds (df3 in my
>reprex). But in the real application there can be over 100 "cases" and
>the for loops grind on far too long.
>
>I encounter this problem frequently in the datasets I use, so I am
>looking for a general solution that is as efficient as possible. Any
>help will be much appreciated.
>
>Philip
>
>``` r
>library(tidyverse)
>df <- data.frame(time=c(1,1,1,1,1,1,2,2,2,2,2,2),
>                 y=c("A","B","C","B","D","C","A","B","C","B","D","C"),
>                 z=sample(1:100,12,replace=TRUE),stringsAsFactors=FALSE)
>df1 <- pivot_wider(df,id_cols=1,names_from=y,values_from=z)
>#> Warning: Values in `z` are not uniquely identified; output will contain list-cols.
>#> * Use `values_fn = list(z = list)` to suppress this warning.
>#> * Use `values_fn = list(z = length)` to identify where the duplicates arise
>#> * Use `values_fn = list(z = summary_fun)` to summarise duplicates
>fixcol <- function(dfm,cases,per,s,tag) {
>  # dfm is the data frame
>  # s is the target column number, containing character names
>  # tag is a string to be added to a duplicate name
>  # cases is the number of rows for a single time period
>  # per is the number of time periods
>  # all time periods must have the same number of rows
>  for (k in 1:per) {
>    for (i in (1+(k-1)*cases):(k*cases-1)) {
>      for (j in (i+1):(k*cases)) {
>        if (dfm[j,s]==dfm[i,s]) { # found a duplicate
>          dfm[j,s] <- paste0(dfm[i,s],tag) # fix the duplicate
>          dfm[j,s]
>        }
>      }
>    }
>  }
>  return(dfm)
>}
>df2 <- fixcol(df,6,2,2,"_dup")
>df3 <- pivot_wider(df2,id_cols=1,names_from=y,values_from=z)
>```
>
><sup>Created on 2020-03-28 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>