[R] Help with transpose please.

Sat Jun 23 12:04:26 CEST 2018

On 6/22/2018 4:43 AM, Bill Poling wrote:
> Good morning.
> 
> 
> I have data in the form:
> 
> head(Edit041IA, n=25)
>     ClaimServiceID  ClaimID DiagnosisCode
> 1       183056004 78044473          C562
> 2       183056004 78044473          C778
> 3       183056004 78044473          C784
> 4       183056004 78044473          C786
> 5       183056004 78044473         C7961
> 6       183056004 78044473         C7982
> 7       183056004 78044473         C7989
> 8       183056008 78044473          C562
> 9       183056008 78044473          C778
> 10      183056008 78044473          C784
> 11      183056008 78044473          C786
> 12      183056008 78044473         C7961
> 13      183056008 78044473         C7982
> 14      183056008 78044473         C7989
> 15      183139945 78078925        M79606
> 16      183139945 78078925         M7989
> 17      183139945 78078925          R600
> 18      183236728 78119632        H02831
> 19      183236728 78119632        H02832
> 20      183236728 78119632        H02834
> 21      183236728 78119632        H02835
> 22      183236728 78119632        H04123
> 23      183236728 78119632          Z411
> 24      183236728 78119632         H2513
> 25      183236728 78119632        H43813
> 
> And wish to transpose to single record for single claimServiceID, ClaimID, and Dx1,Dx2,Dx3, etc:
> 
> There can be multiple claimServiceIDs for a ClaimID so I want the unique ClaimServiceID as the identifier when I join this data back into a longer single record length file by that column.
> 
>      claimServiceID ClaimID  Dx1   Dx2    Dx3  ...etc
> 1 183056004    78044473 C562 C778 C784 C786 C7961 ...etc
> 2 183056008    78044473 C562 C778 C784 C786 C7961 ...etc
> 
> 
> (If you would prefer the complete  dput of the 1272 records I will gladly provide .Names = c("ClaimServiceID",
> 
> "ClaimID", "DiagnosisCode"), class = "data.frame", row.names = c(NA,
> 
> -1272L))
> 
> 
> 
> At the moment the classes are:
> 
> classes <- as.character(sapply(Edit041IA, class))
> 
> classes
> 
> # [1] "integer"   "integer"   "character" <---but do not have to be if that helps its just that's how the csv load pulled them in
> 
> The max number of columns based on this transpose of the DiagnosisCode column (in this dataset) is 12 if that is important to know.
> 
> I have looked at a variety of webpages and cannot get this right,
> 
> dta2 <- melt(dta1, id=c("ClaimServiceID","ClaimID"))
> View(dta2)
>   # https://www.r-bloggers.com/pivot-tables-in-r/
> 
> # https://stackoverflow.com/questions/18449938/pivot-on-data-table-similar-to-rehape-melt-function
> 
> 
> dta3 <- cast(Edit041IA, ClaimServiceID ~ DiagnosisCode, ClaimID)
> View(dta3)
> dta3 <- cast(Edit041IA, DiagnosisCode ~ ClaimServiceID, ClaimID)
> View(dta3)
> 
> dta3 <- melt(Edit041IA, id=c("ClaimServiceID"))
> View(dta3)
> 
> dta3 <- aggregate(Edit041IA, by=list(ClaimServiceID, ClaimID, DiagnosisCode))
> View(dta3)
> 
> 
> dta3 <- aggregate(Edit041IA, by=list(ClaimServiceID))
> View(dta3)
>   # https://www.r-statistics.com/tag/transpose/
> 
>   dta3 <- aggregate(Edit041IA, by=list(DiagnosisCode, ClaimServiceID, ClaimID))
> View(dta3)
> 
> 
> I am sure it's a basic,  simple procedure, but I am pressed for time on this one, any support will be greatly appreciated, thank you.
> 
> WHP
> 
> 
> 
> 
> 
> Confidentiality Notice This message is sent from Zelis. ...{{dropped:15}}
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

Bill,

you have received some good suggestions and since you are pressed for 
time this may be too late.  However, here is a solution using ave() 
function and  cast() from the reshape package.

# create diagnosis variable names
dxnames <- paste('Dx',ave(rep(1, nrow(have)), have[,1:2], FUN = 
seq_along), sep='')
# cast the data into wide format
cast(cbind(have,dxnames), ClaimServiceID + ClaimID ~ dxnames, 
value='DiagnosisCode')

Hope this is helpful,

Dan

-- 
Daniel Nordlund
Port Townsend, WA  USA