[R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statements
Ebert,Timothy Aaron
tebert @end|ng |rom u||@edu
Wed Nov 27 18:37:01 CET 2024
Very similar to what Oliver posted:
library(dplyr)

newdata <- olddata |>
  group_by(ID) |>
  mutate(firstdate = first(date))
1) I attached dplyr for the whole script with library(dplyr). Oliver called dplyr::group_by() and dplyr::mutate() with the namespace prefix instead. The result is the same.
2) I used the base R |> pipe (available since R 4.1.0), while Oliver used the %>% pipe from the magrittr package. Again, the result is the same.
If you want a version that is closer to how SAS would process the data, then you could use for loops after sorting the data.
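As a rough sketch of that loop-based approach (not code from the thread), the following uses only base R. The variable names `retained` and `is_first` are made up for illustration; they stand in for the SAS retained variable and the first.ID flag. The data frame is rebuilt from the code in the original question so the block runs on its own.

```r
# Rebuild the example data from the original question.
olddata <- data.frame(
  ID   = c(rep(1, 10), rep(2, 6), rep(3, 2)),
  date = c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
           rep(5, 3), rep(6, 3), rep(10, 2))
)

# Rough equivalent of PROC SORT ... BY ID Day.
newdata <- olddata[order(olddata$ID, olddata$date), ]
rownames(newdata) <- NULL

# Rough equivalent of RETAIN FirstDay plus IF first.ID THEN FirstDay = Day.
newdata$FirstDay <- NA_real_
retained <- NA_real_  # hypothetical name: holds the value carried across rows
for (i in seq_len(nrow(newdata))) {
  # TRUE on the first row of each ID block (the role of first.ID in SAS).
  is_first <- i == 1 || newdata$ID[i] != newdata$ID[i - 1]
  if (is_first) retained <- newdata$date[i]
  newdata$FirstDay[i] <- retained
}

print(newdata)
```

The loop reads one row at a time and carries a value forward, which mirrors how a SAS data step works. On large data frames the grouped dplyr version is usually the more natural choice in R.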
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Tom Woolman
Sent: Wednesday, November 27, 2024 12:05 PM
To: Sorkin, John <jsorkin using som.umaryland.edu>
Cc: r-help using r-project.org (r-help using r-project.org) <r-help using r-project.org>
Subject: Re: [R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statements
Check out the dplyr package, specifically the mutate function.
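library(dplyr)
# Create a new column based on an existing column's value:
# FirstDay is 5 where ID is 2, and NA everywhere else
df <- df %>% mutate(FirstDay = case_when(ID == 2 ~ 5))
Add one condition ~ value pair per ID inside case_when() to cover every ID/FirstDay combination you want to account for.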
Like everything else in R, there are probably at least a dozen other ways to do this, between base R and all of the library packages available.
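One of those base R alternatives, sketched here as an illustration and not taken from the thread, is ave(), which applies a function within each group and returns a vector aligned with the original rows. The data frame is rebuilt from the code in the original question; note that "first" here means first in row order, so this assumes the rows are already sorted by ID and date.

```r
# Rebuild the example data from the original question.
olddata <- data.frame(
  ID   = c(rep(1, 10), rep(2, 6), rep(3, 2)),
  date = c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
           rep(5, 3), rep(6, 3), rep(10, 2))
)

newdata <- olddata

# Within each ID, take the first date and repeat it for every row of that ID.
newdata$FirstDay <- ave(newdata$date, newdata$ID, FUN = function(x) x[1])

# Alternative with the same result: match() returns the position of the
# first row for each ID, which is then used to look up the date.
first_by_match <- newdata$date[match(newdata$ID, newdata$ID)]

print(newdata)
```

Both forms avoid an explicit loop and need no extra packages.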
On Wednesday, November 27th, 2024 at 11:30 AM, Sorkin, John <jsorkin using som.umaryland.edu> wrote:
> I am an old, long-time SAS programmer. I need to produce R code that processes a data frame in a manner equivalent to using a by statement in SAS together with an if first.ID statement and a retain statement:
> I want to take data (olddata) that looks like this:
> ID Day
> 1 1
> 1 1
> 1 2
> 1 2
> 1 3
> 1 3
> 1 4
> 1 4
> 1 5
> 1 5
> 2 5
> 2 5
> 2 5
> 2 6
> 2 6
> 2 6
> 3 10
> 3 10
> and make it look like this:
> (Within each ID, I am copying the first value of Day into a new variable, FirstDay, and propagating the FirstDay value through all rows that have the same ID.)
> ID Day FirstDay
> 1 1 1
> 1 1 1
> 1 2 1
> 1 2 1
> 1 3 1
> 1 3 1
> 1 4 1
> 1 4 1
> 1 5 1
> 1 5 1
> 2 5 5
> 2 5 5
> 2 5 5
> 2 6 5
> 2 6 5
> 2 6 5
> 3 10 10
> 3 10 10
> SAS code that can do this is:
> proc sort data=olddata;
> by ID Day;
> run;
> data newdata;
> retain FirstDay;
> set olddata;
> by ID;
> if first.ID then FirstDay=Day;
> run;
> I have NO idea how to do this in R (so I can't post test code), but below I have R code that creates olddata:
> ID <- c(rep(1,10),rep(2,6),rep(3,2))
> date <- c(rep(1,2),rep(2,2),rep(3,2),rep(4,2),rep(5,2),
> rep(5,3),rep(6,3),rep(10,2))
> date
> olddata <- data.frame(ID=ID,date=date)
> olddata
> Any suggestions on how to do this would be appreciated. I have worked on this for more than 12 hours and, despite multiple web searches, I have gotten nowhere.
> Thanks
> John
> John David Sorkin M.D., Ph.D.
> Professor of Medicine, University of Maryland School of Medicine;
> Associate Director for Biostatistics and Informatics, Baltimore VA
> Medical Center Geriatrics Research, Education, and Clinical Center; PI
> Biostatistics and Informatics Core, University of Maryland School of
> Medicine Claude D. Pepper Older Americans Independence Center; Senior
> Statistician University of Maryland Center for Vascular Research;
> Division of Gerontology and Palliative Care,
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> Cell phone 443-418-5382
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide https://www.r-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.