[R] [External] R Processing dataframe by group - equivalent to SAS by group processing with first. and retain statements

Richard M. Heiberger rmh @end|ng |rom temp|e@edu
Wed Nov 27 18:22:59 CET 2024


I would use base R.


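# Start with FirstDay as a copy of date, then overwrite it group by group.
newdata <- cbind(olddata, FirstDay=olddata$date)
newdata$FirstDay <- with(newdata, {
  # For each ID, propagate that ID's first value to all of its rows.
  for (thisID in unique(ID))
    FirstDay[ID==thisID] <- FirstDay[ID==thisID][1]
  FirstDay
})
newdata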

Note that both my solution and Olivier's give newdata$FirstDay[17:18] == 10,
which is what I think you intended.
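
For comparison, here is a loop-free sketch of the same idea using base R's ave(). It assumes that "first" should mean the earliest date within each ID, so it sorts first, as the PROC SORT step in the SAS code does. The names `sorted` and `newdata2` are only illustrative, and the example data are rebuilt from the original question so the block runs on its own.

```r
# Rebuild the example data from the original question.
ID   <- c(rep(1, 10), rep(2, 6), rep(3, 2))
date <- c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
          rep(5, 3), rep(6, 3), rep(10, 2))
olddata <- data.frame(ID = ID, date = date)

# Mirror PROC SORT: order by ID, then date, so "first" means earliest.
sorted <- olddata[order(olddata$ID, olddata$date), ]

# ave() applies FUN within each ID group and returns a vector as long as
# its input, so each group's first value is repeated across that group.
newdata2 <- cbind(sorted,
                  FirstDay = ave(sorted$date, sorted$ID,
                                 FUN = function(x) x[1]))
newdata2
```

If the row order should be left untouched, the sort step can be dropped and ave() applied to olddata directly; "first" then means the first row encountered for each ID, as in the loop above.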

Rich

> On Nov 27, 2024, at 11:30, Sorkin, John <jsorkin using som.umaryland.edu> wrote:
>
> I am an old, long-time SAS programmer. I need to produce R code that processes a dataframe in a manner equivalent to using a by statement in SAS together with an if first.day statement and a retain statement:
>
> I want to take data (olddata) that looks like this
> ID Day
> 1 1
> 1 1
> 1 2
> 1 2
> 1 3
> 1 3
> 1 4
> 1 4
> 1 5
> 1 5
> 2 5
> 2 5
> 2 5
> 2 6
> 2 6
> 2 6
> 3 10
> 3 10
>
> and make it look like this:
> (within each ID I am copying the first value of Day into a new variable, FirstDay, and propagating the FirstDay value through all rows that have the same ID):
>
> ID Day FirstDay
> 1 1 1
> 1 1 1
> 1 2 1
> 1 2 1
> 1 3 1
> 1 3 1
> 1 4 1
> 1 4 1
> 1 5 1
> 1 5 1
> 2 5 5
> 2 5 5
> 2 5 5
> 2 6 5
> 2 6 5
> 2 6 5
> 3 10 3
> 3 10 3
>
> SAS code that can do this is:
>
> proc sort data=olddata;
>  by ID Day;
> run;
>
> data newdata;
>  retain FirstDay;
>  set olddata;
>  by ID;
>  if first.ID then FirstDay=Day;
> run;
>
> I have NO idea how to do this in R (so I can't post test code), but below I have R code that creates olddata:
>
> ID <- c(rep(1,10),rep(2,6),rep(3,2))
> date <- c(rep(1,2),rep(2,2),rep(3,2),rep(4,2),rep(5,2),
>          rep(5,3),rep(6,3),rep(10,2))
> date
> olddata <- data.frame(ID=ID,date=date)
> olddata
>
> Any suggestions on how to do this would be appreciated. I have worked on this for more than 12 hours, and despite multiple web searches I have gotten nowhere.
>
> Thanks
> John
>
>
>
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine, University of Maryland School of Medicine;
> Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
> PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
> Senior Statistician University of Maryland Center for Vascular Research;
>
> Division of Gerontology and Palliative Care,
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> Cell phone 443-418-5382
>
>
>


