[R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statements

David Winsemius dw|n@em|u@ @end|ng |rom comc@@t@net
Wed Nov 27 19:08:12 CET 2024


On 11/27/24 09:44, David Winsemius via R-help wrote:
> On 11/27/24 08:30, Sorkin, John wrote:
>> I am an old, long-time SAS programmer. I need to produce R code that processes a dataframe in a manner equivalent to using a by statement in SAS together with an if first.ID statement and a retain statement:
>>
>> I want to take data (olddata) that looks like this
>> ID	Day
>> 1	1
>> 1	1
>> 1	2
>> 1	2
>> 1	3
>> 1	3
>> 1	4
>> 1	4
>> 1	5
>> 1	5
>> 2	5
>> 2	5
>> 2	5
>> 2	6
>> 2	6
>> 2	6
>> 3	10
>> 3	10
>>
>> and make it look like this:
>> (within each ID I am copying the first value of Day into a new variable, FirstDay, and propagating the FirstDay value through all rows that have the same ID):
>>
>> ID	Day	FirstDay
>> 1	1	1
>> 1	1	1
>> 1	2	1
>> 1	2	1
>> 1	3	1
>> 1	3	1
>> 1	4	1
>> 1	4	1
>> 1	5	1
>> 1	5	1
>> 2	5	5
>> 2	5	5
>> 2	5	5
>> 2	6	5
>> 2	6	5
>> 2	6	5
>> 3	10	10
>> 3	10	10
>>
>> SAS code that can do this is:
>>
>> proc sort data=olddata;
>>     by ID Day;
>> run;
>>
>> data newdata;
>>     retain FirstDay;
>>     set olddata;
>>     by ID;
>>     if first.ID then FirstDay=Day;
>> run;
>>
>> I have NO idea how to do this in R (so I can't post test code), but below I have R code that creates olddata:
>>
>> ID <- c(rep(1,10),rep(2,6),rep(3,2))
>> date <- c(rep(1,2),rep(2,2),rep(3,2),rep(4,2),rep(5,2),
>>             rep(5,3),rep(6,3),rep(10,2))
>> date
>> olddata <- data.frame(ID=ID,date=date)
>> olddata
>>
>> Any suggestions on how to do this would be appreciated. I have worked on this for more than 12 hours, and despite multiple web searches I have gotten nowhere.
>
> My earlier approach incorrectly picked the first value of the ID column
> rather than the first value of the `date` column to be repeated within
> each ID group, so here's the correct code:

That's embarrassing. Sorry for the HTML. I thought that Thunderbird was
smart enough to reply in kind. This should be formatted correctly:

> olddata$FirstDay <- unlist( by(olddata, olddata["ID"],
+     FUN = function(x) { rep( x$date[1], times=nrow(x) )}) )
> olddata
   ID date FirstDay
1   1    1        1
2   1    1        1
3   1    2        1
4   1    2        1
5   1    3        1
6   1    3        1
7   1    4        1
8   1    4        1
9   1    5        1
10  1    5        1
11  2    5        5
12  2    5        5
13  2    5        5
14  2    6        5
15  2    6        5
16  2    6        5
17  3   10       10
18  3   10       10
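
The unlist(by(...)) call above relies on the rows already being grouped by ID, since the results come back group by group. As a rough sketch of an alternative that does not depend on that, base R's ave() returns per-group values aligned to the original rows, and !duplicated() gives something like SAS's first.ID flag. The names newdata and first_ID are only illustrative (they are not from the original code), and the order() step is my assumption about how to mirror the proc sort.

```r
# Rebuild the example data from the original post.
ID   <- c(rep(1, 10), rep(2, 6), rep(3, 2))
date <- c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
          rep(5, 3), rep(6, 3), rep(10, 2))
olddata <- data.frame(ID = ID, date = date)

# Rough analogue of "proc sort; by ID Day;": order rows by ID, then date.
newdata <- olddata[order(olddata$ID, olddata$date), ]

# Rough analogue of first.ID: TRUE on the first row of each ID group.
# (first_ID is a hypothetical column name used only for illustration.)
newdata$first_ID <- !duplicated(newdata$ID)

# Rough analogue of retain: ave() applies the function within each ID
# group and returns the result in the same row order as its input.
newdata$FirstDay <- ave(newdata$date, newdata$ID, FUN = function(x) x[1])

print(newdata)
```

Because ave() keeps its result in the row order of its input, the sort is only needed if "first" should mean the smallest date in each ID rather than the first row encountered.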


>> Thanks
>> John
>>
>>
>>
>>
>> John David Sorkin M.D., Ph.D.
>> Professor of Medicine, University of Maryland School of Medicine;
>> Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
>> PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
>> Senior Statistician University of Maryland Center for Vascular Research;
>>
>> Division of Gerontology and Palliative Care,
>> 10 North Greene Street
>> GRECC (BT/18/GR)
>> Baltimore, MD 21201-1524
>> Cell phone 443-418-5382
>>
>>
>>
>> ______________________________________________
>> R-help using r-project.org  mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.