[R] Creating new variable with maximum visit date by group_id
David Winsemius
dwinsemius at comcast.net
Thu Aug 25 00:25:08 CEST 2011
On Aug 24, 2011, at 5:15 PM, Kathleen Rollet wrote:
> Dear R users,
>
> I am encountering the following problem: I have a dataset with a
> 'unique_id' and different 'visit_date' (formatted as.Date, "%d/%m/
> %Y") per unique_id. I would like to create a new variable with the
> most recent date of visit per unique_id as shown below.
That should not result in what is below unless you have changed
something in options() forcing a different data output format. (Is
that even possible?)
>
> unique_id visit_date last_visit_date
> 1 01/06/2010 01/06/2011
> 1 01/01/2011 01/06/2011
> 1 01/06/2011 01/06/2011
> 2 01/01/2009 01/07/2011
> 2 01/06/2009 01/07/2011
> 2 01/06/2010 01/07/2011
> 2 01/01/2011 01/07/2011
> 2 01/07/2011 01/07/2011
> 3 01/01/2008 01/01/2008
> 4 01/01/2009 01/01/2010
> 4 01/01/2010 01/01/2010
>
Read it in as a data frame named "dat" with:
colClasses=c("numeric", "character", "character")
Then:
dat$visit_date <- as.Date(dat$visit_date, format="%d/%m/%Y",
                          origin="1970-01-01")
dat$last_visit_date <- as.Date(dat$last_visit_date, format="%d/%m/%Y",
                               origin="1970-01-01")
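A minimal sketch of the read-and-convert step, assuming the posted table is pasted inline through the text argument of read.table() (the thread does not show the actual data source, and only a few rows are reproduced here). Once converted, the columns are Date objects and print in R's default yyyy-mm-dd form rather than dd/mm/yyyy.

```r
# Hypothetical inline copy of part of the posted table; the real data
# would normally come from a file rather than the 'text' argument.
txt <- "unique_id visit_date last_visit_date
1 01/06/2010 01/06/2011
1 01/01/2011 01/06/2011
1 01/06/2011 01/06/2011
3 01/01/2008 01/01/2008"

dat <- read.table(text = txt, header = TRUE,
                  colClasses = c("numeric", "character", "character"))

# Convert the day/month/year strings to Date objects.
dat$visit_date      <- as.Date(dat$visit_date, format = "%d/%m/%Y")
dat$last_visit_date <- as.Date(dat$last_visit_date, format = "%d/%m/%Y")

str(dat)
```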
> I know the coding to easily do this in Stata, SAS, and Excel but I
> cannot find how to do it in R. I tried multiple functions such as
> tapply( ), ave( ), ddply ( ), and transform ( ) after looking into
> previous postings. The codes are running but only NA values are
> generated or I get error messages that the replacement has fewer rows
> than the data has (there are about 1000 unique_id and over 4000 rows
> in my dataset presently).
The 'ave' function should be able to do it. It returns a vector as
long as the dataframe has rows.
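To illustrate the difference in result length, here is a small sketch on made-up rows modelled on the posted table. It compares ave() with tapply(); the guess that the reported "replacement has fewer rows" error came from assigning a per-group result to a column is an assumption, since the failing code was not posted.

```r
# Small illustrative data frame (a subset of the posted values).
dat <- data.frame(
  unique_id  = c(1, 1, 1, 3, 4, 4),
  visit_date = as.Date(c("01/06/2010", "01/01/2011", "01/06/2011",
                         "01/01/2008", "01/01/2009", "01/01/2010"),
                       format = "%d/%m/%Y")
)

# ave() applies FUN within each group and returns one value per row.
per_row <- ave(dat$visit_date, dat$unique_id, FUN = max)
length(per_row) == nrow(dat)   # TRUE

# tapply() returns one value per group, so its result is shorter than
# the data frame whenever some group has more than one row.
per_group <- tapply(dat$visit_date, dat$unique_id, max)
length(per_group)              # 3
```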
You are asked to post your failures as well as reproducible code which
is best produced with dput(). (This applies doubly so when you choose
non-standard formats for Date objects.)
Please read:
?dput
?ave
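As a rough sketch of what dput() provides, assuming a tiny made-up data frame: it prints an R expression that recreates the object, class attributes included, so a reader can paste it straight into a session. The round trip through deparse() and parse() is included only to show that the text form rebuilds the same object.

```r
# Tiny illustrative data frame with a Date column.
dat <- data.frame(
  unique_id  = c(1, 1, 3),
  visit_date = as.Date(c("2010-06-01", "2011-01-01", "2008-01-01"))
)

# dput() writes an R expression that recreates the object, including
# the Date class, suitable for pasting into a message.
dput(dat)

# Round trip: turn the object into text and evaluate it again.
dat2 <- eval(parse(text = deparse(dat)))
all.equal(dat, dat2)
```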
Worked example:
dat$most_recent <- format(ave(dat$visit_date, dat$unique_id, FUN=max),
                          format="%d/%m/%Y")
dat
Note that the last column is not an R Date but rather a character vector.
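If a true Date column is wanted, one option (a sketch on made-up rows, not something proposed in the thread) is to store the ave() result unformatted and call format() only when displaying it. The column name most_recent_chr is a hypothetical name used for illustration.

```r
# Small illustrative data frame (a subset of the posted values).
dat <- data.frame(
  unique_id  = c(1, 1, 1, 3, 4, 4),
  visit_date = as.Date(c("01/06/2010", "01/01/2011", "01/06/2011",
                         "01/01/2008", "01/01/2009", "01/01/2010"),
                       format = "%d/%m/%Y")
)

# Keep the per-group maximum as a Date so it can still be compared
# and used in date arithmetic.
dat$most_recent <- ave(dat$visit_date, dat$unique_id, FUN = max)
class(dat$most_recent)   # "Date"

# Produce the dd/mm/yyyy text only for display purposes.
dat$most_recent_chr <- format(dat$most_recent, format = "%d/%m/%Y")
dat
```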
> I would greatly appreciate if someone could help me.
>
> Thank you!
>
> Kathleen R.
> Epidemiologist
> Montreal, QC, Canada
David Winsemius, MD
West Hartford, CT