[R] how to subset based on other row values and multiplicity
Williams Scott
Scott.Williams at petermac.org
Wed Jul 16 15:07:44 CEST 2014
Hi R experts,
I have a dataset as sampled below. Values are only regarded as 'confirmed'
in an individual ('id') if they occur
more than once, more than 30 days apart.
id date value
a 2000-01-01 x
a 2000-03-01 x
b 2000-11-11 w
c 2000-11-11 y
c 2000-10-01 y
c 2000-09-10 y
c 2000-12-12 z
c 2000-10-11 z
d 2000-11-11 w
d 2000-11-10 w
I wish to subset the data to retain rows where the value for the
individual is confirmed more than 30 days apart. So, after deleting all
rows with just one occurrence of id and value, the result would be the
earliest occurrence of each value within each id, provided 31 or more
days separate the dates for that value. If more than one value is present
per id, each value needs to be assessed independently. This example would
then reduce to:
id date value
a 2000-01-01 x
c 2000-09-10 y
c 2000-10-11 z
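
To make the rule concrete, here is a minimal base R sketch of it, not a tuned solution. It assumes the data sit in a data frame with columns id, date and value, that 'confirmed' means the earliest and latest dates for an id/value pair are at least 31 days apart, and that sorting once and comparing neighbouring rows is acceptable. The function name confirmed_first and the argument min_gap are made up for illustration.

```r
# Sample data from the table above
df <- data.frame(
  id    = c("a", "a", "b", "c", "c", "c", "c", "c", "d", "d"),
  date  = as.Date(c("2000-01-01", "2000-03-01", "2000-11-11", "2000-11-11",
                    "2000-10-01", "2000-09-10", "2000-12-12", "2000-10-11",
                    "2000-11-11", "2000-11-10")),
  value = c("x", "x", "w", "y", "y", "y", "z", "z", "w", "w"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the earliest row of each (id, value) group
# whose earliest and latest dates are at least min_gap days apart.
# min_gap = 31 is an assumption taken from "31 or more days".
confirmed_first <- function(df, min_gap = 31) {
  # Sort so each (id, value) group is contiguous and ordered by date
  s <- df[order(df$id, df$value, df$date), ]
  n <- nrow(s)
  # A row starts a group if id or value differs from the previous row
  is_first <- c(TRUE, s$id[-1] != s$id[-n] | s$value[-1] != s$value[-n])
  # A row ends a group if the next row starts a new one
  is_last <- c(is_first[-1], TRUE)
  # Days between earliest and latest date in each group
  span <- as.numeric(s$date[is_last] - s$date[is_first])
  out <- s[is_first, ][span >= min_gap, ]
  rownames(out) <- NULL
  out
}

res <- confirmed_first(df)
print(res)
```

Groups with a single row get a span of 0 days and drop out along with pairs such as d/w, so no separate step is needed to delete the single occurrences. Everything after the sort is vectorised, so the sort itself is likely to dominate the cost on a large table.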
I can do this via some crude loops and subsetting, but I am looking for as
much efficiency as possible, as the dataset has around 50 million rows to
assess. Any suggestions welcomed.
Thanks in advance
Scott Williams MD
Melbourne, Australia