[R] Conditionally remove rows with logic
MacQueen, Don
macqueen1 at llnl.gov
Mon Aug 8 17:52:07 CEST 2016
I am assuming that within each ID the data is sorted by increasing TIME, and
that LABEL==1 occurs only once within each ID. Given that, I would try
something like this.
Suppose that your data is in a data frame named "df".
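df.keep <- logical()
for (id in unique(df$ID)) {
  ## rows for this ID only
  df.tmp <- subset(df, df$ID == id)
  ## start by keeping every row
  tmp.keep <- rep(TRUE, nrow(df.tmp))
  ## drop rows whose TIME is later than the TIME at which LABEL is 1
  tmp.keep[df.tmp$TIME > df.tmp$TIME[df.tmp$LABEL == 1]] <- FALSE
  ## append; this relies on the rows for each ID being contiguous in df
  df.keep <- c(df.keep, tmp.keep)
}
newdf <- df[df.keep, ]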
I have not tested this.
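One way to check it is to run the loop against the example data from the original question. The sketch below rebuilds that data by hand (the data.frame() call is my own transcription of the posted table, so treat it as an assumption) and then applies the same loop unchanged.

```r
# Example data transcribed from the original question (two IDs, seven rows each)
df <- data.frame(
  ID    = rep(1:2, each = 7),
  TIME  = rep(seq(0, 18, by = 3), times = 2),
  LABEL = c(0, 0, 0, 0, 1, 0, 0,
            0, 0, 1, 0, 0, 0, 0)
)

# Same loop as above: build a keep/drop flag for every row, one ID at a time
df.keep <- logical()
for (id in unique(df$ID)) {
  df.tmp <- subset(df, df$ID == id)
  tmp.keep <- rep(TRUE, nrow(df.tmp))
  tmp.keep[df.tmp$TIME > df.tmp$TIME[df.tmp$LABEL == 1]] <- FALSE
  df.keep <- c(df.keep, tmp.keep)
}
newdf <- df[df.keep, ]
print(newdf)
```

If the loop does what is intended, newdf should hold TIME 0 through 12 for ID 1 and TIME 0 through 6 for ID 2.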
I'm sure it could be made more efficient, and probably with a bit of
cleverness one could avoid creating temporary subsets of the input. But I
tend to find such subsets handy for testing and debugging.
Unless your input data is huge, it should be fast enough that you won't
notice the inefficiencies.
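For what it is worth, here is a minimal sketch of one way to avoid the temporary subsets, using ave() and cumsum() from base R. It is a different technique from the loop, not a drop-in equivalent: it goes by row order rather than by comparing TIME values, so it assumes the rows are sorted by increasing TIME within each ID. The helper name keep_through_first_label and the result name newdf2 are made up for illustration, and the data frame is my own transcription of the example in the question.

```r
# Example data transcribed from the original question
df <- data.frame(
  ID    = rep(1:2, each = 7),
  TIME  = rep(seq(0, 18, by = 3), times = 2),
  LABEL = c(0, 0, 0, 0, 1, 0, 0,
            0, 0, 1, 0, 0, 0, 0)
)

# Hypothetical helper: given the LABEL values for one ID, in row order,
# return TRUE for every row up to and including the first LABEL == 1.
keep_through_first_label <- function(label) {
  # cumsum() counts the 1s seen so far including the current row;
  # subtracting the current row's own flag leaves the 1s seen strictly earlier.
  seen_before <- cumsum(label == 1) - (label == 1)
  seen_before == 0
}

# ave() applies the helper within each ID and returns the results in the
# original row order; its result is numeric 0/1, hence the as.logical().
keep <- as.logical(ave(df$LABEL, df$ID, FUN = keep_through_first_label))
newdf2 <- df[keep, ]
print(newdf2)
```

Because ave() puts each group's result back in the original row positions, this version does not need the rows for each ID to be contiguous. An ID with no LABEL==1 keeps all of its rows.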
-Don
--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
On 8/7/16, 3:21 PM, "R-help on behalf of Jennifer Sheng"
<r-help-bounces at r-project.org on behalf of jennifer.sheng2002 at gmail.com>
wrote:
>Dear all,
>
>I need to remove any rows AFTER the label becomes 1. For example, for ID
>1, the two rows with TIME of 15 & 18 should be removed; for ID 2, any rows
>after time 6, i.e., rows of time 9-18, should be removed. Any
>suggestions? Thank you very much!
>
>The current dataset looks like the following:
>ID TIME LABEL
>1 0 0
>1 3 0
>1 6 0
>1 9 0
>1 12 1
>1 15 0
>1 18 0
>2 0 0
>2 3 0
>2 6 1
>2 9 0
>2 12 0
>2 15 0
>2 18 0
>
>Thanks a lot!
>Jennifer