[R] how to subset based on other row values and multiplicity
John McKown
john.archie.mckown at gmail.com
Wed Jul 16 16:09:02 CEST 2014
On Wed, Jul 16, 2014 at 8:51 AM, jim holtman <jholtman at gmail.com> wrote:
> I can reproduce what you requested, but there was the question about
> what happens with the multiple 'c-y' values.
>
> ====================
>
>> require(data.table)
>> x <- read.table(text = 'id date value
> + a 2000-01-01 x
> + a 2000-03-01 x
> + b 2000-11-11 w
> + c 2000-11-11 y
> + c 2000-10-01 y
> + c 2000-09-10 y
> + c 2000-12-12 z
> + c 2000-10-11 z
> + d 2000-11-11 w
> + d 2000-11-10 w', as.is = TRUE, header = TRUE)
>> setDT(x)
>> x[, date := as.Date(date)]
>> setkey(x, id, value, date)
>>
>> y <- x[
> + , {
> + if (.N == 1) val <- NULL # only one -- delete
> + else {
> + dif <- difftime(tail(date, -1), head(date, -1), units = 'days')
> + # return first value if any > 31
> + if (any(dif >= 31)) val <- list(date = date[1L])
> + else val <- NULL
> + }
> + val
> + }
> + , keyby = 'id,value'
> + ]
>> y
> id value date
> 1: a x 2000-01-01
> 2: c y 2000-09-10
> 3: c z 2000-10-11
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
Wow, I picked up a couple of _nice_ techniques from that one post!
Looks like "data.table" will let me do SQL-like things in R. I have a
warped brain: I think in "result sets" and "matrix operations".
Many thanks.
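For anyone without data.table installed, the same rule can be sketched in base R. This is a minimal sketch, assuming the rule is exactly what the quoted code implements: drop any (id, value) group with a single row; otherwise keep the group's earliest date if any gap between consecutive sorted dates is 31 days or more. The helper name keep_first_if_gap and its min_gap argument are hypothetical, introduced only for illustration. Like the data.table version, it returns only the first date of the c/y group, so it does not settle the open question about multiple c-y values.

```r
# Same sample data as in the quoted post
x <- read.table(text = 'id date value
a 2000-01-01 x
a 2000-03-01 x
b 2000-11-11 w
c 2000-11-11 y
c 2000-10-01 y
c 2000-09-10 y
c 2000-12-12 z
c 2000-10-11 z
d 2000-11-11 w
d 2000-11-10 w', header = TRUE, as.is = TRUE)
x$date <- as.Date(x$date)

# Sort by id, value, date (the ordering setkey() gives in the data.table code)
x <- x[order(x$id, x$value, x$date), ]

# One data frame per (id, value) combination that actually occurs
groups <- split(x, list(x$id, x$value), drop = TRUE)

# Hypothetical helper: NULL for single-row groups or groups with no big gap,
# otherwise the first (earliest) row; min_gap = 31 mirrors the quoted code
keep_first_if_gap <- function(g, min_gap = 31) {
  if (nrow(g) < 2) return(NULL)
  gaps <- as.numeric(diff(g$date), units = "days")
  if (any(gaps >= min_gap)) g[1, c("id", "value", "date")] else NULL
}

# rbind() silently skips the NULL results from dropped groups
y <- do.call(rbind, lapply(groups, keep_first_if_gap))
y <- y[order(y$id, y$value), ]
rownames(y) <- NULL
print(y)
```

On the sample data this should give the same three rows as the data.table output quoted above (a/x, c/y, c/z).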
--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan
Maranatha! <><
John McKown
More information about the R-help mailing list