[R] how to merge within range?
Kenn Konstabel
lebatsnok at gmail.com
Sun May 15 23:41:21 CEST 2011
I'd've first said it's "simply"
sapply(df1$time, function(x) if(any(foo <- (x>=df2$from &
x<=df2$to))>0) df2$value[which(foo)] else NA )
but the following are much nicer (except that instead of NA you'll
have 0 but that's easy to change if necessary):
colSums(sapply(df1$time, function(x) (x>=df2$from & x <=df2$to) * df2$value) )
rowSums(outer(df1$time, df2$from, ">=") * outer(df1$time, df2$to,
"<=") * df2$value)
On Sat, May 14, 2011 at 10:08 PM, René Mayer
<mayer at psychologie.tu-dresden.de> wrote:
> sqldf is impressive - compiled it now;
> the trick with findInterval is nice, too.
> thanks guys!!
>
>
>
>
> Zitat von "David Winsemius" <dwinsemius at comcast.net>:
>
>>
>> On May 14, 2011, at 2:27 PM, William Dunlap wrote:
>>
>>> You could use findInterval() along with a trick with c(rbind(...)):
>>>
>>>> i <- findInterval(x=df.1$time, vec=c(rbind(df.2$from, df.2$to)))
>>>> i
>>>
>>> [1] 1 1 1 2 3 3 3 5 5 6
>>
>> That's nice. I was working on a slightly different "trick"
>>
>> findInterval( df.1[,1],t(df.2[,1:2]))
>> [1] 1 1 1 2 3 3 3 5 5 6
>>
>> I was then trying to get the right indices with (.)'%%' 2 and (.) '%/%' 2
>>
>>
>>> The even-valued outputs would map to NA's, the odds
>>> to value[(i+1)/2], but you can use the c(rbind(...)) trick again:
>>>
>>>> c(rbind(df.2$value, NA))[i]
>>>
>>> [1] 1 1 1 NA 3 3 3 5 5 NA
>>
>> I'd like to understand that. Maybe, maybe... ah, got it. At first I didn't
>> realize those were the final answers since they looked like indices. My t(.)
>> trick doesn't generalize as well.
>>
>>
>> My earlier suggestion tht two merges woul do it was based on my erroneous
>> interpretation of the example, since I thought the task was to match on the
>> end points of the intervals.
>>
>>>
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
>>>
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org
>>>> [mailto:r-help-bounces at r-project.org] On Behalf Of René Mayer
>>>> Sent: Saturday, May 14, 2011 11:06 AM
>>>> To: David Winsemius
>>>> Cc: r-help at r-project.org
>>>> Subject: Re: [R] how to merge within range?
>>>>
>>>> thanks David and Ian,
>>>> let me make a better example as the first one was flawed
>>>>
>>>> df.1=data.frame(round((1:10)*100+rnorm(10)), value=NA)
>>>> names(df.1) = c("time", "value")
>>>> df.1
>>>> time value
>>>> 1 101 NA
>>>> 2 199 NA
>>>> 3 301 NA
>>>> 4 401 NA
>>>> 5 501 NA
>>>> 6 601 NA
>>>> 7 700 NA
>>>> 8 800 NA
>>>> 9 900 NA
>>>> 10 1000 NA
>>>>
>>>> # from and to define ranges within time,
>>>> # note that from and to may not match the numbers given in time
>>>> df.2=data.frame(from=c(99,500,799),to=c(303,702,950), value=c(1,3,5))
>>>> df.2
>>>> from to value
>>>> 1 99 303 1
>>>> 2 500 702 3
>>>> 3 799 950 5
>>>>
>>>> what I want is:
>>>> time value
>>>> 1 101 1
>>>> 2 199 1
>>>> 3 301 1
>>>> 4 401 NA
>>>> 5 501 3
>>>> 6 601 3
>>>> 7 700 3
>>>> 8 800 5
>>>> 9 900 5
>>>> 10 1000 NA
>>>>
>>>> @David I don't know what you mean by 2 merges,
>>>> René
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Zitat von "David Winsemius" <dwinsemius at comcast.net>:
>>>>
>>>>>
>>>>> On May 14, 2011, at 9:16 AM, Ian Gow wrote:
>>>>>
>>>>>> If I assume that the third column in data.frame.2 is named
>>>>
>>>> "val" then in
>>>>>>
>>>>>> SQL terms it _seems_ you want
>>>>>>
>>>>>> SELECT a.time, b.val FROM data.frame.1 AS a LEFT JOIN
>>>>
>>>> data.frame.2 AS b ON
>>>>>>
>>>>>> a.time BETWEEN b.start AND b.end;
>>>>>>
>>>>>> Not sure how to do that elegantly using R subsetting/merge,
>>>>>
>>>>> Huh? It's just two merge()'s (... once you fix the error in
>>>>
>>>> the example.)
>>>>>
>>>>> --
>>>>> David
>>>>>
>>>>>> but you might
>>>>>> try a package that allows you to use SQL, such as sqldf.
>>>>>>
>>>>>>
>>>>>> On 5/14/11 8:03 AM, "David Winsemius"
>>>>
>>>> <dwinsemius at comcast.net> wrote:
>>>>>>
>>>>>>>
>>>>>>> On May 14, 2011, at 8:12 AM, René Mayer wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> how can one merge
>>>>>>>
>>>>>>> And what happened when you typed:
>>>>>>>
>>>>>>> ?merge
>>>>>>>
>>>>>>>> two data frames when in the second data frame one column
>>>>
>>>> defines the
>>>>>>>>
>>>>>>>> start values
>>>>>>>> and another defines the end value of the to be merged range.
>>>>>>>> data.frame.1
>>>>>>>> time ...
>>>>>>>> 13
>>>>>>>> 24
>>>>>>>> 35
>>>>>>>> 46
>>>>>>>> 55
>>>>>>>> ...
>>>>>>>> data.frame.2
>>>>>>>> start end
>>>>>>>> 24 37 ?h? ?
>>>>>>>> ...
>>>>>>>>
>>>>>>>> should result in this
>>>>>>>> 13 NA
>>>>>>>> 24 ?h?
>>>>>>>> 35 ?h?
>>>>>>>> 46 NA
>>>>>>>> 55
>>>>>>>> ?
>>>>>>>
>>>>>>> And _why_ would that be?
>>>>>>>
>>>>>>>
>>>>>>>> thanks,
>>>>>>>> René
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> R-help at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>> PLEASE do read the posting guide
>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>> and provide commented, minimal, self-contained,
>>>>
>>>> reproducible code.
>>>>>>>
>>>>>>> David Winsemius, MD
>>>>>>> West Hartford, CT
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>>
>>>>>
>>>>> David Winsemius, MD
>>>>> West Hartford, CT
>>>>>
>>>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list