[R] Comparing dates in two large data frames
Rui Barradas
Sat Apr 10 14:47:01 CEST 2021
Hello,
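The following solution seems to work and is fast, as findInterval is.
It first finds, for each value of df1$Time, the index of the last
df2$start that is not greater than it (0 if there is none). It then
uses that index to check that the Time is not greater than the
corresponding df2$end.
findInterval requires df2$start to be sorted in non-decreasing order,
which it is in your example.
I checked against a small subset of df1 and the results were right.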
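# default to FALSE: a Time before the first start is in no timespan
result <- logical(nrow(df1))
# index of the last df2$start <= each df1$Time (0 if there is none)
inx <- findInterval(df1$Time, df2$start)
not_zero <- inx != 0
# inside the timespan if the Time is also <= the matching end
result[not_zero] <- df1$Time[not_zero] <= df2$end[ inx[not_zero] ]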
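To see what the index does, here is a minimal sketch on toy data. The values, the helper name in_any_span and the column name inside are made up for illustration, they are not from your data. It wraps the same steps in a function, stores the result as a new column of df1 and compares it with a slow brute-force check. It assumes start is sorted and that a later timespan never ends before an earlier one, which holds when all timespans have the same length.

```r
# Toy data (hypothetical values): offsets in seconds from midnight UTC
origin <- as.POSIXct("2020-01-01", tz = "UTC")
df1 <- data.frame(Time = origin + c(0, 60, 200, 300, 450, 1000))
df2 <- data.frame(start = origin + c(30, 280, 900))
df2$end <- df2$start + 120   # timespans: [30,150], [280,400], [900,1020]

# Hypothetical helper wrapping the findInterval approach
in_any_span <- function(time, start, end) {
  result <- logical(length(time))        # FALSE unless shown otherwise
  inx <- findInterval(time, start)       # last start <= time, 0 if none
  not_zero <- inx != 0
  result[not_zero] <- time[not_zero] <= end[inx[not_zero]]
  result
}

# New column of df1: TRUE if the Time falls within any timespan of df2
df1$inside <- in_any_span(df1$Time, df2$start, df2$end)

# Slow brute-force check, only sensible on small data
brute <- vapply(seq_len(nrow(df1)), function(i) {
  any(df2$start <= df1$Time[i] & df1$Time[i] <= df2$end)
}, logical(1))

stopifnot(identical(df1$inside, brute))
print(df1)
```

If "yes"/"no" is preferred over TRUE/FALSE, ifelse(df1$inside, "yes", "no") converts the column in one vectorized step.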
Hope this helps,
Rui Barradas
At 12:06 on 10/04/21, Kulupp wrote:
> Dear all,
>
> I have two data frames (df1 and df2) and for each timepoint in df1 I
> want to know: is it within any of the timespans in df2? The result
> (e.g. "no" or "yes", or 0 and 1) should be shown in a new column of df1.
>
> Here is the code to create the two data frames (the size of the two data
> frames is approx. the same as in my original data frames):
>
> # create data frame df1
> ti1 <- seq.POSIXt(from=as.POSIXct("2020/01/01", tz="UTC"),
> to=as.POSIXct("2020/06/01", tz="UTC"), by="10 min")
> df1 <- data.frame(Time=ti1)
>
> # create data frame df2 with random timespans, i.e. start and end dates
> start <- sort(sample(seq(as.POSIXct("2020/01/01", tz="UTC"),
> as.POSIXct("2020/06/01", tz="UTC"), by="1 mins"), 5000))
> end <- start + 120
> df2 <- data.frame(start=start, end=end)
>
> Everything I tried (ifelse combined with sapply or for loops) has been
> very slow. Thus, I am looking for a reasonably fast solution.
>
> Thanks a lot in advance for any hint!
>
> Cheers,
>
> Thomas
>