[R] Fuzzy merge using timestamps
Sarah Goslee
sarah.goslee at gmail.com
Wed Nov 10 19:12:53 CET 2010
On Wed, Nov 10, 2010 at 12:57 PM, Ian Craig <ian.jhsph at gmail.com> wrote:
> Greetings Supreme Council of R Masters,
Nice. :)
> I have two sets of data, each with a set of timestamps. I would like to
> somehow merge the datasets based on the timestamps and an individual
> identifier. That is, there are several individuals, all with timestamps,
> whose times could overlap. By browsing through some of the older posts, I
> got the idea to create a third data frame of both sets of timestamps,
> individual identifiers, and a key to determine which dataset each row came
> from, then find the breaks to determine which records from each dataset
> should be paired. The code I have written so far looks something like this:
This would be easier to sort through if you included a toy example with
data so that we could try it. As it is, I have no idea what your data
actually look like.
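As an illustration of the kind of toy example being requested, a minimal sketch might build two tiny data frames by hand and print them with `dput()`, which produces text others can paste straight into R. The values are invented; only the column names are borrowed from the code quoted below, and the assumption that `gpsARC`/`urARC` hold individual identifiers is a guess.

```r
# Invented toy data; column names mirror those used in the question.
gpsdata <- data.frame(
  gpsARC = c("A", "A", "B"),
  t_datetimegps = c("2010-11-01 10:00:00", "2010-11-01 12:00:00",
                    "2010-11-01 09:00:00"),
  stringsAsFactors = FALSE
)
urdata <- data.frame(
  urARC = c("B", "A"),
  t_datetimeur = c("2010-11-01 10:30:00", "2010-11-01 10:15:00"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates each object, ready to paste into a post.
dput(gpsdata)
dput(urdata)
```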
> gpsdata$t_datetimegps<-as.POSIXct(gpsdata$t_datetimegps)
> urdata$t_datetimeur<-as.POSIXct(urdata$t_datetimeur)
>
> gpsdata$ID1 <- row.names(gpsdata)
> urdata$ID2 <- row.names(urdata)
>
> gpsdata$key1 <- rep(0, nrow(gpsdata))
> urdata$key2 <- rep(1, nrow(urdata))
>
> checkTimes <- data.frame(ID=c(gpsdata$ID1, urdata$ID2),
>                          ARC=c(gpsdata$gpsARC, urdata$urARC),
>                          times=c(gpsdata$t_datetimegps, urdata$t_datetimeur),
>                          key=c(gpsdata$key1, urdata$key2))
>
> checkTime <- checkTimes[order(checkTimes$ARC, checkTimes$times, decreasing = FALSE), ]
>
> breaks <- which(diff(checkTime$key) == 1)
>
> match <- data.frame(ID1=checkTime$ID[breaks],
>                     gpsARC = checkTime$ARC[breaks],
>                     urARC = checkTime$ARC[breaks + 1],
>                     t_datetimegps=checkTime$times[breaks],
>                     t_datetimeur=checkTime$times[breaks + 1])
>
> Then I merge the 'match' data frame with the gpsdata data frame, and the
> result with the urdata data frame. The problem is that when I create the
> checkTime data frame and sort it, it sorts the urdata portion first and then
> the gpsdata portion. So my key column looks like
> 1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, instead of
> 0,0,0,1,0,0,1,0,0,0,0,0,0,1, etc., even though I am not sorting on key.
> S.O.S!!!! Why is it doing this? Shouldn't it just order the timestamps of
> both data frames together?
So really this is a sorting problem, not a merging problem? Is the merging
part working correctly?
What exactly are you doing to merge? To sort?
Here again, a small working example would be really useful. Without
knowing what you're doing, I can't offer suggestions.
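One possible explanation, which the thread itself does not confirm: if `gpsARC` and `urARC` are factors (the default for text columns read with `read.csv()` before R 4.0.0), then in R versions before 4.1.0 `c()` on two factors returns their underlying integer codes rather than their labels. The same individual can then receive different codes in the two datasets, so ordering on `ARC` first would separate the rows by source. The sketch below, on invented data, converts the identifiers to character before combining and then looks for GPS-to-UR transitions. The data, the extra check that both rows of a pair belong to the same individual, and the name `matched` are all assumptions added for illustration.

```r
# Sketch only: invented data, column names borrowed from the question.
# Assumption: gpsARC / urARC identify individuals and arrive as factors.
gpsdata <- data.frame(
  gpsARC = factor(c("A", "A", "B", "B")),
  t_datetimegps = as.POSIXct(c("2010-11-01 10:00:00", "2010-11-01 12:00:00",
                               "2010-11-01 09:00:00", "2010-11-01 11:00:00"),
                             tz = "UTC")
)
urdata <- data.frame(
  urARC = factor(c("B", "A", "A")),
  t_datetimeur = as.POSIXct(c("2010-11-01 10:30:00", "2010-11-01 10:15:00",
                              "2010-11-01 12:30:00"), tz = "UTC")
)

# Convert identifiers to character before combining, so c() keeps the
# labels instead of (in R < 4.1.0) per-dataset integer codes.
checkTimes <- data.frame(
  ARC   = c(as.character(gpsdata$gpsARC), as.character(urdata$urARC)),
  times = c(gpsdata$t_datetimegps, urdata$t_datetimeur),
  key   = c(rep(0, nrow(gpsdata)), rep(1, nrow(urdata))),  # 0 = GPS, 1 = UR
  stringsAsFactors = FALSE
)

# Sort by individual, then by time, so the two sources interleave.
checkTime <- checkTimes[order(checkTimes$ARC, checkTimes$times), ]

# A break is a GPS row followed directly by a UR row of the same individual.
n <- nrow(checkTime)
breaks <- which(diff(checkTime$key) == 1 &
                  checkTime$ARC[-1] == checkTime$ARC[-n])

# Named 'matched' rather than 'match' to avoid confusion with base::match().
matched <- data.frame(
  ARC           = checkTime$ARC[breaks],
  t_datetimegps = checkTime$times[breaks],
  t_datetimeur  = checkTime$times[breaks + 1],
  stringsAsFactors = FALSE
)
print(checkTime$key)
print(matched)
```

With these invented rows, `key` comes out interleaved as 0 1 0 1 0 1 0 and three GPS/UR pairs are found. As in the quoted code, each pair is a GPS record and the UR record immediately after it; any further UR records in the same run are left unpaired.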
Sarah
--
Sarah Goslee
http://www.functionaldiversity.org