[R] Need a faster function to replace missing data
Tim Clark
mudiver1200 at yahoo.com
Sat May 23 01:39:32 CEST 2009
Jim,
Thanks! I like the way you use indexing instead of loops. However, the findInterval function does not give the right result here. From playing with it, it seems to return the closest time that is less than (or equal to) the one of interest. In this case the correct replacement should have been 40, not 30, since 12:15 in mygarmin is closer to 12:14 in myvscan than 12:10 is. Is there a way to get the function to find the value closest in time instead of the next smaller one? I was trying to use which.min to get the closest date but can't seem to get that to work right either.
Aloha,
Tim
Tim Clark
Department of Zoology
University of Hawaii
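One way to get the nearest fix rather than the previous one is to keep findInterval(), compare the garmin fix it returns with the following fix, keep whichever is closer, and then apply the 5-minute limit from the original question. This is a sketch, not code from the thread: fill_nearest and its max.gap argument are hypothetical names, and it assumes POSIXct DateTime columns (as in Jim's example) with the garmin rows sorted by time. It uses <= so that a fix exactly 5 minutes away still counts, matching "not greater than 5 minutes".

```r
# Hypothetical helper (not from the thread): fill each NA Latitude in 'vscan'
# with the Latitude of the garmin fix nearest in time, but only if that fix is
# no more than 'max.gap' seconds away. Assumes POSIXct DateTime columns and
# garmin rows sorted by time (findInterval() stops with an error otherwise).
fill_nearest <- function(vscan, garmin, max.gap = 5 * 60) {
  na.indx <- which(is.na(vscan$Latitude))
  if (length(na.indx) == 0) return(vscan)
  x  <- as.numeric(vscan$DateTime[na.indx])   # seconds since the epoch
  gt <- as.numeric(garmin$DateTime)
  n  <- length(gt)
  # index of the last garmin time <= x (0 when x is before every garmin time)
  lo <- findInterval(x, gt)
  # the two candidate neighbours, clamped to valid row numbers
  left  <- pmax(lo, 1)
  right <- pmin(lo + 1, n)
  # take whichever neighbour is closer in time (ties go to the earlier fix)
  nearest <- ifelse(abs(gt[right] - x) < abs(gt[left] - x), right, left)
  # only accept matches within the allowed gap
  ok <- abs(gt[nearest] - x) <= max.gap
  vscan$Latitude[na.indx[ok]] <- garmin$Latitude[nearest[ok]]
  vscan
}

# Jim's test data, without the helper 'tn' columns
myvscan <- data.frame(Latitude = c(1, NA, 1.5),
  DateTime = as.POSIXct(c("12:00:00", "12:14:00", "12:20:00"), format = "%H:%M:%S"))
mygarmin <- data.frame(Latitude = c(20, 30, 40),
  DateTime = as.POSIXct(c("12:00:00", "12:10:00", "12:15:00"), format = "%H:%M:%S"))

filled <- fill_nearest(myvscan, mygarmin)
filled$Latitude   # the NA at 12:14 now holds 40, from the 12:15 garmin fix
```

Because findInterval() handles all the missing rows in one call, this avoids comparing each missing vscan row against every garmin record.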
--- On Fri, 5/22/09, jim holtman <jholtman at gmail.com> wrote:
> From: jim holtman <jholtman at gmail.com>
> Subject: Re: [R] Need a faster function to replace missing data
> To: "Tim Clark" <mudiver1200 at yahoo.com>
> Cc: r-help at r-project.org
> Date: Friday, May 22, 2009, 7:24 AM
> I think this does what you want. It uses 'findInterval' to determine where a
> possible match is:
>
> > myvscan<-data.frame(c(1,NA,1.5),as.POSIXct(c("12:00:00","12:14:00","12:20:00"), format="%H:%M:%S"))
> > names(myvscan)<-c("Latitude","DateTime")
> > myvscan$tn <- as.numeric(myvscan$DateTime) # convert to numeric for findInterval
> > mygarmin<-data.frame(c(20,30,40),as.POSIXct(c("12:00:00","12:10:00","12:15:00"), format="%H:%M:%S"))
> > names(mygarmin)<-c("Latitude","DateTime")
> > mygarmin$tn <- as.numeric(mygarmin$DateTime)
> >
> > # use 'findInterval'
> > na.indx <- which(is.na(myvscan$Latitude)) # find NAs
> > # replace with garmin latitude
> > myvscan$Latitude[na.indx] <- mygarmin$Latitude[findInterval(myvscan$tn[na.indx], mygarmin$tn)]
> > myvscan
>   Latitude            DateTime         tn
> 1      1.0 2009-05-22 12:00:00 1243008000
> 2     30.0 2009-05-22 12:14:00 1243008840
> 3      1.5 2009-05-22 12:20:00 1243009200
> >
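For reference, findInterval(x, vec) returns, for each x, the index i of the last breakpoint with vec[i] <= x (0 if x comes before vec[1]); it never looks at the next, possibly closer, breakpoint. That explains the 30 above: 12:10 is the last garmin fix at or before 12:14. A small check, using minutes after 12:00 instead of POSIXct times so the numbers are easy to read (garmin.min is just an illustrative name):

```r
# Breakpoints: garmin fixes at 12:00, 12:10 and 12:15, in minutes after 12:00
garmin.min <- c(0, 10, 15)

findInterval(14, garmin.min)   # 2: 12:10 is the last fix at or before 12:14
findInterval(15, garmin.min)   # 3: an exact match counts as "at or before"
findInterval(-1, garmin.min)   # 0: earlier than every fix, so index 0 selects nothing
```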
>
>
>
> On Fri, May 22, 2009 at 12:45 AM, Tim Clark <mudiver1200 at yahoo.com> wrote:
>
>
> Dear List,
>
> I need some help coming up with a function that will take two data sets,
> determine whether a value is missing in one, find a value in the second that
> was taken at about the same time, and substitute the second value where the
> first should have been. My problem is from a fish tracking study. We put
> acoustic tags in fish and track them for several days. Location data is
> supposed to be recorded automatically every time we detect a "ping" from the
> fish. Unfortunately the GPS had some problems, and sometimes the fish's depth
> was recorded but not its location. Fortunately I had a back-up GPS that was
> taking location data every five minutes. I would like to merge the two files,
> replacing each missing value in the vscan (automatic) file with the location
> from the garmin file. Since we were getting vscan records every 1-2 seconds
> and garmin records every 5 minutes, I need to find the right place in the
> vscan file for each garmin record, i.e. the closest in time, but not more
> than 5 minutes away. I have written a function that does this. It works with
> my test data but locks up my computer with my real data: I have several
> million vscan records and several thousand garmin records. Is there a better
> way to do this?
>
>
>
> My function and test data:
>
> library(chron) # times() comes from the chron package
>
> myvscan<-data.frame(c(1,NA,1.5),times(c("12:00:00","12:14:00","12:20:00")))
> names(myvscan)<-c("Latitude","DateTime")
>
> mygarmin<-data.frame(c(20,30,40),times(c("12:00:00","12:10:00","12:15:00")))
> names(mygarmin)<-c("Latitude","DateTime")
>
> minute.diff<-1/24/12 # times differences are in days, so this is 5 minutes
>
> for (k in 1:nrow(myvscan))
> {
>   if (is.na(myvscan$Latitude[k]))
>   {
>     # only fill the gap if some garmin fix lies within 5 minutes
>     if ((min(abs(mygarmin$DateTime-myvscan$DateTime[k]))) < minute.diff)
>     {
>       # row of the garmin fix closest in time
>       index.min.date<-which.min(abs(mygarmin$DateTime-myvscan$DateTime[k]))
>       myvscan$Latitude[k]<-mygarmin$Latitude[index.min.date]
>     }
>   }
> }
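The loop above computes abs(mygarmin$DateTime-myvscan$DateTime[k]) against every garmin record, twice, for each missing vscan row, so the work grows with (number of NAs) x (number of garmin records). A different, vectorised way to get the nearest fix is to put breakpoints halfway between consecutive garmin times: every time between two midpoints has the same nearest fix, so one findInterval() call on the midpoints gives the nearest garmin row directly. This is a sketch under stated assumptions, not code from the thread: nearest_fill is a hypothetical helper, times are plain numbers in days (as.numeric() on a chron times object gives fractions of a day), and garmin times must be sorted. It keeps the loop's strict "< minute.diff" test.

```r
# Hypothetical helper: vectorised nearest-fix fill using midpoint breakpoints.
# lat, t: vscan latitudes and times; garmin.lat, garmin.t: garmin latitudes
# and times. Times are numeric fractions of a day; garmin.t must be sorted.
nearest_fill <- function(lat, t, garmin.lat, garmin.t, max.diff = 1/24/12) {
  na.indx <- which(is.na(lat))
  n <- length(garmin.t)
  if (length(na.indx) == 0 || n == 0) return(lat)
  # n - 1 midpoints between consecutive garmin times
  mids <- (garmin.t[-1] + garmin.t[-n]) / 2
  # row of the nearest fix: one more than the number of midpoints <= t
  nearest <- if (n > 1) findInterval(t[na.indx], mids) + 1 else rep(1, length(na.indx))
  # keep only matches closer than max.diff (5 minutes by default)
  ok <- abs(garmin.t[nearest] - t[na.indx]) < max.diff
  lat[na.indx[ok]] <- garmin.lat[nearest[ok]]
  lat
}

# Tim's test data, with times written as minutes after midnight / 1440
vscan.lat  <- c(1, NA, 1.5)
vscan.t    <- c(720, 734, 740) / 1440   # 12:00, 12:14, 12:20
garmin.lat <- c(20, 30, 40)
garmin.t   <- c(720, 730, 735) / 1440   # 12:00, 12:10, 12:15
filled.lat <- nearest_fill(vscan.lat, vscan.t, garmin.lat, garmin.t)
# With the chron data frames the call would be along the lines of
# nearest_fill(myvscan$Latitude, as.numeric(myvscan$DateTime),
#              mygarmin$Latitude, as.numeric(mygarmin$DateTime))
```

Here filled.lat should be 1, 40, 1.5: the 12:14 gap takes the 12:15 fix, one minute away.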
>
> I appreciate your help and advice.
>
> Aloha,
>
> Tim
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>
>