[R] complicated time series filtering issue
Eric Berger
er|cjberger @end|ng |rom gm@||@com
Tue Apr 5 14:36:37 CEST 2022
For a different approach, use the Date column rather than the differences
column.
I assume the data has been put into the bc.df data frame (as Jim does in
his message quoted below).
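f <- function(v, m = 10L) {
  # v: sorted, distinct integer day numbers; returns the row indices to keep
  w <- 1L
  while ((i <- tail(w, 1)) < length(v)) {
    # the first date more than m days after v[i] lies within the next m+1 rows
    j <- match(TRUE, v[i:min(i + m + 1L, length(v))] > v[i] + m)
    if (is.na(j)) break   # no remaining date is far enough away
    w <- c(w, j + i - 1L)
  }
  w
}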
f(as.integer(as.Date(strptime(bc.df$Date,"%d-%b-%y"))))
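
Since f() returns row indices, the filtered data is just a subset of the rows. With several animals the filter has to run separately for each INDIVIDUAL. A rough sketch of that, assuming rows are sorted by date within each individual; keep_idx and the toy data frame d are made up for illustration and are not from the thread:

```r
# Hypothetical helper: indices of rows more than m days after the last kept row
keep_idx <- function(v, m = 10L) {
  w <- 1L
  repeat {
    nxt <- which(v > v[tail(w, 1)] + m)[1]  # NA when nothing is far enough away
    if (is.na(nxt)) break
    w <- c(w, nxt)
  }
  w
}

# Toy data (assumed): two individuals, day numbers sorted within each
d <- data.frame(INDIVIDUAL = c(1, 1, 1, 1, 2, 2, 2),
                DATENUMBER = c(133, 272, 274, 281, 10, 15, 30))

# Filter each individual separately, then stack the pieces again
kept <- do.call(rbind, lapply(split(d, d$INDIVIDUAL),
                              function(g) g[keep_idx(g$DATENUMBER), ]))
kept$DATENUMBER
```

In this toy example the rows at day 274 and 281 (individual 1) and day 15 (individual 2) are dropped.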
On Tue, Apr 5, 2022 at 1:16 AM Jim Lemon <drjimlemon using gmail.com> wrote:
> Hi Brian,
> Perhaps this:
>
> bc.df<-read.table(text="Date INDIVIDUAL DATENUMBER LENGTH length.prev interval
> 12-May-04 57084544 133 682.4 NA NA
> 28-Sep-04 57084544 272 724.8 682.4 139
> 30-Sep-04 57084544 274 740.8 724.8 2
> 7-Oct-04 57084544 281 745.4 740.8 7
> 22-Nov-04 57084544 327 780.2 745.4 46
> 27-Jan-05 57084544 393 817.2 780.2 66
> 8-Mar-05 57084544 433 834.1 817.2 40
> 2-Jul-05 57084544 549 876.3 834.1 116
> 6-Jul-05 57084544 553 871.5 876.3 4
> 4-Aug-05 57084544 582 887.5 871.5 29
> 28-Dec-05 57084544 728 921.8 887.5 146
> 31-Jan-06 57084544 762 936.8 921.8 34
> 27-Feb-06 57084544 789 962.4 936.8 27
> 21-Nov-06 57084544 1056 972.3 962.4 267
> 30-Mar-07 57084544 1185 1007.2 972.3 129
> 23-Apr-07 57084544 1209 1009.1 1007.2 24
> 22-May-07 57084544 1238 991.6 1009.1 29
> 23-May-07 57084544 1239 1015.9 991.6 1
> 16-Jul-07 57084544 1293 1006.5 1015.9 54
> 9-Aug-07 57084544 1317 1013.0 1006.5 24
> 27-Aug-07 57084544 1335 1013.0 1013.0 18
> 29-Jul-08 57084544 1672 1021.5 1013.0 337
> 30-Jul-08 57084544 1673 984.3 1021.5 1
> 31-Jul-08 57084544 1674 1008.5 984.3 1
> 10-Aug-08 57084544 1684 1002.8 1008.5 10
> 22-Oct-08 57084544 1757 977.6 1002.8 73
> 2-Dec-08 57084544 1798 1000.6 977.6 41",
> stringsAsFactors=FALSE,header=TRUE)
> min_interval<-function(x,minint=10) {
> indx<-1
> cumint<-0
> for(i in 2:length(x)) {
> cumint<-cumint+x[i]
> if(cumint > minint) {
> indx<-c(indx,i)
> cumint<-0
> }
> }
> return(indx)
> }
> min_interval(bc.df$interval)
>
> Jim
>
> On Tue, Apr 5, 2022 at 7:31 AM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
> >
> > I think the idea is more
> > for (i in 2:nrow(x)){
> > ifelse(x[i]-x[i-1] >10) {keep x[i], delete x[i]]
> > }
> >
> > I am not quite clear on the correct code for "keep" or "delete."
> >
> > One could try
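> > > library(dplyr)
> > > x$new <- NA  # interval in days since the previous row
> > > for (i in 2:nrow(x)){
> > > x$new[i] <- x$DATENUMBER[i]-x$DATENUMBER[i-1]
> > > }
> > > x <- x %>% filter(is.na(new) | new>=10)  # is.na() keeps the first row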
> >
> > > This only works if consecutive sample dates are 10 or more days apart.
> > > You could add an else if that would accumulate days, and if successful
> > > reset the clock.
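
The accumulate-and-reset idea can also be written without an explicit loop. The following is only a sketch: it uses Reduce() with accumulate = TRUE on the first ten intervals from the sample data, and the names min_gap, gap and keep are made up for illustration.

```r
# Intervals (days since the previous record) for the first ten sample rows
interval <- c(NA, 139, 2, 7, 46, 66, 40, 116, 4, 29)

# Running gap since the last kept row: once the gap exceeds the threshold
# that row is kept and the count restarts from the next interval
min_gap <- 10
gap <- Reduce(function(acc, d) if (acc > min_gap) d else acc + d,
              interval[-1], accumulate = TRUE)

keep <- c(TRUE, gap > min_gap)   # the first row is always kept
which(keep)
```

Here rows 3, 4 and 9 are dropped, which agrees with the deletions described in the original question for those rows.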
> >
> > Tim
> > -----Original Message-----
> > From: R-help <r-help-bounces using r-project.org> On Behalf Of Bert Gunter
> > Sent: Monday, April 4, 2022 5:04 PM
> > To: Cade, Brian S <cadeb using usgs.gov>
> > Cc: r-help using r-project.org
> > Subject: Re: [R] complicated time series filtering issue
> >
> > [External Email]
> >
> > Like this?
> >
> > > winnow <- function(x, int=5){
> > >   # keep the first value, then repeatedly take the first remaining
> > >   # value that is more than int beyond the last one kept
> > >   keep <- x[1]
> > >   remaining <- x[-1]
> > >   repeat {
> > >     remaining <- remaining[remaining > tail(keep,1) + int]
> > >     if (!length(remaining)) break
> > >     keep <- c(keep,remaining[1])
> > >   }
> > >   keep
> > > }
> >
> > > x
> > [1] 1 2 5 7 8 9 15 16 17 19 20 21 28 35 37 41 43 45 46 50
> > > winnow(x,7)
> > [1] 1 9 17 28 37 45
> > > winnow(x,5)
> > [1] 1 7 15 21 28 35 41 50
> >
> > Cheers,
> > Bert
> >
> > "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> > > On Mon, Apr 4, 2022 at 12:56 PM Cade, Brian S via R-help <r-help using r-project.org> wrote:
> > >
> > > Hello: I have an issue with filtering in a time series of animal
> > > growth data that seems conceptually simple but I have not come up with
> > > effective code to implement this. I have temporal sequences of
> > > lengths by individuals and I want to retain only those data that are
> > > > >10 days apart sequentially within an individual's records. I can
> > > readily compute intervals between successive dates by individual using
> > > data.table() and its by = INDIVIDUAL functionality. See example data
> > > for one individual below. But what currently eludes me in processing
> > > this is how to recognize for example that deleting the 2nd and 3rd
> > > rows is required because the totality of their time interval is 9
> > > days, deleting 8th record with 4 days is required, deleting 17th
> > > record with 1 day is required, deleting 22nd and 23rd records is
> > > required because their sum is 2 days, but we do not delete 24th record
> > > of 10 days because the sum of previous 2 records deleted and this one
> > > > is now 12 days. Each individual can have very
> > > > different patterns of these sorts of sequences. These sequences are
> > > > easy to look at and determine what needs to be done but writing effective
> > > > code to accomplish this filtering seems to require some functionality that
> > > > I am currently missing.
> > >
> > > Any suggestions would be greatly appreciated.
> > >
> > > Date INDIVIDUAL DATENUMBER LENGTH length.prev interval
> > > 228 12-May-04 57084544 133 682.4 NA NA
> > > 229 28-Sep-04 57084544 272 724.8 682.4 139
> > > 230 30-Sep-04 57084544 274 740.8 724.8 2
> > > 231 7-Oct-04 57084544 281 745.4 740.8 7
> > > 232 22-Nov-04 57084544 327 780.2 745.4 46
> > > 233 27-Jan-05 57084544 393 817.2 780.2 66
> > > 234 8-Mar-05 57084544 433 834.1 817.2 40
> > > 235 2-Jul-05 57084544 549 876.3 834.1 116
> > > 236 6-Jul-05 57084544 553 871.5 876.3 4
> > > 237 4-Aug-05 57084544 582 887.5 871.5 29
> > > 238 28-Dec-05 57084544 728 921.8 887.5 146
> > > 239 31-Jan-06 57084544 762 936.8 921.8 34
> > > 240 27-Feb-06 57084544 789 962.4 936.8 27
> > > 241 21-Nov-06 57084544 1056 972.3 962.4 267
> > > 242 30-Mar-07 57084544 1185 1007.2 972.3 129
> > > 243 23-Apr-07 57084544 1209 1009.1 1007.2 24
> > > 244 22-May-07 57084544 1238 991.6 1009.1 29
> > > 245 23-May-07 57084544 1239 1015.9 991.6 1
> > > 246 16-Jul-07 57084544 1293 1006.5 1015.9 54
> > > 247 9-Aug-07 57084544 1317 1013.0 1006.5 24
> > > 248 27-Aug-07 57084544 1335 1013.0 1013.0 18
> > > 249 29-Jul-08 57084544 1672 1021.5 1013.0 337
> > > 250 30-Jul-08 57084544 1673 984.3 1021.5 1
> > > 251 31-Jul-08 57084544 1674 1008.5 984.3 1
> > > 252 10-Aug-08 57084544 1684 1002.8 1008.5 10
> > > 253 22-Oct-08 57084544 1757 977.6 1002.8 73
> > > 254 2-Dec-08 57084544 1798 1000.6 977.6 41
> > >
> > >
> > >
> > > Brian
> > >
> > >
> > >
> > > Brian S. Cade, PhD
> > >
> > > U. S. Geological Survey
> > > Fort Collins Science Center
> > > 2150 Centre Ave., Bldg. C
> > > Fort Collins, CO 80526-8818
> > >
> > > email: cadeb using usgs.gov<mailto:brian_cade using usgs.gov>
> > > tel: 970 226-9326
> > >
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mail
> > > man_listinfo_r-2Dhelp&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAs
> > > Rzsn7AkP-g&m=ZfVdnGSALzyajo_d1U09NJs3RCXcx5NwQ2PZ9A9zwEnVYnexn4toTyxgu
> > > -vCEJab&s=PG1chCZY6eQzSdtSlvChVVVt0HXVDG1bgBkJMQ8wk1A&e=
> > > PLEASE do read the posting guide
> > > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.R-2Dproject.or
> > > g_posting-2Dguide.html&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeA
> > > sRzsn7AkP-g&m=ZfVdnGSALzyajo_d1U09NJs3RCXcx5NwQ2PZ9A9zwEnVYnexn4toTyxg
> > > u-vCEJab&s=D_bzOVjWanUgYD_zJq-IS8EObMKBmC5Q5D-a_IHxMAA&e=
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Dhelp&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=ZfVdnGSALzyajo_d1U09NJs3RCXcx5NwQ2PZ9A9zwEnVYnexn4toTyxgu-vCEJab&s=PG1chCZY6eQzSdtSlvChVVVt0HXVDG1bgBkJMQ8wk1A&e=
> > PLEASE do read the posting guide
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.R-2Dproject.org_posting-2Dguide.html&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=ZfVdnGSALzyajo_d1U09NJs3RCXcx5NwQ2PZ9A9zwEnVYnexn4toTyxgu-vCEJab&s=D_bzOVjWanUgYD_zJq-IS8EObMKBmC5Q5D-a_IHxMAA&e=
> > and provide commented, minimal, self-contained, reproducible code.
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>