[R] Filtering data
Mike Lonergan
mel at mcs.st-and.ac.uk
Wed Nov 7 18:35:57 CET 2001
How about:
grouped<-cumsum(ifelse(c(Data$Day[1]-2,Data$Day[-length(Data$Day)])!=(Data$Day-1),1,0))
peaks<-unlist(tapply(-Data$Flow,grouped,function(x) rank(x,ties.method="first"),simplify=FALSE))
Data[peaks==1,]
(if I haven't messed up, this finds all the rows that don't follow on from
the day before & uses them to number each group of consecutive days, then
ranks the Flows within each group, giving a vector with the groups' maxima
having value '1'. But I probably have.)
I don't know how much time it'd save.
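To make the grouping step concrete, here is a minimal sketch on the example data from the original question. It assumes a data frame called Data with columns Day and Flow, as in the lines above; the names rows, peak_rows and result are only illustrative. Note that it picks each group's maximum with which.max() rather than by ranking, which is a swapped-in variation on the same idea, not the code above.

```r
# Matt's example data; the name `Data` follows the code above.
Data <- data.frame(Day = c(1, 4, 5, 6, 9), Flow = c(10, 13, 20, 15, 13))

# A new group starts at the first row and wherever the day does not
# follow directly from the previous row's day.
# For this data, grouped is 1 2 2 2 3.
grouped <- cumsum(c(TRUE, diff(Data$Day) != 1))

# Row numbers of Data, split by group.
rows <- split(seq_len(nrow(Data)), grouped)

# For each group, the row number holding the largest Flow.
# which.max() returns the first maximum, so ties go to the earliest day.
peak_rows <- vapply(rows, function(i) i[which.max(Data$Flow[i])], integer(1))

result <- Data[unname(peak_rows), ]
print(result)
```

On this data the sketch keeps rows 1, 3 and 5 (Days 1, 5 and 9), which are the records Matt asked for. It assumes Data has at least one row and is already sorted by Day.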
Cheers,
Mike.
----------------------------------------------------------------
Mike Lonergan
Research Unit for Wildlife Population Assessment
Mathematical Institute
University of St Andrews
North Haugh Tel: +44 (0) 1334 463760
St Andrews Fax: +44 (0) 1334 463748
Fife KY16 9SS Email: mel at mcs.st-and.ac.uk
Scotland http://www-ruwpa.mcs.st-and.ac.uk
----------------------------------------------------------------
-----Original Message-----
From: owner-r-help at stat.math.ethz.ch
[mailto:owner-r-help at stat.math.ethz.ch]On Behalf Of John Fox
Sent: 07 November 2001 12:47
To: Matt Pocernich
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Filtering data
At 08:39 PM 11/6/2001 -0700, Matt Pocernich wrote:
>I am having difficulty filtering data. I am working with flow data
>collected at a stream gage. For each record, I have a date and flow
>value. I have filtered this data to only include days when flow values
>exceed a given threshold.
>
>Here is my problem. Within this subset of data, I often have several
>consecutive days above the threshold. From this group of days, I wish to
>select the record (both date and flow) containing the maximum flow. If an
>exceedance is isolated (the preceding and succeeding days are below the
>threshold), I also wish to select that record.
>
>For example from the data set
>
>Day Flow
>
>1 10
>4 13
>5 20
>6 15
>9 13
>
>I would like the 1st, 3rd and 5th records filtered.
>
>Any ideas on how I would write such an algorithm would be appreciated.
Dear Matt,
Here's a function that does what you want with loops. Perhaps someone else
will produce a more elegant solution:
> select.rows <- function(data) {
+ indices <- data[,1]
+ values <- data[,2]
+ n <- length(indices)
+ if (n == 0) stop('no data')
+ if (n == 1) return(data)
+ selection <- rep(0, n) # so as not to grow the selection vector
+ current <- 1
+ number <- 1
+ for (i in 2:n){
+ if (indices[i] == 1 + indices[i - 1]){
+ if (values[i] > values[current]) current <- i
+ }
+ else {
+ selection[number] <- current
+ number <- number + 1
+ current <- i
+ }
+ }
+ selection[number] <- current
+ data[selection,]
+ }
>
> data <- matrix(c(1,4,5,6,9, 10,13,20,15,13), 5, 2)
> colnames(data) <- c('Day', 'Flow')
> select.rows(data)
Day Flow
[1,] 1 10
[2,] 5 20
[3,] 9 13
I hope that this isn't too inefficient.
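For anyone who wants to check the efficiency question on their own machine, the following is a rough sketch rather than a benchmark. It repeats the loop function from above (prompts removed) and compares it with a vectorised counterpart built on ave(). The name select.rows.vec and the synthetic data are assumptions made purely for illustration, and timings will vary from machine to machine.

```r
# The loop function, copied from the message above (prompts removed).
select.rows <- function(data) {
    indices <- data[, 1]
    values <- data[, 2]
    n <- length(indices)
    if (n == 0) stop('no data')
    if (n == 1) return(data)
    selection <- rep(0, n)  # zeros are dropped when used as an index
    current <- 1
    number <- 1
    for (i in 2:n) {
        if (indices[i] == 1 + indices[i - 1]) {
            if (values[i] > values[current]) current <- i
        } else {
            selection[number] <- current
            number <- number + 1
            current <- i
        }
    }
    selection[number] <- current
    data[selection, ]
}

# Hypothetical vectorised counterpart for a two-column matrix.
select.rows.vec <- function(data) {
    if (nrow(data) == 0) stop('no data')
    # Number each run of consecutive days.
    run <- cumsum(c(TRUE, diff(data[, 1]) != 1))
    # Flag the first maximum within each run (ave() returns 0/1 here).
    is.peak <- ave(data[, 2], run,
                   FUN = function(x) seq_along(x) == which.max(x)) == 1
    data[is.peak, , drop = FALSE]
}

# Synthetic data (made up): sorted distinct days with gaps, arbitrary flows.
set.seed(1)
days <- sort(sample(1:20000, 10000))
big <- cbind(Day = days, Flow = runif(length(days)))

# Both versions should select the same rows.
stopifnot(identical(select.rows(big), select.rows.vec(big)))

print(system.time(select.rows(big)))
print(system.time(select.rows.vec(big)))
```

Both versions keep the first maximum when a run contains tied flows, so they should agree on any input sorted by day.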
John
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._