[R] arithmetic problem
Gabor Grothendieck
ggrothendieck at gmail.com
Sat May 30 18:11:10 CEST 2009
Here we are assuming that
1. a row should be extracted if its value is between 200 and 300 away
from the prior or next value with the same ind.
2. the input is sorted by values within ind.
If that's not the intention then modify the code accordingly.
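If the input is not already sorted, assumption 2 can be met by ordering
the rows first. A minimal sketch, where toy is a hypothetical data frame
holding a few of the poster's rows in shuffled order (the names toy and
toy_sorted are made up for illustration):

```r
# toy is a hypothetical unsorted input, for illustration only
toy <- data.frame(values = c(1560, 689, 3028, 1336, 2655),
                  ind = c("ABBA-1", "ABBA-1", "7A5", "ABBA-1", "7A5"))
# order by ind, then by values within ind
toy_sorted <- toy[order(toy$ind, toy$values), ]
toy_sorted
```

The original row names are kept, so the rows can still be traced back to
their positions in the unsorted input.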
First we read the data into the data frame DF.
Then we define between(x, min, max), a function that returns a vector
whose ith component is TRUE if x[i] is between min and max.
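As a quick illustration of what between() returns, here is a small
sketch using made-up differences; the definition is repeated from the
code further down so that the snippet runs on its own. Note that both
limits are excluded, so a difference of exactly 300 does not count.

```r
between <- function(x, min, max) x > min & max > x
# 150 is too small, 224 is inside, 300 sits on the excluded upper limit,
# 647 is too large
between(c(150, 224, 300, 647), 200, 300)
# FALSE  TRUE FALSE FALSE
```

If the limits are meant to be inclusive, >= could be used in place of >.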
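Then use ave() to get a selection vector. ave() applies a function to the
values within each ind group and stores the result back into a copy of the
numeric values vector, so in this case it returns a vector of zeros and
ones. Comparing that with 0 converts it to the logical vector sel, which
defines the selection.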
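The zeros and ones can be seen in a small sketch with made-up data (x, g
and res are illustrative names only): the function passed to ave() returns
a logical vector for each group, but the result comes back as numeric.

```r
x <- c(1, 5, 2, 9)
g <- c("a", "a", "b", "b")
# within each group, flag the values above the group mean
res <- ave(x, g, FUN = function(v) v > mean(v))
res       # 0 1 0 1
res > 0   # FALSE  TRUE FALSE  TRUE
```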
# read the data
Lines <- "values ind
1 2655 7A5
2 3028 7A5
3 689 ABBA-1
4 1336 ABBA-1
5 1560 ABBA-1
6 2820 ABLIM1
7 3339 ABLIM1
8 171 ACSM5
9 195 ACSM5
10 43 ADAMDEC1
11 129 ADAMDEC1
12 1105 AFF1
13 3202 AFF1
14 852 AFF3
15 2461 AFF3
16 45 AKT1
17 397 AKT1
18 1430 AQP2
19 2402 AQP2
20 2551 ARHGAP19"
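DF <- read.table(textConnection(Lines), header = TRUE)
# TRUE where min < x < max (both limits excluded)
between <- function(x, min, max) x > min & max > x
# within each ind, flag rows whose difference from the prior or the next
# value is between 200 and 300; the FALSE padding (coerced to 0) covers
# the first and last row of a group, which lack a prior or next value
sel <- ave(DF$values, DF$ind, FUN = function(v)
between(c(FALSE, diff(v)), 200, 300) | between(c(diff(v), FALSE), 200, 300)
) > 0
# extract the selected rows
DF[sel, ]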
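To see what the function passed to ave() does for a single group, here is
a sketch that traces it by hand on the ABBA-1 values (rows 3 to 5). The
names prior and following are made up for illustration, and between() is
repeated so that the snippet runs on its own.

```r
between <- function(x, min, max) x > min & max > x
v <- c(689, 1336, 1560)                            # ABBA-1 values
# difference from the prior value: 0, 647, 224
prior <- between(c(FALSE, diff(v)), 200, 300)
# difference to the next value: 647, 224, 0
following <- between(c(diff(v), FALSE), 200, 300)
prior | following
# FALSE  TRUE  TRUE
```

The second and third ABBA-1 values are flagged, which corresponds to rows
4 and 5, the rows asked for in the original question.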
On Sat, May 30, 2009 at 10:13 AM, Iain Gallagher
<iaingallagher at btopenworld.com> wrote:
>
> Hello list
>
> I have a problem with a dataset (see toy example below) where I am trying to find the difference between two (or more) numbers and discard those observations which fall outside a set interval.
>
> An example and further explanation:
>
> values ind
> 1 2655 7A5
> 2 3028 7A5
> 3 689 ABBA-1
> 4 1336 ABBA-1
> 5 1560 ABBA-1
> 6 2820 ABLIM1
> 7 3339 ABLIM1
> 8 171 ACSM5
> 9 195 ACSM5
> 10 43 ADAMDEC1
> 11 129 ADAMDEC1
> 12 1105 AFF1
> 13 3202 AFF1
> 14 852 AFF3
> 15 2461 AFF3
> 16 45 AKT1
> 17 397 AKT1
> 18 1430 AQP2
> 19 2402 AQP2
> 20 2551 ARHGAP19
>
> Each number in the values column above is associated with a label (in the ind column). For some inds there will be only 2 values but as can be seen from the data other inds have many values.
>
> Here's what I want to do using the ABBA-1 data from above as an example:
>
> calculate the differences between each value:
>
> 1560-1336 = 224
> 1336-689 = 647
>
> then use these values to create an index that will allow me to pull out values between set limits. If I set the limits to between 200 and 300 then the index will reference rows 4 & 5 in the above data set.
>
> I hope this is reasonably clear and I appreciate any suggestions.
>
> Thanks
>
> Iain
>
>