[R] Error in which()
David Winsemius
dwinsemius at comcast.net
Thu Jul 8 22:10:05 CEST 2010
On Jul 8, 2010, at 3:23 PM, Muhammad Rahiz wrote:
> Hi all,
>
> I'm trying to filter data into respective numbers. For example, if
> the data ranges from 0 to <0.1, group the data. And so on for the
> rest of the data.
> There are inconsistencies in the output. For example, b1[[3]] lumps
> all the 0.2s and 0.3s together while 0.6s are not in the output.
Any time you are working with floating point numbers you should be
using all.equal() rather than ==. You could easily be getting bitten by
a >= test that returns FALSE when you expected it to be TRUE.
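A minimal illustration of that point, using made-up values rather than the poster's data. A product that prints as 0.3 is not stored identically to the literal 0.3, so both == and >= can surprise you. Note that all.equal() returns either TRUE or a character description of the difference, which is why it is usually wrapped in isTRUE().

```r
# Toy values, not the poster's data.
x <- 3 * 0.1          # prints as 0.3 but is stored slightly above it

exact_equal <- x == 0.3                    # exact comparison of the stored doubles
near_equal  <- isTRUE(all.equal(x, 0.3))   # comparison within a tolerance

# The same effect hits range tests: the literal 0.3 sits just below x.
passes_lower_bound <- 0.3 >= x

print(c(exact_equal = exact_equal,
        near_equal = near_equal,
        passes_lower_bound = passes_lower_bound))
```

Only near_equal comes out TRUE here; the two exact tests fail even though both numbers print as 0.3.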
>
> Running the function - table(f1) - shows that each of the components/
> numbers has x number of elements in them. But this is not showing in
> the results of the script.
>
> Can anyone assist?
>
>
> Thanks,
>
> Muhammad
>
>
>
>
> f1 <- read.table("data.txt")
> f1 <- f1[which(is.na(f1)==FALSE),1]
f1 is a data.frame, and "[which( ==FALSE), " is the same as "[ !is.na() ,
" so you could use
f1 <- f1[ !is.na(f1[,1]), 1]
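As a small sketch of that equivalence, here is a hypothetical one-column data frame standing in for the result of read.table("data.txt"). The column name V1 and the values are assumptions for illustration only.

```r
# Hypothetical stand-in for read.table("data.txt"); values are invented.
f1 <- data.frame(V1 = c(0.2, NA, 0.5, 0.6, NA))

# which()-based form: integer positions of the non-missing rows
via_which <- f1[which(is.na(f1[, 1]) == FALSE), 1]

# Logical-index form: same rows, without which() or == FALSE
via_logical <- f1[!is.na(f1[, 1]), 1]

print(via_logical)
print(identical(via_which, via_logical))
```

Both forms return a plain numeric vector of the non-missing values, because the trailing ", 1" selects the single column.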
>
> x0 <- seq(0,1,0.1)
> x1 <- x0 +0.1
>
> b1 <- c()
> for (a in 1:length(x)){
> b1[[a]] <- f1[which(f1 >= x0[a] & f1 < x1[a])]
> }
That was really not a minimal example, now was it? Used a very small
fraction of your data.
For me this throws an error since x is not defined. Modifying it so x
becomes x0 and adding the column number "1" to f1's indexing gets me
something like what you are describing. It's undoubtedly a case of R FAQ
7.31 ("Why doesn't R think these numbers are equal?").
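For readers who want to reproduce the symptom, this is a sketch of the loop with the two modifications just described (x replaced by x0, and column 1 indexed explicitly). The data are invented values rounded to one decimal place, not the poster's file.

```r
# Toy data (hypothetical), already rounded to the nearest tenth.
f1 <- data.frame(V1 = c(0.2, 0.3, 0.3, 0.5, 0.6))

x0 <- seq(0, 1, 0.1)   # lower bounds; x0[4] is 3 * 0.1, slightly above 0.3
x1 <- x0 + 0.1         # upper bounds

v  <- f1[, 1]                          # index the column explicitly
b1 <- vector("list", length(x0))
for (a in seq_along(x0)) {             # loop over x0, not the undefined x
  b1[[a]] <- v[v >= x0[a] & v < x1[a]]
}

print(b1[[3]])               # the 0.2 and both 0.3s end up together
print(b1[[4]])               # empty: 0.3 is below the stored x0[4]
print(0.6 %in% unlist(b1))   # FALSE: 0.6 falls between two bins
```

The 0.3s are swallowed by the bin that starts at 0.2, and 0.6 is dropped entirely, which matches the behaviour reported in the question.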
> b2 <-findInterval(f1[,1], seq(0, 1, by=0.1) )
> str(b2)
 int [1:120] 11 10 9 10 10 7 10 9 9 7 ...
> table(b2)
b2
 2  3  5  6  7  9 10 11
 1 15 17 56 21  5  4  1
> table(f1[,1])

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9   1
  1   4  11  17  18  38  21   5   4   1
Notice that the 0.5s and 0.6s get lumped into the same box. Methods
for discrete variables are more appropriate here. However, if you know
that your numbers are all rounded to the nearest tenth, then add (or
subtract) 0.05 to your boundary criteria so you won't run into
numerical representation problems. (See below. I'm not sure that cut()
will solve your troubles here.)
> table(cut(f1[,], seq(0,1,by=0.1) , include.lowest=TRUE, right=FALSE ))

  [0,0.1) [0.1,0.2) [0.2,0.3) [0.3,0.4) [0.4,0.5) [0.5,0.6) [0.6,0.7) [0.7,0.8)
        0         1        15         0        17        56        21         0
[0.8,0.9)   [0.9,1]
        5         5
Notice the gap at the [0.3,0.4) category (and again at [0.7,0.8)). This
may be why the S/R designers chose right=TRUE as the default.
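The half-step shift of the boundaries suggested earlier might look like the sketch below. It assumes the values really are rounded to the nearest tenth; the vector x is invented for illustration.

```r
# Toy data (hypothetical), rounded to the nearest tenth.
x <- c(0.1, 0.2, 0.3, 0.3, 0.5, 0.6, 0.6, 0.7, 1)

# Move every boundary down by 0.05 so each tenth sits mid-bin,
# far from any boundary and immune to representation error.
breaks <- seq(0, 1, by = 0.1) - 0.05

bin    <- findInterval(x, breaks)
counts <- table(bin)
print(counts)

# Alternatively, treat the values as discrete labels and skip binning.
print(table(x))
```

With the shifted boundaries each distinct value gets its own bin, so the counts agree with what table(x) reports.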
>
--
David Winsemius, MD
West Hartford, CT