[R] subset data.frame with value != in all columns
Marc Schwartz
MSchwartz at MedAnalytics.com
Thu Feb 3 21:24:18 CET 2005
On Thu, 2005-02-03 at 14:57 -0500, Tim Howard wrote:
> I am trying to extract rows from a data.frame based on the
> presence/absence of a single value in any column. I've figured out how
> to do get the positive matches, but the remainder (rows without this
> value) eludes me. Mining the help pages and archives brought me,
> frustratingly, very close, as you'll see below.
>
> My goal: two data frames, one with -99 in at least one column in each
> row, one with no occurrences of -99. I want to preserve rownames in
> each.
>
> My questions:
> Is there a cleaner way to extract all rows containing a specified
> value?
> How can I extract all rows that don't have this value in any col?
>
> #create dummy dataset
> x <- data.frame(
> c1=c(-99,-99,-99,4:10),
> c2=1:10,
> c3=c(1:3,-99,5:10),
> c4=c(10:1),
> c5=c(1:9,-99))
>
> #extract data.frame of rows with -99 in them
> for(i in 1:ncol(x))
> {
> y<-subset(x, x[,i]==-99, drop=FALSE);
> ifelse(i==1, z<-y, z <- rbind(z,y));
> }
>
> #various attempts to get rows not containing "-99":
>
> # this attempt was to create, in "list", the exclusion formula for each
> column.
> # Here, I couldn't get subset to recognize "list" as the correct type.
> # e.g. it works if I paste the value of list in the subset command
> {
> for(i in 1:ncol(x)){
> if(i==1)
> list<-paste("x[",i,"]!=-99", sep="")
> else
> list<-paste(list," ", " & x[",i,"]!=-99", sep="")
> }
> y<-subset(x, list, drop=FALSE);
> }
>
> # this will do it for one col, but if I index more
> # it returns all rows
> y <- x[!(x[,3] %in% -99),]
>
> # this also works for one col
> y<-x[x[,1]!=-99,]
>
> # but if I index more, I get extra rows of NAs
> y<-x[x[,1:5]!=-99,]
>
> Thanks in advance.
> Tim Howard
How about this, presuming that your data frame is all numeric:
For rows containing -99:
> x[unique(which(x == -99, arr.ind = TRUE)[, 1]), ]
c1 c2 c3 c4 c5
1 -99 1 1 10 1
2 -99 2 2 9 2
3 -99 3 3 8 3
4 4 4 -99 7 4
10 10 10 10 1 -99
For rows not containing -99:
> x[-unique(which(x == -99, arr.ind = TRUE)[, 1]), ]
c1 c2 c3 c4 c5
5 5 5 5 6 5
6 6 6 6 5 6
7 7 7 7 4 7
8 8 8 8 3 8
9 9 9 9 2 9
What I have done here is to use which(), setting arr.ind = TRUE. This
returns the row, column indices for the matches to the boolean
statement. The first column returned by which() in this case are the row
numbers matching the statement, so I take the first column only.
Since it is possible that more than one element in a row can match the
boolean, I then use unique() to get the singular row values.
Thus, I can use the returned row indices above to subset the data frame.
HTH,
Marc Schwartz
More information about the R-help
mailing list