[R] Drop column from a data frame

David Winsemius dwinsemius at comcast.net
Mon Dec 27 16:04:30 CET 2010


On Dec 26, 2010, at 8:22 PM, John Sorkin wrote:

> I am trying to drop a column of a data frame. The code below  
> attempts to drop a numeric column (which does not work but gives no  
> error or warning) and a factor column (which does not work but gives  
> an error).
> I would appreciate someone telling me why my code does not work, and  
> suggesting code that will work.

You are misusing the syntax of the "[" operation. Negative indices  
must be numeric; the other way to leave columns out is a logical  
vector of TRUE/FALSE values:

?"["

Character indices always need to be "positive".

dfxyz[ , -2]  # works
dfxyz[ , c(TRUE, FALSE, TRUE)] # works

 > dfxyz[ , -"y"]
Error in -"y" : invalid argument to unary operator
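To drop a column by name, one option is to turn the name into a  
logical index or a "positive" character index first. A minimal sketch  
using the data frame from the original post; the result names  
keep_lgl and keep_chr are only illustrative:

```r
# Data frame from the original post
dfxyz <- data.frame(x = 1:10, y = 11:20, z = factor(c(rep(0, 5), rep(1, 5))))

# Logical index: TRUE for every column whose name is not "y"
keep_lgl <- dfxyz[ , !(names(dfxyz) %in% "y")]

# Positive character index: every column name except "y"
keep_chr <- dfxyz[ , setdiff(names(dfxyz), "y")]

names(keep_lgl)
names(keep_chr)
```

If only one column would be left, add drop = FALSE to keep a data  
frame rather than a bare vector.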

This next mechanism also works and is especially useful on dataframes  
with lots of columns:

dfxyz[ , -grep("y", names(dfxyz))]

But you need to be careful to make sure you know which columns will  
match, and it's good practice to test the grepping expression first:
 > grep("y", names(dfxyz))
[1] 2

If you only wanted to remove "y" and not "y2" you would need to add  
qualifiers to the pattern.
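A sketch of what such qualifiers might look like, assuming a  
hypothetical data frame df2 that has both a "y" and a "y2" column  
(df2, drop_idx and res are made-up names, not from the original  
post). It also guards against the case where nothing matches, since  
an empty negative index selects no columns at all:

```r
# Hypothetical data frame with two similarly named columns
df2 <- data.frame(x = 1:3, y = 4:6, y2 = 7:9)

grep("y", names(df2))     # unanchored: matches both "y" and "y2"
grep("^y$", names(df2))   # anchored at both ends: matches only "y"

# Guard against an empty match: -integer(0) is integer(0),
# which would select zero columns instead of dropping none
drop_idx <- grep("^y$", names(df2))
res <- if (length(drop_idx) > 0) df2[ , -drop_idx, drop = FALSE] else df2
names(res)
```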

> Thanks,
> John
>
> rm(dfxyz,dfxz,dfxy)
>
> # create the data frame.
> dfxyz <- data.frame(x=1:10,y=11:20,z=factor(c(rep(0,5),rep(1,5))))
> dfxyz
>
> names(dfxyz)
>
> # try to drop y column
> # does not work, does not produce error message
> dfxz <- dfxyz[,-(dfxyz$y)]

Well, dfxyz$y does evaluate to a numeric vector with values 11:20 and  
there were no columns in that range. So it behaved as documented. You  
asked for the dataframe without some non-existent (numbered) columns  
and it obliged.
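You can check this at the console. A small sketch with the same  
data frame as in the original post:

```r
# Data frame from the original post
dfxyz <- data.frame(x = 1:10, y = 11:20, z = factor(c(rep(0, 5), rep(1, 5))))

# The index is the negated *contents* of y, i.e. -11, -12, ..., -20
-(dfxyz$y)

# There are only 3 columns, so "drop columns 11 to 20" removes nothing
names(dfxyz[ , -(dfxyz$y)])
```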

> dfxz
>
> # try to drop z column
> # does not work, produces error message:
> # In Ops.factor(df$z) : - not meaningful for factors
> dfxy <- dfxyz[,-dfxyz$z]

Right, you cannot subtract (or negate) factors.

As Phil suggests, subset()-ting is often safer.
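A minimal sketch of the subset() approach, again using the data  
frame from the original post (dfx is an extra illustrative name).  
Inside select, column names can be negated directly:

```r
# Data frame from the original post
dfxyz <- data.frame(x = 1:10, y = 11:20, z = factor(c(rep(0, 5), rep(1, 5))))

# Drop a single column by (unquoted) name
dfxz <- subset(dfxyz, select = -y)
dfxy <- subset(dfxyz, select = -z)

# Drop several columns at once
dfx <- subset(dfxyz, select = -c(y, z))

names(dfxz)
names(dfxy)
names(dfx)
```

Note that ?subset describes it as a convenience function meant for  
interactive use, so the "[" forms may be preferable inside functions.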

-- 
David.


