[R] how to Subset based on partial matching of columns?

Sarah Goslee sarah.goslee at gmail.com
Thu Apr 9 15:24:04 CEST 2015


Hi,

Please don't put quotes (backticks) around your code: it makes it hard to
copy and paste. Also, don't post in HTML, because it screws up your
code.

On Wed, Apr 8, 2015 at 8:57 PM, samarvir singh <samarvir1996 at gmail.com> wrote:
> So I have a list that contains certain characters as shown below
>
> `list <- c("MY","GM+" ,"TY","RS","LG")`

That's a character vector, not a list. A list is a specific type of object in R.
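A quick check at the console shows the difference (a minimal sketch; the
names codes and codes_list are illustrative, not from the original post):

```r
codes <- c("MY", "GM+", "TY", "RS", "LG")

is.character(codes)   # TRUE: an atomic character vector
is.list(codes)        # FALSE

codes_list <- as.list(codes)  # an actual list, one string per element
is.list(codes_list)   # TRUE
length(codes_list)    # 5
```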

> And I have a variable named "CODE" in the data frame as follows
>
> `code <- c("MY GM+", ,"LGTY", "RS","TY")`

That doesn't work, and I have no idea what you expect to have there,
so I'm deleting the extra comma. Also, your vector is named code, not
CODE (R is case-sensitive).

code <- c("MY GM+", "LGTY", "RS", "TY")
x <- 1:4

> 'x <- c(1:5)
> `df <- data.frame(x,code)`

You probably actually want:
mydf <- data.frame(x, code, stringsAsFactors=FALSE)

Note I changed the name, because df() is already a function in base R
(the F-distribution density, from the stats package).
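To see what stringsAsFactors changes, compare the class of the code column
both ways (a small sketch; as_factor and as_text are illustrative names).
The argument mostly matters in R versions before 4.0.0, where TRUE was the
default; since 4.0.0 the default is FALSE.

```r
code <- c("MY GM+", "LGTY", "RS", "TY")
x <- 1:4

as_factor <- data.frame(x, code, stringsAsFactors = TRUE)
as_text   <- data.frame(x, code, stringsAsFactors = FALSE)

class(as_factor$code)  # "factor": stored as integer codes plus levels
class(as_text$code)    # "character": plain strings

# The name df is already taken by the F-distribution density in stats
is.function(stats::df)  # TRUE
```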


> Now I want to create 5 new variables named "MY","GM+","TY","RS","LG"
>
> Which takes binary value, 1 if there's a match case in the CODE variable
>
>     df
>       x  code    MY  GM+  TY  RS  LG
>       1  MY GM+   1    1   0   0   0
>       2           0    0   0   0   0
>       3  LGTY     0    0   1   0   1
>       4  RS       0    0   0   1   0
>       5  TY       0    0   1   0   0

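grepl() returns TRUE or FALSE for each element, depending on whether it
contains the pattern; sapply() repeats that for every pattern, giving one
logical column per pattern: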

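patterns <- c("MY", "GM+", "TY", "RS", "LG")  # the codes to look for
# fixed=TRUE matches "GM+" literally instead of as a regular expression
data.frame(mydf, sapply(patterns, function(p) grepl(p, mydf$code, fixed=TRUE)),
stringsAsFactors=FALSE, check.names=FALSE)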
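The request was for 1/0 values rather than TRUE/FALSE, and the example
table includes a row with no code. A sketch that covers both, assuming the
blank entry should be the empty string "" and that the codes should match
literally (as a regular expression, "GM+" would mean G followed by one or
more M's). The names patterns, indicators, and result are illustrative:

```r
# Codes from the original post
patterns <- c("MY", "GM+", "TY", "RS", "LG")

# Assumed data: the five rows from the post, blank second entry as ""
mydf <- data.frame(x = 1:5,
                   code = c("MY GM+", "", "LGTY", "RS", "TY"),
                   stringsAsFactors = FALSE)

# One 0/1 integer column per code; fixed = TRUE treats "+" literally
indicators <- vapply(patterns,
                     function(p) as.integer(grepl(p, mydf$code, fixed = TRUE)),
                     integer(nrow(mydf)))

# check.names = FALSE keeps "GM+" as a column name instead of "GM."
result <- data.frame(mydf, indicators,
                     stringsAsFactors = FALSE, check.names = FALSE)
result
```

vapply() is used instead of sapply() so that R checks that each result is
an integer vector of length nrow(mydf); as.integer() turns TRUE/FALSE into
1/0.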

Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org


