[R] grep(pattern = each element of a vector) ?

Thu Sep 12 21:30:18 CEST 2013

Hi,
res <- ddply(.data=df1,
             .variables='Taxa',
             .fun=transform,
             Class=find.class(Taxa))
#Warning messages:
#1: In grep(x, df2$Taxa) :
 # argument 'pattern' has length > 1 and only the first element will be used
#2: In grep(x, df2$Taxa) :
 # argument 'pattern' has length > 1 and only the first element will be used
#3: In grep(x, df2$Taxa) :
 # argument 'pattern' has length > 1 and only the first element will be used

Maybe it is better to modify the function so that grep() gets a single pattern per group:
find.class<- function(x) df2[grep(unique(x),df2$Taxa),'Class']
res1 <- ddply(.data=df1,
              .variables='Taxa',
              .fun=transform,
              Class=find.class(Taxa)) #no warnings

#though it has no effect on the end result.
identical(res,res1)
#[1] TRUE
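
The warnings appear because transform() hands find.class() the whole Taxa column of each group, and grep() uses only the first element of a pattern vector. A related idea, sketched here only as an illustration: run grep() once per distinct name and expand the result with match(). The names taxa and class_by_taxon are made up for this sketch, and it assumes literal matching (fixed = TRUE), that the first hit is the one wanted, and that a missing name should map to NA.

```r
df1 <- data.frame(Taxa = c('blue', 'red', NA, 'blue', 'red', NA, 'blue', 'red', NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'),
                  stringsAsFactors = FALSE)

# Distinct, non-missing names: grep() is called once per name, not once per row
taxa <- unique(df1$Taxa[!is.na(df1$Taxa)])

# For each distinct name, take the Class of the first matching row in df2
# (assumption: literal match, first hit wins, NA when nothing matches)
class_by_taxon <- vapply(taxa, function(p) {
  hits <- grep(p, df2$Taxa, fixed = TRUE)
  if (length(hits) == 0L) NA_character_ else df2$Class[hits[1L]]
}, character(1))

# Expand back to one value per row of df1; rows with a missing name get NA
df1$Class <- unname(class_by_taxon[match(df1$Taxa, taxa)])
df1
```

With many repeated names this should need far fewer grep() calls than one per row, though how much it helps depends on the data.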

A.K.

----- Original Message -----
From: "Allen, Joel" <Allen.Joel at epa.gov>
To: "Beaulieu, Jake" <Beaulieu.Jake at epa.gov>; "r-help at r-project.org" <r-help at r-project.org>
Cc: "Farrar, David" <Farrar.David at epa.gov>; "Green, Hyatt" <Green.Hyatt at epa.gov>; "McManus, Michael" <McManus.Michael at epa.gov>; "Wahman, David" <Wahman.David at epa.gov>
Sent: Thursday, September 12, 2013 2:49 PM
Subject: Re: [R] grep(pattern = each element of a vector) ?

Jake,
You can use the plyr package or some form of apply.  If you are on a 64-bit system, you can multithread and it goes much faster.

Something like this (for 32-bit):
require(plyr)
df1 <- data.frame(Taxa = c('blue', 'red', NA,'blue', 'red', NA,'blue', 'red', NA))
df2 <- data.frame(Taxa = c( 'blue', 'red', NA), Class = c('Z', 'HI', 'A'))

#function to do the lookup
find.class <- function(x) df2[grep(x, df2$Taxa), 'Class']

ddply(.data=df1,
      .variables='Taxa',
      .fun=transform,
      Class=find.class(Taxa))
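
For the "some form of apply" route, a rough sketch in base R could look like the following. The helper first_match is hypothetical (not from plyr or base R), and the sketch assumes literal matching (fixed = TRUE), keeps only the first hit, and returns NA for a missing name or when nothing matches.

```r
df1 <- data.frame(Taxa = c('blue', 'red', NA), stringsAsFactors = FALSE)
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'),
                  stringsAsFactors = FALSE)

# Hypothetical helper: index of the first element of `table` that contains
# `pattern`, or NA if the pattern is missing or nothing matches
first_match <- function(pattern, table) {
  if (is.na(pattern)) return(NA_integer_)
  hits <- grep(pattern, table, fixed = TRUE)
  if (length(hits) == 0L) NA_integer_ else hits[1L]
}

# One grep() call per row of df1, without an explicit loop
index <- vapply(df1$Taxa, first_match, integer(1), table = df2$Taxa,
                USE.NAMES = FALSE)

# Look up the Class for each row; unmatched rows stay NA
df1$Class <- df2$Class[index]
df1
```

vapply() still calls grep() once per row, so this mainly tidies the loop; returning exactly one integer per name is what keeps the result aligned with df1.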

Joel

From: Beaulieu, Jake
Sent: Thursday, September 12, 2013 12:06 PM
To: r-help at r-project.org
Cc: Wahman, David; Farrar, David; Allen, Joel; Green, Hyatt; McManus, Michael
Subject: grep(pattern = each element of a vector) ?

Hi,

I have a large data frame that contains species names.  I have a second data frame that contains species names and some additional info, called 'Class', about each species.  I would like to match the species names in the first data frame with the 'Class' information contained in the second.  Since the species names are often formatted differently between the data sets, merge doesn't work well.  grep does the trick, but the function needs to be called separately for each observation in the first data frame.  I put grep into a loop, but this is too slow.  Is there a way to run grep repeatedly without resorting to a loop?  Possibly something in the apply family?

  df1 <- data.frame(Taxa = c('blue', 'red', NA))
  df2 <- data.frame(Taxa = c( 'blue', 'red', NA), Class = c('Z', 'HI', 'A'))

  index <- NULL
  for (i in seq_along(df1$Taxa)) {
    # use the i-th name as the pattern, not always the first one
    index[i] <- grep(df1$Taxa[i], df2$Taxa)
    }
  index

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

==================================
Jake J. Beaulieu, PhD
US Environmental Protection Agency
National Risk Management Research Lab
26 W. Martin Luther King Drive
Cincinnati, OH 45268
USA
513-569-7842  (desk)
513-487-2511 (fax)
beaulieu.jake at epa.gov
