[R] Maximum number of patterns and speed in grep

mdvaan mathijsdevaan at gmail.com
Fri Jul 13 19:41:12 CEST 2012


Here's some data that should reproduce the error messages:

    # load gsubfn, which provides strapplyc()
    library(gsubfn)
    
    # read in data
    data <- read.csv("https://dl.dropbox.com/u/13631687/data.csv", header = TRUE, sep = ",")
    
    # first paste all data
    data1 <- paste(data[,1], collapse = "|")
    
    # second paste subsets of the data
    data2a <- paste(data[1:750,1], collapse = "|")
    data2b <- paste(data[751:1500,1], collapse = "|")
    
    # define the object to be searched
    text <- c("the first is Santa Fe Gold Corp", "the second is Starpharma Holdings")
    
    # match
    strapplyc(text, data1)
    strapplyc(text, data2a)
    strapplyc(text, data2b)
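A minimal sketch of one possible workaround, assuming the error comes from regex metacharacters (for example an unmatched "(") inside some of the company names: escape each name before pasting it into the "|" alternation, and build the batches (like data2a and data2b) with split() instead of by hand. escape_regex() and build_patterns() are hypothetical helpers (escape_regex() uses the same idea as Hmisc::escapeRegex()), the three names are made up for illustration, and base R regmatches()/gregexpr() stand in for strapplyc() so the sketch runs without gsubfn or tcltk. That the escaped patterns behave the same way in the Tcl engine behind strapplyc() is an assumption.

```r
# Hypothetical helper: put a backslash in front of every regex
# metacharacter so each name is matched as a literal string.
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)
}

# Hypothetical helper: split the names into batches of at most `size`
# entries and build one escaped alternation per batch
# (the same idea as data2a / data2b above).
build_patterns <- function(names, size = 750) {
  batches <- split(names, ceiling(seq_along(names) / size))
  unname(vapply(batches,
                function(b) paste(escape_regex(b), collapse = "|"),
                character(1)))
}

# Illustrative names only; "Foo (Holdings" has an unbalanced "(".
names_vec <- c("Santa Fe Gold Corp", "Starpharma Holdings", "Foo (Holdings")
text <- c("the first is Santa Fe Gold Corp",
          "the second is Starpharma Holdings")

# Small batch size so the example produces more than one pattern.
patterns <- build_patterns(names_vec, size = 2)

# Run every batch pattern against every string and collect the matches.
found <- lapply(text, function(s) {
  unlist(lapply(patterns, function(p) regmatches(s, gregexpr(p, s))[[1]]))
})
print(found)
```

The batch size is a trade-off between the length of each pattern and the number of passes over `text`; 750 simply mirrors the split used in the code above.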

Thanks in advance!

Math



Gabor Grothendieck wrote
> 
> On Fri, Jul 13, 2012 at 9:40 AM, mdvaan <mathijsdevaan@> wrote:
>> Thanks, I see that it is working in the sample data. My data, however, gives
>> me an error message:
>>
>> data <- strapplyc(text, batch[[l]])
>> Error in structure(.External("dotTcl", ..., PACKAGE = "tcltk"), class = "tclObj") :
>>   [tcl] couldn't compile regular expression pattern: parentheses () not balanced.
>>
>> batch[[l]] is similar to your "re" string except that there is a larger
>> variety of characters. I haven't been able to figure out which characters
>> are causing trouble here. Any thoughts?
>>
>> Thank you very much.
>>
>> Math
> ...
>>
>> ______________________________________________
>> R-help@ mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> Note the part on the last line about posting reproducible code.
> 
> -- 
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
> 
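To narrow down which entries of batch[[l]] trigger the "parentheses () not balanced" message, one option is to compile each name on its own and flag the ones that fail, and separately flag names that compile but still contain metacharacters (so they would not be matched literally). This is a sketch under stated assumptions: the names are made up, and it uses R's default (TRE) engine rather than the Tcl engine behind strapplyc(), so the two may disagree on edge cases.

```r
# Illustrative names only (not taken from data.csv).
names_vec <- c("Santa Fe Gold Corp", "Starpharma Holdings",
               "Foo (Holdings", "Bar [UK", "Acme Corp.")

# 1. Entries that fail to compile as a regular expression on their own.
#    Uses R's default TRE engine, not the Tcl engine used by strapplyc().
fails_to_compile <- vapply(names_vec, function(p) {
  res <- tryCatch(suppressWarnings(regexpr(p, "")),
                  error = function(e) NULL)
  is.null(res)
}, logical(1), USE.NAMES = FALSE)

# 2. Entries that compile but contain metacharacters, so they would not
#    be matched literally (e.g. "." matches any character).
has_meta <- grepl("[.|()\\^{}+$*?]|\\[|\\]", names_vec)

print(names_vec[fails_to_compile])
print(names_vec[has_meta & !fails_to_compile])
```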

--
Sent from the R help mailing list archive at Nabble.com.


