[R] regex not working for some entries in for loop
Omar André Gonzáles Díaz
oma.gonzales at gmail.com
Sun Nov 8 05:42:58 CET 2015
Thanks S. Ellison.
I finally had some time to test it. Thanks for your clarification.
Just one more question:
You say:
Your regexes are on multiple lines and include whitespace and linefeeds.
For example you are not testing for
" .*forum.*|.*buy.*"; you are testing for
" .*forum.*|
.*buy.*"
But ".*", as far as I understand, means any character, 0 or more
times, so it should also cover the blanks and line breaks. Could you
explain this further? It is not clicking for me.
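
To make the question concrete, here is a minimal sketch. The test strings and variable names are invented for illustration and are not taken from my data, and the line break is written as "\n" so the whitespace is visible:

```r
# Pattern written on one line: two alternatives, no stray whitespace.
pat_one_line <- ".*forum.*|.*buy.*"

# Same pattern as it ends up when the string is broken across lines:
# the second alternative now begins with a literal newline and two spaces.
pat_multi_line <- ".*forum.*|\n  .*buy.*"

# Invented test strings (assumption, not the original data).
x <- c("forum.example.com",      # contains "forum"
       "buy-cheap.example.com",  # contains "buy", no line break
       "shop\n  buy now")        # contains newline + two spaces + "buy"

res_one   <- grepl(pat_one_line, x)
res_multi <- grepl(pat_multi_line, x)

print(res_one)    # TRUE  TRUE  TRUE
print(res_multi)  # TRUE FALSE  TRUE
```

So the second string is only missed by the multi-line version. My guess is that the line break and spaces after the "|" are treated as ordinary characters that must appear in the text before ".*buy.*" can match, which ".*" cannot help with because it comes after them in the pattern.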
2015-10-26 7:29 GMT-05:00 S Ellison <S.Ellison at lgcgroup.com>:
>
>
> > From: Omar André Gonzáles Díaz
> > Subject: [R] regex not working for some entries in for loop
> >
> > I'm using some regex in a for loop to check for some values in column
> > "source", and put a result in column "fuente".
>
> Your regexes are on multiple lines and include whitespace and linefeeds.
> For example you are not testing for
> " .*forum.*|.*buy.*"; you are testing for
> " .*forum.*|
> .*buy.*"
> (which among other things includes a \n)
> Don’t do that. Keep it to one line with no white space.
> If you must have line breaks in the code, form the pattern using paste,
> as in
> pat1 <- paste(c("site.*", ".*event.*", ".*free.*", ".*theguardlan.*",
> ".*guardlink.*", ".*torture.*", ".*forum.*", ".*buy.*",
> ".*share.*", ".*buttons.*", ".*pyme\\.lavoztx\\.com\\.*",
> ".*amezon.*", "computrabajo.com.pe", ".*porn.*", "quality"),
> collapse="|")
>
> spam <- grepl(pat1, sf$source, ignore.case = TRUE)
>
> Also, it's not immediately clear why you’re looping. grepl returns a
> vector of logicals; you have a vector of character strings. Consider
> replacing 'if' constructs with 'ifelse' - albeit a complicated ifelse() -
> and doing the whole thing without a loop.
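
If I read this suggestion correctly, the loop could be replaced by something like the sketch below. The data frame contents, the shortened pattern, and the labels "spam" and "other" are placeholders I made up for illustration; only the column names "source" and "fuente" come from my original code:

```r
# Hypothetical data: column name follows the thread, values are invented.
sf <- data.frame(
  source = c("forum.example.com", "google", "BUY-stuff.example.net"),
  stringsAsFactors = FALSE
)

# Build the pattern from pieces so no whitespace or line break sneaks in.
pat_spam <- paste(c(".*forum.*", ".*buy.*"), collapse = "|")

# grepl() is vectorised: one call tests every row of sf$source.
is_spam <- grepl(pat_spam, sf$source, ignore.case = TRUE)

# ifelse() picks a label per element, so no explicit for loop is needed.
sf$fuente <- ifelse(is_spam, "spam", "other")

print(sf)
```

With more categories, the "other" branch would presumably hold a further ifelse() call for the next pattern, which is the "complicated ifelse()" mentioned above.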
>
> S Ellison
>
>