[R] String Matching
Berend Hasselman
bhh at xs4all.nl
Mon Feb 1 09:00:09 CET 2016
> On 1 Feb 2016, at 08:03, PIKAL Petr <petr.pikal at precheza.cz> wrote:
>
> Hi
>
> Maybe I am completely wrong but do you really need regular expressions?
>
> You say you want to compare first nine characters of id?
>
>> substr(id, 1,9)==cusip
> [1] TRUE
>>
>
> or the last six?
>
>> substr(id, nchar(id)-6, nchar(id))=="432.rds"
> [1] TRUE
>>
>
> Cheers
> Petr
>
>
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Glenn
>> Schultz
>> Sent: Friday, January 29, 2016 6:02 PM
>> To: R Help R
>> Subject: [R] String Matching
>>
>> All,
>>
>> I have a file named as so 313929BL4FNMA2432.rds the user may pass
>> either the first 9 character or the last six characters. I need to
>> match the remainder of the file name using either the first nine or
>> last six. I have read the help files for Regular Expression as used in
>> R and I think what I want to use is glob2rx.
>>
>> I have worked a minimal example to test my code:
>>
>> id <- "313929BL4FNMA2432.rds"
>> cusip <- "313929BL4"
>> poolnumm <- "FNMA2432"
>> paste(cusip, ".*", ".rds")
>> glob2rx(paste(cusip, ".*", ".rds"), trim.head = TRUE, trim.tail = TRUE)
>>
>> This returns false which leads me to believe that it is not working
>> glob2rx(paste(cusip, ".*", ".rds"), trim.head = TRUE, trim.tail = TRUE)
>> == id
>>
>> I am going to use as follows in the function below - which returns the
>> error file not found
>>
>> MBS_Test <- function(MBS.id = "character"){
>> MBS <- glob2rx(paste(MBS.id, ".*", "//.rds", sep = ""), trim.tail = TRUE)
>> MBS.Conn <- gzfile(description = paste(system.file(package = "BondLab"),
>> "/BondData/", MBS, sep = ""), open = "rb")
>> MBS <- readRDS(MBS.Conn)
>> on.exit(close.connection(MBS.Conn))
>> return(MBS)
>> }
>>
I don't think you are using glob wildcards correctly: in a glob pattern a bare * already matches any run of characters, so where you write .* you need just *.
In addition, why not use paste0 instead of paste? paste separates its arguments with a space by default, whereas paste0 uses no separator.
Finally, your poolnumm variable consists of 8 characters, not 6.
If you change your minimal example to this:
paste0(cusip, "*", ".rds")
glob2rx(paste0(cusip, "*", ".rds"))
grepl(glob2rx(paste0(cusip, "*", ".rds")), id)
grepl(glob2rx(paste0("*", poolnumm, ".rds")), id)
you get TRUE twice.
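As a rough sketch of why the corrected pattern matches and the original did not, the snippet below prints the intermediate strings. The last part is an assumption about the eventual use case, not something from the original post: it creates a temporary stand-in directory (data_dir, hypothetical) and uses list.files() to turn the pattern into an actual file name, since glob2rx() returns a regular expression rather than a path.

```r
id    <- "313929BL4FNMA2432.rds"
cusip <- "313929BL4"

# paste() inserts a space between its arguments by default; paste0() does not.
spaced <- paste(cusip, "*", ".rds")    # "313929BL4 * .rds"
glob   <- paste0(cusip, "*", ".rds")   # "313929BL4*.rds"

# glob2rx() translates the glob into an anchored regular expression.
rx <- glob2rx(glob)
print(rx)

matches_glob   <- grepl(rx, id)               # pattern built with paste0()
matches_spaced <- grepl(glob2rx(spaced), id)  # pattern built with paste(); id has no spaces
print(c(matches_glob, matches_spaced))

# Hypothetical lookup (an assumption, not from the original post):
# resolve the regular expression to a real file in a stand-in directory.
data_dir <- tempfile("BondData")
dir.create(data_dir)
saveRDS(list(cusip = cusip), file.path(data_dir, id))

found <- list.files(data_dir, pattern = rx, full.names = TRUE)
mbs   <- readRDS(found[1])
print(basename(found))
```

If a lookup like this is what the function needs, the path returned by list.files() could be passed to readRDS() directly, and it would be worth checking that exactly one file matched before reading it.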
But Petr's solution for the first 9 characters is much simpler.
To match the last 6 (8) characters, you'll have to remove the extension first and then use substr (if I understand your problem correctly).
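A minimal sketch of that substr approach, assuming the extension is stripped with tools::file_path_sans_ext(). The helper last_n is a hypothetical name introduced here for illustration only.

```r
id       <- "313929BL4FNMA2432.rds"
cusip    <- "313929BL4"
poolnumm <- "FNMA2432"

# Drop the ".rds" extension before comparing.
stem <- tools::file_path_sans_ext(id)

# Hypothetical helper: the last n characters of a string.
last_n <- function(x, n) substr(x, nchar(x) - n + 1, nchar(x))

# Compare the leading characters against cusip and the trailing
# characters against poolnumm, using their actual lengths.
prefix_ok <- substr(stem, 1, nchar(cusip)) == cusip
suffix_ok <- last_n(stem, nchar(poolnumm)) == poolnumm
print(c(prefix_ok, suffix_ok))
```

Taking the lengths from nchar(cusip) and nchar(poolnumm) avoids hard-coding 9, 6 or 8.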
Berend