[R] Better use of regex

Thu Sep 15 18:17:25 CEST 2016

I have produced a terribly inefficient piece of code. In the end it gives exactly what I need, but it clumsily works through several steps that I'm sure could be reduced.

Below is a reproducible example. I start with a character vector, dimInfo. I want to parse this vector to 1) find the elements containing 'HS' and 2) grab *only* the first character after the 'HS_'. The final line of code in the example gives what I need.

Any suggestions on a better approach?

Harold

dimInfo <- c("RecordID", "oppID", "position", "key", "operational", "IsSelected", 
"score", "item_1_HS_conv_ovrl_scr", "item_1_HS_elab_ovrl_scr", 
"item_1_HS_org_ovrl_scr")

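# Keep only the elements that contain 'HS'
ff <- dimInfo[grep('HS', dimInfo)]
# Split each of those elements at 'HS_'
gg <- strsplit(ff, 'HS_')
# Take the piece that follows 'HS_' (no hard-coded element count)
hh <- sapply(seq_along(gg), function(i) gg[[i]][2])
# First character of that piece
substr(hh, 1, 1)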
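
One possibly shorter route, offered only as a sketch: filter with grep(value = TRUE) and then let a single sub() call capture the character that follows 'HS_'. The names hs and first_char are made up for illustration, and the pattern assumes each matching element contains 'HS_' exactly once, followed by at least one character.

```r
dimInfo <- c("RecordID", "oppID", "position", "key", "operational", "IsSelected",
"score", "item_1_HS_conv_ovrl_scr", "item_1_HS_elab_ovrl_scr",
"item_1_HS_org_ovrl_scr")

# Keep only the elements containing 'HS_' (value = TRUE returns the strings,
# not their indices)
hs <- grep("HS_", dimInfo, value = TRUE)

# Replace each whole string with the one character captured right after 'HS_'
first_char <- sub(".*HS_(.).*", "\\1", hs)
first_char
# [1] "c" "e" "o"
```

Filtering first matters here: sub() returns its input unchanged when the pattern does not match, so applying it to all of dimInfo would leave the non-HS elements in the result.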