[R] Split String in regex while Keeping Delimiter

Eric Berger er|cjberger @end|ng |rom gm@||@com
Wed Apr 12 19:32:53 CEST 2023


This seems to do the job but there are probably more elegant solutions:

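# f: mark the end of each "+" run with "@", split there, drop the leading space
f <- function(s) { sub("^ ","",unlist(strsplit(gsub("\\+ ","+@ ",s),"@"))) }
# g: the same for "-"
g <- function(s) { sub("^ ","",unlist(strsplit(gsub("- ","-@ ",s),"@"))) }
# h: apply both (assumes the input never contains "@")
h <- function(s) { g(f(s)) }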

To try it out:
s <- "leucocyten + gramnegatieve staven +++ grampositieve staven ++"
t <- "leucocyten - grampositieve coccen +"

h(s)
h(t)
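
One possibly tidier alternative, offered only as a sketch: split on the whitespace that follows a "+" or "-" by using a lookbehind, so the delimiter is never consumed. The function name split_keep_delim and the variables s1 and s2 are made up for illustration, and the sketch assumes every term ends in a run of "+" or "-" followed by whitespace or the end of the string.

```r
# Split each string at whitespace that is immediately preceded by "+" or "-".
# The lookbehind (?<=[+-]) checks the preceding character without consuming it,
# so the delimiter stays attached to the term on its left.
# Hypothetical helper name; requires perl = TRUE for lookbehind support.
split_keep_delim <- function(x) {
  strsplit(x, "(?<=[+-])\\s+", perl = TRUE)
}

s1 <- "leucocyten + gramnegatieve staven +++ grampositieve staven ++"
s2 <- "leucocyten - grampositieve coccen +"

# Returns a list with one character vector per input string
split_keep_delim(c(s1, s2))
# [[1]]
# [1] "leucocyten +"             "gramnegatieve staven +++" "grampositieve staven ++"
# [[2]]
# [1] "leucocyten -"          "grampositieve coccen +"
```

Because the split happens only where whitespace follows the marker, a hyphen inside a word is left alone, and no placeholder character such as "@" is needed.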

HTH,
Eric


On Wed, Apr 12, 2023 at 7:56 PM Emily Bakker <emilybakker using outlook.com>
wrote:

> Hello List,
>
> I have a dataset consisting of strings that I want to split while saving
> the delimiter.
>
> Some example data:
> "leucocyten + gramnegatieve staven +++ grampositieve staven ++"
> "leucocyten - grampositieve coccen +"
>
> I want to split the strings such that I get the following result:
> c("leucocyten +", "gramnegatieve staven +++", "grampositieve staven ++")
> c("leucocyten -", "grampositieve coccen +")
>
> I have tried strsplit with a regular expression with a positive lookahead,
> but I am not able to achieve the results that I want.
>
> I have tried:
> as.list(strsplit(x, split = "(?=[\\+-]{1,3}\\s)+", perl=TRUE))
>
> Which results in:
> c("leucocyten ", "+", "gramnegatieve staven ", "+", "+", "+",
>  "grampositieve staven ++")
> c("leucocyten ", "-", "grampositieve coccen +")
>
>
> Is there a function or regular expression that will make this possible?
>
> Kind regards,
> Emily