[R] Substring replacement in string
Hervé Pagès
hpages at fredhutch.org
Sun Mar 1 10:38:25 CET 2015
Hi Alrik,
On 02/28/2015 11:06 PM, Alrik Thiem wrote:
> Dear Hervé,
>
> Many thanks for your suggestion. Gabor Grothendieck proposed a simple
> one-liner that works perfectly for my purposes:
>
> gsub("(\\b[a-oq-z][a-z0-9]*)", "1-\\U\\1", x, perl = TRUE)
>
> where x is the respective string.
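For the record, here is a quick check of that one-liner in plain R. It is only a sketch: the example string is the one from the original question, and the variable names res and res_p are made up for illustration. Note that the replacement "1-\\U\\1" inserts no spaces, and that names starting with "p" are skipped.

```r
## The string from the original question.
x <- "pmin(pmax(pmin(x1, X2), pmin(X3, X4)) == Y, pmax(Z1, z1))"

## \\b[a-oq-z] only matches words whose first letter is lower case and
## not "p", so "pmin" and "pmax" are skipped. \\U upper-cases the
## captured word in the replacement (this requires perl = TRUE).
res <- gsub("(\\b[a-oq-z][a-z0-9]*)", "1-\\U\\1", x, perl = TRUE)
res
## [1] "pmin(pmax(pmin(1-X1, X2), pmin(X3, X4)) == Y, pmax(Z1, 1-Z1))"

## A variable that itself starts with "p" is left untouched.
res_p <- gsub("(\\b[a-oq-z][a-z0-9]*)", "1-\\U\\1", "pmin(p)", perl = TRUE)
res_p
## [1] "pmin(p)"
```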
Sounds good. I didn't realize that you also wanted to prefix the lower
case letters with "1 - " so my solution was not doing the right thing
anyway. Here is the corrected solution, just in case:
library(Biostrings)

funnyReplace <- function(x, protected_words)
{
    x <- BString(x)

    ## Extract the substrings to modify (target substrings).
    protected_regions <- reduce(do.call("c", lapply(protected_words,
                                                    matchPattern, x)))
    target_regions <- ranges(gaps(protected_regions))
    target_substrings <- extractAt(x, target_regions)

    ## Modify them (using a reg exp almost like Gabor's except
    ## that "p" is not treated as an exception).
    target_substrings <- gsub("(\\b[a-z][a-z0-9]*)", "1 - \\U\\1",
                              target_substrings, perl=TRUE)

    ## Replace in original string.
    x <- replaceAt(x, target_regions, target_substrings)
    as.character(x)
}
Then:
> x <- "pmin(pmax(pmin(x1, X2), pmin(X3, X4)) == Y, pmax(Z1, z1))"
> funnyReplace(x, c("pmin", "pmax"))
[1] "pmin(pmax(pmin(1 - X1, X2), pmin(X3, X4)) == Y, pmax(Z1, 1 - Z1))"
It works even if a variable name starts with a "p":
> funnyReplace("pmin(p)", c("pmin", "pmax"))
[1] "pmin(1 - P)"
and you can specify an arbitrary number of protected words.
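If installing Bioconductor packages is not an option, something similar can be sketched in plain R with a negative lookahead. This is a different technique from funnyReplace(), not a drop-in equivalent: it protects whole words only, whereas matchPattern() protects every occurrence of a protected word, even inside a longer name. The function name funnyReplace2 is made up for illustration, and the sketch assumes that the protected words are plain alphanumeric strings (no regex metacharacters) and that at least one is supplied.

```r
## Hypothetical base-R variant (name and approach are assumptions,
## not part of the Biostrings solution).
funnyReplace2 <- function(x, protected_words)
{
    ## Build e.g. "pmin|pmax" from the protected words.
    alternatives <- paste(protected_words, collapse = "|")

    ## (?!(?:pmin|pmax)\\b) rejects a match that starts exactly at a
    ## protected whole word; everything else that starts with a lower
    ## case letter is captured and upper-cased via \\U.
    pattern <- paste0("\\b(?!(?:", alternatives, ")\\b)([a-z][a-z0-9]*)")
    gsub(pattern, "1 - \\U\\1", x, perl = TRUE)
}

x <- "pmin(pmax(pmin(x1, X2), pmin(X3, X4)) == Y, pmax(Z1, z1))"
funnyReplace2(x, c("pmin", "pmax"))
## [1] "pmin(pmax(pmin(1 - X1, X2), pmin(X3, X4)) == Y, pmax(Z1, 1 - Z1))"

funnyReplace2("pmin(p)", c("pmin", "pmax"))
## [1] "pmin(1 - P)"
```

Since gsub() is vectorized over x, this variant also accepts a character vector of several strings at once.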
Cheers,
H.
>
> Best wishes,
> Alrik
>
> -----Original Message-----
> From: Hervé Pagès [mailto:hpages at fredhutch.org]
> Sent: Saturday, 28 February 2015 23:29
> To: Alrik Thiem; r-help at r-project.org
> Subject: Re: [R] Substring replacement in string
>
> Hi Alrik,
>
> With the Biostrings/IRanges infrastructure (Bioconductor packages), you
> can do this with:
>
> library(Biostrings)
> x0 <- BString("pmin(pmax(pmin(x1, X2), pmin(X3, X4)) == Y, pmax(Z1, z1))")
> donttouch_words <- c("pmin", "pmax")
>
> ## Extract the substrings to modify (target substrings).
> donttouch_regions <- reduce(do.call("c", lapply(donttouch_words,
> matchPattern, x0)))
> target_regions <- ranges(gaps(donttouch_regions))
> target_substrings <- extractAt(x0, target_regions)
>
> ## Modify them.
> old <- paste0(letters, collapse="")
> new <- paste0(LETTERS, collapse="")
> target_substrings <- chartr(old, new, target_substrings)
>
> ## Replace in original string.
> x1 <- replaceAt(x0, target_regions, target_substrings)
>
> Then:
>
> > x1
> 57-letter "BString" instance
> seq: pmin(pmax(pmin(X1, X2), pmin(X3, X4)) == Y, pmax(Z1, Z1))
>
> > as.character(x1)
> [1] "pmin(pmax(pmin(X1, X2), pmin(X3, X4)) == Y, pmax(Z1, Z1))"
>
> Hope this helps,
> H.
>
> On 02/27/2015 02:19 PM, Alrik Thiem wrote:
>> Dear R-help list,
>>
>> I would like to replace all lower-case letters in a string that are not part
>> of certain fixed expressions. For example, I have the string:
>>
>> "pmin(pmax(pmin(x1, X2), pmin(X3, X4)) == Y, pmax(Z1, z1))"
>>
>> I would like to replace all lower-case letters that do not belong to
>> the functions "pmin" and "pmax" by 1 - toupper(...) to get
>>
>> "pmin(pmax(pmin(1 - X1, X2), pmin(X3, X4)) == Y, pmax(Z1, 1 - Z1))"
>>
>> Any ideas on how I could achieve that?
>>
>> Many thanks and best wishes,
>>
>> Alrik
>>
>>
>> ********************************
>> Alrik Thiem
>> Post-Doctoral Researcher
>>
>> Department of Philosophy
>> University of Geneva
>> Rue de Candolle 2
>> CH-1211 Geneva
>>
>> +41 76 527 80 83
>>
>> http://www.alrik-thiem.net
>> http://www.compasss.org
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319