[R] Omitting repeated occurrence in a string
David Winsemius
dwinsemius at comcast.net
Wed Feb 6 20:59:27 CET 2013
On Feb 6, 2013, at 11:24 AM, David Winsemius wrote:
>
> On Feb 6, 2013, at 8:46 AM, Christofer Bogaso wrote:
>
>> Hello again,
>>
>> I am looking for a way to delete repeated appearances of a substring in a
>> string. Let's say I have the following string:
>>
>> Text <- "ahsgdvasgAbcabcsdahj"
>>
>> Here you see "Abc" appears twice. But I want to keep only 1
>> occurrence. Therefore I need that:
>>
>> Text_result <- "ahsgdvasgAbcsdahj" (i.e. the first one).
>>
>> Can somebody help me if it is possible using some R function?
>
> This is not going to solve all possible variations of this problem, but then your proposed test suite was rather limited ... don't you agree?
>
>> Text <- "ahsgdvasgAbcabcsdahabcj"
>> gsub("(abc).*(abc)", "\\1", Text, ignore.case=TRUE)
> [1] "ahsgdvasgAbcj"
>
This gives some further variations:
> Text <- "ahsgdvasgAbcabcsdahabcj" #adding a third instance
> gsub("(abc).*(abc)", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcj"
# The first strategy deletes everything after the first 'abc', up to and through the last 'abc' (the '.*' is greedy)
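
For comparison, a non-greedy quantifier stops at the nearest following 'abc' rather than the last one. This is only a minimal sketch: it assumes perl = TRUE so that '.*?' is read with PCRE semantics, and the names 'greedy' and 'lazy' are mine, purely for illustration.

```r
Text <- "ahsgdvasgAbcabcsdahabcj"

# Greedy: ".*" runs to the last "abc", so everything between the first
# and the last instance is dropped along with the last instance.
greedy <- gsub("(abc).*(abc)", "\\1", Text, ignore.case = TRUE)

# Lazy: ".*?" stops at the nearest following "abc", so only the adjacent
# repeat is dropped and the later, separated "abc" survives.
lazy <- gsub("(abc).*?(abc)", "\\1", Text, ignore.case = TRUE, perl = TRUE)

print(greedy)  # "ahsgdvasgAbcj"
print(lazy)    # "ahsgdvasgAbcsdahabcj"
```

Neither form keeps the intervening text while also removing every later 'abc', which is why the variations that follow try other groupings.
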
> gsub("(abc)((.*)(abc))", "\\1\\2", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcabcsdahabcj"
# Nested parentheses do work, but \\2 holds the whole of '(.*)(abc)', so "\\1\\2" rebuilds the entire match and the string comes back unchanged
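
To see what each group captured, the match can be pulled apart with regexec() and regmatches(). A small sketch; the labels 'match', 'g1' ... 'g4' are just names I attach for readability, not anything R supplies.

```r
Text <- "ahsgdvasgAbcabcsdahabcj"

# regexec() returns the position of the full match plus one entry per
# parenthesized group; regmatches() extracts the corresponding substrings.
parts <- regmatches(
  Text,
  regexec("(abc)((.*)(abc))", Text, ignore.case = TRUE)
)[[1]]
names(parts) <- c("match", "g1", "g2", "g3", "g4")
print(parts)

# Group 1 followed by group 2 is exactly the full match, which is why
# replacing the match with "\\1\\2" changes nothing.
rebuilt <- paste0(parts[["g1"]], parts[["g2"]])
print(identical(rebuilt, parts[["match"]]))
```

Groups are numbered by the position of their opening parenthesis, so the outer group is \\2 and the inner ones are \\3 and \\4.
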
> gsub("(abc)(abc)", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcsdahabcj"
# Removes only the immediately adjacent repeat; the separated 'abc' stays
> Text
[1] "ahsgdvasgAbcabcsdahabcj"
> gsub("(abc)(.?)(abc)", "\\1\\2", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcsdahabcj"
# '.?' allows at most one intervening character, so again only the adjacent repeat is removed
>
# This collapses any run of sequential repeats, but leaves separated ones alone
> Text <- "ahsgdvasgAbcabcabcabcabcsdahabcj"
> gsub("(abc)(abc)*", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcsdahabcj"
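
If the goal is to keep the first occurrence and drop every later one, adjacent or not, one possible approach is to locate all matches with gregexpr() and blank out all but the first via the replacement form of regmatches(). This is a sketch under my own assumptions, not a tested general solution; 'drop_repeats' is a hypothetical helper name, and it takes a single pattern.

```r
# Hypothetical helper: keep the first match of `pattern` in each element
# of `x` and delete all subsequent matches.
drop_repeats <- function(x, pattern, ignore.case = TRUE) {
  m <- gregexpr(pattern, x, ignore.case = ignore.case)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    # Replace every match after the first with the empty string.
    if (length(hits) > 1) hits[-1] <- ""
    hits
  })
  x
}

Text <- "ahsgdvasgAbcabcabcabcabcsdahabcj"
print(drop_repeats(Text, "abc"))                    # "ahsgdvasgAbcsdahj"
print(drop_repeats("ahsgdvasgAbcabcsdahj", "abc"))  # "ahsgdvasgAbcsdahj"
```

The second call uses the string from the original question and returns the requested result.
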
David Winsemius
Alameda, CA, USA