[R] Replace Text but not from within a word

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Tue Feb 28 15:36:59 CET 2017


For tasks like this, you will probably want to import the data as character data rather than as a factor, e.g.:

dat <- read.csv( "myfile.csv", header=FALSE, as.is=TRUE )

You can check what you have with the str() function.
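
A minimal sketch of that workflow, in case it helps. The temporary file is only a hypothetical stand-in for "myfile.csv", and its three rows are the example names from the original question; adjust the path and column name to your own data.

```r
# Hypothetical stand-in for "myfile.csv": one column, no header row.
tmp <- tempfile(fileext = ".csv")
writeLines(c("BOEING CO", "ENGMANTAYLOR CO", "SAGINAW COUNTY INC"), tmp)

# as.is = TRUE keeps strings as character instead of converting to factor.
dat <- read.csv(tmp, header = FALSE, as.is = TRUE)
str(dat)   # V1 should be listed as chr rather than Factor

# If the data were already read in as a factor, convert the column first.
dat_f <- read.csv(tmp, header = FALSE, stringsAsFactors = TRUE)
dat_f$V1 <- as.character(dat_f$V1)

# With character data, the substitution is applied to each value directly.
dat$V1 <- sub(" CO$", "", dat$V1)
```

After the last line, dat$V1 holds the cleaned names as plain character strings.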
-- 
Sent from my phone. Please excuse my brevity.

On February 28, 2017 5:19:40 AM PST, Marc Schwartz <marc_schwartz at me.com> wrote:
>
>> On Feb 28, 2017, at 3:38 AM, Harshal Athawale
><pgcim15.harshal at spjimr.org> wrote:
>> 
>> I am new to R.
>> 
>> I have a file. This file contains the names of companies.
>> 'data.frame': 494 obs. of  1 variable:
>> $ V1: Factor w/ 470 levels "3-d engineering corp",..: 293 134 339 359 143 399 122 447 398 384 ...
>> 
>> Problem: I would like to remove "CO" (as it is the most frequent word). I
>> would like "CO" to be removed from BOEING CO --> BOEING but not from
>> SAGINAW COUNTY INC.
>> 
>>> text = c("BOEING CO","ENGMANTAYLOR CO","SAGINAW COUNTY INC")
>> 
>>> gsub(x = text, pattern = "CO", replacement = "")
>> 
>> [1] "BOEING "          "ENGMANTAYLOR "    "SAGINAW UNTY INC"
>> 
>> Thanks in advance.
>> 
>> - Sam
>
>
>Hi,
>
>See ?regex and ?grep for some details and examples on how to construct
>the expression used for matching, as well as some of the references
>therein.
>
>In this case, you want to use something along the lines of:
>
>> gsub(" CO$", "", text)
>[1] "BOEING"             "ENGMANTAYLOR"       "SAGINAW COUNTY INC"
>
>where the "CO" is preceded by a space and followed by the "$", which is
>a special character that indicates the end of the string to be matched.
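
If "CO" can also appear as a separate word somewhere other than the end of the name, one possible variation is to match it between word boundaries with \\b. This is only a sketch under that assumption: "ACME CO LTD" is a made-up name added for illustration, and perl = TRUE is used here just to keep the \\b behaviour predictable.

```r
# The first three names are from the original question; the last is hypothetical.
text <- c("BOEING CO", "ENGMANTAYLOR CO", "SAGINAW COUNTY INC", "ACME CO LTD")

# \\b matches a word boundary, so "CO" inside "COUNTY" is left alone.
out <- gsub("\\bCO\\b", "", text, perl = TRUE)

# Collapse the doubled spaces left behind, then trim the ends.
out <- trimws(gsub("\\s+", " ", out))
out
# [1] "BOEING"             "ENGMANTAYLOR"       "SAGINAW COUNTY INC" "ACME LTD"
```

The " CO$" pattern is the safer choice if the word should only ever be dropped from the end of a name.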
>
>Regards,
>
>Marc Schwartz
>


