[R] recode according to specific sequence of characters within a string variable

Greg Snow Greg.Snow at imail.org
Fri Feb 4 18:47:45 CET 2011


So you want to combine multiple columns back into a single column with the strings pasted together?  If that is correct then look at the paste and sprintf functions (use one or the other, not both).
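
A minimal sketch of both approaches, assuming DF.new has the person, year and team columns from Marc's example quoted below. The fixed person_year_team order and the names with.paste, with.sprintf and DF.back are illustrative choices only. DF.new does not record where each piece sat in the original string, so a value like "teamy_jeff_2002" comes back as "jeff_2002_teamy".

```r
# Recreate DF.new as shown in the quoted example (character columns)
DF.new <- data.frame(person = c("jeff", "jeff", "robert", "mary", "mary"),
                     year   = c("2001", "2002", "2002", "2002", "2003"),
                     team   = c("x", "y", "z", "z", "y"),
                     stringsAsFactors = FALSE)

# Option 1: paste() joins the pieces element-wise, separated by "_"
with.paste <- paste(DF.new$person, DF.new$year,
                    paste("team", DF.new$team, sep = ""), sep = "_")

# Option 2: sprintf() fills a format template element-wise
with.sprintf <- sprintf("%s_%s_team%s", DF.new$person, DF.new$year, DF.new$team)

# Either vector can serve as the single combined column
DF.back <- data.frame(name.of.report = with.paste, stringsAsFactors = FALSE)
```

Both calls are vectorised over the rows, so no loop is needed. paste is convenient when every piece is joined by the same separator, while sprintf keeps the whole layout visible in one template.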

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Denis Kazakiewicz
> Sent: Friday, February 04, 2011 6:26 AM
> To: Marc Schwartz
> Cc: R-help
> Subject: Re: [R] recode according to specific sequence of characters
> within a string variable
> 
> Dear R people,
> Could you please help?
> I have a similar but opposite question:
> how can I reshape the data from DF.new back to DF in the example Marc kindly
> provided?
> 
> Thank you
> Denis
> 
> On Fri, 2011-02-04 at 07:09 -0600, Marc Schwartz wrote:
> > On Feb 4, 2011, at 6:32 AM, D. Alain wrote:
> >
> > > Dear R-List,
> > >
> > > I have a dataframe with one column "name.of.report" containing character values, e.g.
> > >
> > >
> > >> df$name.of.report
> > >
> > > "jeff_2001_teamx"
> > > "teamy_jeff_2002"
> > > "robert_2002_teamz"
> > > "mary_2002_teamz"
> > > "2003_mary_teamy"
> > > ...
> > > (i.e. the bit of interest is not always at same position)
> > >
> > > Now I want to recode the column "name.of.report" into the variables "person", "year","team", like this
> > >
> > >> new.df
> > >
> > > "person"  "year"  "team"
> > > jeff           2001      x
> > > jeff           2002      y
> > > robert       2002      z
> > > mary        2002      z
> > >
> > > I tried with grep()
> > >
> > > df$person<-grep("jeff",df$name.of.report)
> > >
> > > but of course it didn't exactly result in what I wanted to do. Could not find any solution via RSeek. Excuse me if it is a very silly question, but can anyone help me find a way out of this?
> > >
> > > Thanks a lot
> > >
> > > Alain
> >
> >
> > There will be several approaches, all largely involving the use of ?regex. Here is one:
> >
> >
> > DF <- data.frame(name.of.report = c("jeff_2001_teamx", "teamy_jeff_2002",
> >                                     "robert_2002_teamz", "mary_2002_teamz",
> >                                     "2003_mary_teamy"))
> >
> > > DF
> >      name.of.report
> > 1   jeff_2001_teamx
> > 2   teamy_jeff_2002
> > 3 robert_2002_teamz
> > 4   mary_2002_teamz
> > 5   2003_mary_teamy
> >
> >
> > DF.new <- data.frame(person = gsub("[_0-9]|team.", "", DF$name.of.report),
> >                      year = gsub(".*([0-9]{4}).*", "\\1", DF$name.of.report),
> >                      team = gsub(".*team(.).*", "\\1", DF$name.of.report))
> >
> >
> > > DF.new
> >   person year team
> > 1   jeff 2001    x
> > 2   jeff 2002    y
> > 3 robert 2002    z
> > 4   mary 2002    z
> > 5   mary 2003    y
> >
> >
> >
> > HTH,
> >
> > Marc Schwartz
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 

