[R] Extract

Bert Gunter bgunter.4567 using gmail.com
Sun Jul 21 02:41:06 CEST 2024


Val:
I wanted to add a base R solution to your problem here, though I realize
you can happily ignore it. However, in the course of puzzling over how to
do it using R's native pipe syntax ("|>"), I learned some new stuff
that I thought others might find useful, and it seemed sensible to
keep the code with this thread for comparison.

I want to acknowledge that, in the course of my labor, I posted a
query to R-Help, to which Iris Simmons posted a very clever answer that
I would never have figured out myself. It is used at the end below to
change a subset of the names of the modified data frame via a
pipe.

Here's the whole solution starting from your (excellent!) example dat:

   dat <- dat$string |>
      strsplit(" ") |>
      sapply(FUN = \(x)c(x, rep(NA, 5 - length(x)))) |>
      t() |> cbind(dat, ..2 = _)

   ## And Iris's trick for changing a subset of attributes, i.e. the
   ## "names", in a pipe
   dat |> names() |> _[4:8] <- paste0("s", 1:5)

## and here's the result:
> dat
  Year Sex          string s1   s2   s3   s4   s5
1 2002   F        15 xc Ab 15   xc   Ab <NA> <NA>
2 2003   F              14 14 <NA> <NA> <NA> <NA>
3 2004   M  18 xb 25 35 21 18   xb   25   35   21
4 2005   M           13 25 13   25 <NA> <NA> <NA>
5 2006   M 14 ac 256 AV 35 14   ac  256   AV   35
6 2007   F              11 11 <NA> <NA> <NA> <NA>
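
For comparison, the same steps can be written without the pipe. This is only a sketch and not part of the original posts: the names parts, n_max, padded and dat2 are made up for illustration, and n_max replaces the hard-coded 5 on the assumption that the widest row determines the number of new columns. As far as I know, the `..2 = _` form needs R >= 4.2.0 and the `_[4:8]` extraction form needs R >= 4.3.0, while the version below only needs R >= 4.1.0 for the `\(x)` shorthand.

```r
# Val's example data, as posted at the start of the thread
dat <- read.csv(text = "Year, Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header = TRUE)

# Split each string on single spaces; the result is a list of character vectors
parts <- strsplit(dat$string, " ")

# Widest row decides how many new columns are needed (assumption: not fixed at 5)
n_max <- max(lengths(parts))

# Pad every vector with NA to length n_max; sapply() returns one column per
# row of dat, so t() is needed to get one row per row of dat
padded <- t(sapply(parts, \(x) c(x, rep(NA, n_max - length(x)))))

# Name the new columns up front, so no renaming is needed after cbind()
colnames(padded) <- paste0("s", seq_len(n_max))

dat2 <- cbind(dat, padded)
```

The renaming line in the pipe version, dat |> names() |> _[4:8] <- paste0("s", 1:5), should be read as names(dat)[4:8] <- paste0("s", 1:5) once the pipe is expanded, which is why it modifies dat itself.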

As I noted previously, all columns beyond Sex are character.
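
If numeric columns are wanted afterwards, one possible approach is base R's type.convert(), applied column by column. This is a sketch under assumptions: new_cols is a hypothetical stand-in for two of the split-out columns shown above, and a column is only converted when every non-missing value in it is numeric, so columns that mix numbers and letters stay character.

```r
# Hypothetical stand-in for two of the new character columns (s1 and s2 above)
new_cols <- data.frame(
  s1 = c("15", "14", "18", "13", "14", "11"),
  s2 = c("xc", NA, "xb", "25", "ac", NA)
)

# Convert each column to the most specific type that fits all of its values;
# as.is = TRUE keeps non-numeric columns as character rather than factor
new_cols[] <- lapply(new_cols, type.convert, as.is = TRUE)

sapply(new_cols, class)
```

Values of mixed type within one column would still need to be handled individually.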

Cheers,
Bert


On Fri, Jul 19, 2024 at 12:26 PM Val <valkremk using gmail.com> wrote:
>
> Thank you Jeff and Bert for your help!
> The components of the string could be mixed (i.e., numeric, character
> or date). Once it is split, it would be easy for me to format it
> accordingly.
>
> On Fri, Jul 19, 2024 at 2:10 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >
> > I did not look closely at the solutions that you were offered, but
> > note that you did not specify in your post whether the numbers in your
> > string were to be character or numeric variables after they are broken
> > out into their own columns. I believe that they are character in the
> > solutions, but you should check this. If you want them as numeric,
> > e.g., for further processing, you will need to convert them. Or
> > vice-versa.
> >
> > Bert
> >
> >
> > On Fri, Jul 19, 2024 at 9:52 AM Val <valkremk using gmail.com> wrote:
> > >
> > > Hi All,
> > >
> > > > I want to extract new variables from a string and add them to the data frame.
> > > > The sample data is a CSV file.
> > >
> > > dat<-read.csv(text="Year, Sex,string
> > > 2002,F,15 xc Ab
> > > 2003,F,14
> > > 2004,M,18 xb 25 35 21
> > > 2005,M,13 25
> > > 2006,M,14 ac 256 AV 35
> > > 2007,F,11",header=TRUE)
> > >
> > > > The string column has a maximum of five variables. Some rows have all
> > > > five and others have fewer. Missing variables should be filled with
> > > > NA.
> > > > The desired result is shown below.
> > >
> > >
> > > > Year,Sex,string, S1, S2, S3, S4,S5
> > > 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> > > 2003,F,14, 14,NA,NA,NA,NA
> > > 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> > > 2005,M,13 25,13, 25,NA,NA,NA
> > > 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> > > 2007,F,11, 11,NA,NA,NA,NA
> > >
> > > Any help?
> > > Thank you in advance.
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.


