[R] Fw: Regex problem
David Winsemius
dwinsemius at comcast.net
Thu Jan 5 20:12:32 CET 2017
> On Jan 5, 2017, at 10:09 AM, Carl Sutton via R-help <r-help at r-project.org> wrote:
>
> Re-sending help request, went to wrong addy first time.
> r-help-request at r-project.org
>
> Belated Happy new year to the Guru's:
>
> I have a data frame with 570+ columns and in those column headers yours truly has a few blunders. Namely somehow I managed to end some of them with both an apostrophe ' and an ending quote.
Doubtful. You probably only have a single apostrophe and no "ending quote". In fact when I run your `problemdf`, the `make.names` function (called by data.frame) changed the apostrophe into a period. To actually get a trailing apostrophe with `data.frame` you would need to set check.names=FALSE:
df1 <- data.frame("WhatAmI\'" = 1:5, "WhoAreYou" = 11:15, check.names=FALSE)
colnames(df1)
#[1] "WhatAmI'" "WhoAreYou"
There is no double quote in that name. Now to remove the offending apostrophe (or even multiple instances of them) just do this:
names(df1) <- gsub( "\\'", "", names(df1) )
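A minimal end-to-end sketch of the steps above, reusing the `df1` example. The name `df_checked` and the use of `fixed = TRUE` are assumptions added here for illustration; `fixed = TRUE` matches the apostrophe as a literal string, so no escaping is needed:

```r
# Default check.names = TRUE: make.names() turns the apostrophe into a period
df_checked <- data.frame("WhatAmI'" = 1:5, "WhoAreYou" = 11:15)
names(df_checked)

# check.names = FALSE keeps the trailing apostrophe in the column name
df1 <- data.frame("WhatAmI'" = 1:5, "WhoAreYou" = 11:15, check.names = FALSE)
names(df1)

# Strip every apostrophe; fixed = TRUE treats the pattern as a literal string
names(df1) <- gsub("'", "", names(df1), fixed = TRUE)
names(df1)
```

Assigning back to `names(df1)` changes only the headers and leaves the data untouched.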
> I think the attached code finds the occurrences (not 100% sure) and feedback is appreciated. This is my first attempt at regex and I have been googling and reading the last few days (including an R -Exercise).
>
> Confused as to why the column names shows a "." instead of a " ' ".
See above.
>
> Ignorant of why gregexpr and regexpr show attr(,"useBytes") as TRUE when the default is FALSE. Is it possible I somehow messed them up last week? Simply typing the function name in the console shows the defaults as FALSE.
>
> I have not been able to build a construct to simply delete the apostrophe. I have made several attempts to do this, and left one for your perusal. The others were just to "off the wall" and embarrassing.
>
> Lastly, is there a way for me to check that all of my column names end with a letter followed by a quote? I am thinking something along the lines of "[[:alpha:]\\"" but I expect that will throw an error. I stumbled upon the ' " problem when dplyr complained about it last week, and it is unsettling to think I may have more goofs.
>
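One possible check, reading the question as "every column name should end in a letter". The vector `nms` is a made-up stand-in for the real headers, and `ends_in_letter` and `suspects` are names chosen only for this sketch:

```r
# Hypothetical column names standing in for the real 570+ headers
nms <- c("WhatAmI'", "WhoAreYou", "Total.", "x2")

# TRUE where the name ends in a letter; "$" anchors the match at the end
ends_in_letter <- grepl("[[:alpha:]]$", nms)

# The names that deserve a closer look
suspects <- nms[!ends_in_letter]
suspects
```

With a real data frame, `names(df1)` would take the place of `nms`, and `all(ends_in_letter)` gives a single yes/no answer.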
> Any suggestions of a good reference book is much appreciated. I can see extended use of regex coming toward me and I am so ignorant it is frightening (all volunteer work, no $'s involved, but I dislike being incompetent).
I learned regex by reading the ?regex page, and by looking up and working through questions on R-help answered by Gabor Grothendieck:
http://markmail.org/search/?q=list%3Aorg.r-project.r-help+regex#query:list%3Aorg.r-project.r-help%20regex%20from%3A%22Gabor%20Grothendieck%22+page:1+state:facets
There are also several online sites where you can get an expression-by-expression readout of what your regexes are doing. They do need the understanding that the escape character for R and for regex is the same, and that means backslashes need to be doubled in the pattern arguments (but _not_ the replacement arguments).
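A small sketch of that doubling rule, using a literal period as the example. The input strings and variable names are invented for illustration:

```r
x <- "a.b.c"

# Unescaped "." is a regex wildcard, so every character is removed
wild <- gsub(".", "", x)

# "\\." in R source becomes the regex \. and matches only a literal period
lit <- gsub("\\.", "", x)

# In the replacement a period is already literal, so no backslashes are needed
repl <- gsub("_", ".", "a_b_c")

c(wild, lit, repl)
```

When pasting a pattern from R into one of those online testers, the doubled backslashes would need to be written as single ones.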
--
David.
>
>
> # regex problem
> df1 <- data.frame("WhatAmI'" = 1:5, "WhoAreYou" = 11:15)
> colnames(df1)
> df1
> ma_pattern <- "[[:punct:]][[:punct:]]" # Need single ][ in the middle??
> grep(ma_pattern,colnames(df1))
> ma_pattern <- "[[:punct:][:punct:]]" # single ][ worked
> grep(ma_pattern,colnames(df1),value = TRUE) # found it
> grepl(ma_pattern,colnames(df1))
> gregexpr(ma_pattern,colnames(df1)) # at position 8
> regexpr(ma_pattern,colnames(df1))
>
> #sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
> # fixed = FALSE, useBytes = FALSE)
>
> #sub(ma_pattern,replacement = "'\\"",df1)
> colnames(df1)
>
> Carl Sutton
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA