Problems understanding use of regular expression (in gsub) for manipulating currency

David Winsemius dwinsemius at comcast.net
Thu Nov 22 02:36:13 CET 2012

On Nov 21, 2012, at 1:41 PM, Mauricio Cornejo wrote:

> Hello,
> After reading help file, various threads on this board, and other online tutorials, I've attempted to use gsub (using Perl-like syntax) to change a currency string into something that can be converted to numeric type using only one regular expression.  Can anybody point out my error?  Note that 
>>   x <- "\"$ 1,200,300,400.50\""
> Tried the following in an attempt to arrive at "1200300400.50"
>>   gsub("(^[\\D]*)(([\\d]*)[,])*([\\d]*[.]*[\\d]*)([\\D]*)", "\\3\\4", x, perl=TRUE)
> [1]  "300400.50"
> Note that "\d" matches a digit character and "\D" matches a non-digit character.
> Results group "\2" was intentionally omitted from the replacement pattern as it would have included commas.

> gsub("[,\"]", "", gsub("^\\D*(\\d.*)", "\\1", x, perl=TRUE))
[1] "1200300400.50"
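
A likely reason the single expression falls short: when a capture group sits inside a repeated group, as ([\\d]*) does inside (...)*, a backreference keeps only the text from the last repetition. So "\\3" holds "300" and the earlier "1" and "200" are dropped. The sketch below reproduces that and then shows one possible single-expression alternative, which simply deletes every character that is not a digit or a period. It assumes the input contains no minus sign, no accounting parentheses and no stray periods; the name `cleaned` is mine, not from the original post.

```r
x <- "\"$ 1,200,300,400.50\""

# The original attempt: the repeated group (([\\d]*)[,])* matches "1,", "200,"
# and "300," in turn, but \\3 retains only the last iteration, "300".
gsub("(^[\\D]*)(([\\d]*)[,])*([\\d]*[.]*[\\d]*)([\\D]*)", "\\3\\4", x, perl = TRUE)
# [1] "300400.50"

# One-expression alternative (an assumption about what is acceptable here):
# remove everything that is not a digit or a period.
cleaned <- gsub("[^0-9.]", "", x)
cleaned
# [1] "1200300400.50"

# The result now converts cleanly to numeric.
as.numeric(cleaned)
```

The character-class version needs no perl=TRUE, since it uses no \d or \D shorthand.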

I have my doubts about the "\"..." construction. I suspect it stems from your not understanding the convention used in printing escape characters in R.
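
To illustrate the printing convention, here is a small sketch. print() shows a string with its escapes, while cat() writes the characters the string actually holds. The variable `y` is a hypothetical version of the data without embedded quotes, included only for comparison.

```r
x <- "\"$ 1,200,300,400.50\""   # contains two literal double-quote characters
y <- "$ 1,200,300,400.50"       # hypothetical: same value, no embedded quotes

# print() displays escapes, so the embedded quotes appear as \"
print(x)

# cat() writes the raw characters, so the embedded quotes appear as plain "
cat(x, "\n")
cat(y, "\n")

# The embedded quotes are real characters: x is two characters longer than y.
nchar(x) - nchar(y)
```

If the real data look like `y` when written with cat(), the outer \" in the example is unnecessary and the quote removal step can be dropped.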


David Winsemius, MD
Alameda, CA, USA

