[R] pasteFromExcel

Dan Murphy chiefmurphy at gmail.com
Mon Oct 20 23:34:57 CEST 2014


Good ideas, David.

1) By "confirm that MS Excel honors that OutDec" I mean that, in a
location (France? others?) where options("OutDec") is a comma, does MS
Excel format numbers that way when displaying currencies with decimal
places? I have no way of knowing if that is true in all OutDec = ","
locales.

2) I wish it were as simple as just removing unwanted "adornments."
The issue is that such "adornments" must be in their proper places for
the character string to represent a currency value, or a numeric value
for that matter. If I add one more comma to your first element in the
wrong place, it should not translate to a valid numeric, but it does
with your gsub, which would be a bug if that were in pasteFromExcel:
> gsub(rmchar, "", c("$1,0,00", "1,200", "800"))
[1] "1000" "1200" "800"

When I originally looked into this I believed I couldn't be the first
one asking that question .. and I wasn't. There are many hits for
regular expressions that purport to successfully identify well-formed
*US dollar* currency strings. The expression in pasteFromExcel is
based on http://stackoverflow.com/questions/354044/what-is-the-best-u-s-currency-regex.

I'm curious if anyone has come across -- and tested -- a similar
regular expression in other places that might have use for
pasteFromExcel.

This is how pasteFromExcel uses its currency regular expression (the
first ugly assignment is what I'm looking for in other locales around
the world -- maybe there's a Regular Expression mailing list out
there):

currencypattern <-
"^\\$?\\-?([1-9]{1}[0-9]{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))$|^\\-?\\$?([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))$|^\\$?\\(([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))\\)$"

# Here's a test vector
x <- c("1,234.00", "12,34.00", "$1,000", "(124)", "$(123)", "($123)",
"  1,000   ", "NA")

# grep will tell you whether elements of x, trimmed of
beginning/ending whitespace, match the currencypattern
grep(currencypattern, trim(x))
[1] 1 3 4 5 7  # correct answer

*Now* one may remove unwanted characters from the well-formed strings.
And deal with the "negatives" of course .. and NAs. See how that's
done in excelRio.r in the excelRio package on github:
https://github.com/trinostics/excelRio

Thanks for your interest.


On Mon, Oct 20, 2014 at 10:56 AM, David Winsemius
<dwinsemius at comcast.net> wrote:
>
> On Oct 20, 2014, at 10:29 AM, Dan Murphy wrote:
>
>> Nice.
>> So if someone were to offer a currency regular expression that works
>> in their locale, I should also ask them to give me the results of
>> Sys.getlocale("LC_MONETARY")
>> and
>> options("OutDec")
>> and confirm that MS Excel honors that OutDec.
>
> I'm not sure we can know what you mean by "confirm that MS Excel honors that OutDec." The result of options("OutDec") was intended for you to determine what character not to remove from a monetary value in an R workspace. If the assumption is that all values will be in the same unit and that the user is not doing any currency conversions then:
>
>>  decsep <- options("OutDec")
>> rmchar <- paste0( "[$£€", c(".", ",")[!c(".", ",") %in% decsep], "]" )
>> gsub(rmchar, "", c("$1,000", "1,200", "800"))
> [1] "1000" "1200" "800"
>
>
>> Thank you, David.
>> -Dan
>>
>> On Mon, Oct 20, 2014 at 10:04 AM, David Winsemius
>> <dwinsemius at comcast.net> wrote:
>>>
>>> On Oct 19, 2014, at 11:18 PM, Dan Murphy wrote:
>>>
>>>> To Users of Excel:
>>>>
>>>> Following advice from Brian and Markus, I created an RMarkdown "vignette"
>>>> that shows an example of how the pasteFromExcel function in the excelRio
>>>> package on github could be used by an actuary to transfer a triangle from
>>>> Excel to R. See today's post at http://trinostics.blogspot.com/
>>>>
>>>> Unfortunately, if you are located outside the US, the demonstrated
>>>> functionality will not work for you because the currency regex implemented
>>>> assumes the dollar sign ($) and comma/decimal punctuation of the form
>>>> 999,999.00.
>>>>
>>>> If anyone is interested in contributing currency regex expressions that
>>>> work in your locale, I would be happy to try to incorporate them in the
>>>> package. If anyone knows how best to determine the user's locale (might
>>>> "timezone" suffice?), I'd appreciate that help too.
>>>>
>>>
>>> ?Sys.getlocale   # perhaps "LC_MONETARY"
>>>
>>> ?options   # look for OutDec
>>>
>>>
>>>>      [[alternative HTML version deleted]]
>>>
>>>
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>
> David Winsemius
> Alameda, CA, USA
>



More information about the R-help mailing list