[R] Symbol/String comparison in R

Kristjan Kure
Thu Apr 14 17:33:58 CEST 2022


Hi Rui

Thank you for the code snippet.

1) Where can I look up the character table for your
"Portuguese_Portugal.1252" locale?
Is it this one: https://en.wikipedia.org/wiki/Windows-1252?

2) Which attributes and values in that table do you check to confirm the
result? I can see the "Codepage layout" section, and I can find the "A"
and "a" symbols in it.

Which values in that table tell you that "A" is greater than "a"?
"A" < "a" # returns FALSE
"A" > "a" # returns TRUE

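For reference, a minimal sketch of the distinction this question turns on, assuming only base R: a codepage table gives the numeric code of each character, while the order used by "<" and ">" on strings comes from the LC_COLLATE setting. The variable names are illustrative only.

```r
# Numeric codes of the two letters (the same in ASCII and Windows-125x tables)
code_A <- utf8ToInt("A")   # 65 decimal, 41 hexadecimal
code_a <- utf8ToInt("a")   # 97 decimal, 61 hexadecimal
code_A < code_a            # TRUE: by code, "A" comes before "a"

# Collation setting currently used by "<" and ">" on strings
Sys.getlocale("LC_COLLATE")

# sort() with method = "radix" orders character vectors as in the "C"
# locale, i.e. by code rather than by the locale's collation rules
letters_mixed <- c("b", "a", "B", "A")
radix_order <- sort(letters_mixed, method = "radix")
radix_order                # "A" "B" "a" "b"
```

Comparing radix_order with plain sort(letters_mixed) on the same machine shows whether the active locale collates differently from the raw codes.
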
P.S. My locale is Estonian_Estonia.1257.

Regards,
Kristjan

On Thu, Apr 14, 2022 at 5:05 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:

> Hello,
>
> This is a locale issue: you are relying on the ASCII table codes, but
> that ordering is only guaranteed in the "C" locale.
>
> old_loc <- Sys.getlocale("LC_COLLATE")
>
> "A" < "a"
> #> [1] FALSE
> "A" > "a"
> #> [1] TRUE
>
> Sys.setlocale("LC_COLLATE", locale = "C")
> #> [1] "C"
>
> "A" < "a"
> #> [1] TRUE
> "A" > "a"
> #> [1] FALSE
>
> Sys.setlocale("LC_COLLATE", old_loc)
> #> [1] "Portuguese_Portugal.1252"
>
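The switch-and-restore pattern above can also be wrapped in a function so that the previous collation is restored even if the comparison fails. This is only a sketch: less_than_c is a hypothetical helper name, not something from the original message or from base R.

```r
# Hypothetical helper: evaluates x < y under the "C" collation and
# restores the previous LC_COLLATE setting when the function exits.
less_than_c <- function(x, y) {
  old_loc <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old_loc), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  x < y
}

before <- Sys.getlocale("LC_COLLATE")
result <- less_than_c("A", "a")   # TRUE under "C" collation
after <- Sys.getlocale("LC_COLLATE")
identical(before, after)          # TRUE: the original collation is back
```

Using on.exit() keeps the locale change local to the call, which avoids leaving the session in the "C" locale by accident.
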
>
> Hope this helps,
>
> Rui Barradas
>
> At 15:06 on 13/04/2022, Kristjan Kure wrote:
> > Hi!
> >
> > Sorry, I am a beginner in R.
> >
> > I was not able to find answers to my questions (tried Google, Stack
> > Overflow, etc). Please correct me if anything is wrong here.
> >
> > When comparing symbols/strings in R, are raw numeric values compared
> > symbol by symbol, starting from the left? If raw numeric values are not
> > used, is there an ASCII / Unicode table in which symbols have
> > values/ranking/order, and does R compare those values?
> >
> > *2) Comparing symbols*
> > The raw value of the letter "a" is 61 and that of "b" is 62 (in
> > hexadecimal, as charToRaw() prints it). Is this correct?
> >
> > # Raw value for "a" = 61
> > a_raw <- charToRaw("a")
> > a_raw
> >
> > # Raw value for "b" = 62
> > b_raw <- charToRaw("b")
> > b_raw
> >
> > # returns TRUE
> > "a" < "b"
> >
> > OK, 61 is less than 62, so the result is TRUE. Is this correct?
> >
> > *3) Comparing strings #1*
> > "1040" <= "12000"
> >
> > raw_1040 <- charToRaw("1040")
> > raw_1040
> > #31 *30* (comparison happens with the second symbol) 34 30
> >
> > raw_12000 <- charToRaw("12000")
> > raw_12000
> > #31 *32* (comparison happens with the second symbol) 30 30 30
> >
> > The symbol in the second position is 30, which is less than 32, so the
> > result is TRUE. Is this correct?
> >
> > *4) Comparing strings #2*
> > "1040" <= "10000"
> >
> > raw_1040 <- charToRaw("1040")
> > raw_1040
> > #31 30 *34*  (comparison happens with third symbol) 30
> >
> > raw_10000 <- charToRaw("10000")
> > raw_10000
> > #31 30 *30*  (comparison happens with third symbol) 30 30
> >
> > The symbol in the third position is 34, which is greater than 30, so the
> > result is FALSE. Is this correct?
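
The position-by-position reasoning in 3) and 4) can be sketched as follows. It assumes the comparison goes by byte value, which holds in the "C" locale; first_diff is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: position of the first byte where two strings differ
first_diff <- function(x, y) {
  bytes_x <- as.integer(charToRaw(x))
  bytes_y <- as.integer(charToRaw(y))
  n <- min(length(bytes_x), length(bytes_y))
  pos <- which(bytes_x[seq_len(n)] != bytes_y[seq_len(n)])
  if (length(pos) == 0L) NA_integer_ else pos[1L]
}

pos_3 <- first_diff("1040", "12000")   # 2: "0" (hex 30) vs "2" (hex 32)
pos_4 <- first_diff("1040", "10000")   # 3: "4" (hex 34) vs "0" (hex 30)

# The digits 0-9 keep their usual order under common collations, so the
# string comparisons agree with the byte-level reasoning
cmp_3 <- "1040" <= "12000"   # TRUE
cmp_4 <- "1040" <= "10000"   # FALSE
```

The helper returns NA when one string is a prefix of the other, a case the examples in this thread do not cover.
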
> >
> > *5) Problem - Why does this equal FALSE?*
> > *"A" < "a"*
> >
> > 41 < 61 # FALSE?
> >
> > # Raw value for "A" = 41
> > A_raw <- charToRaw("A")
> > A_raw
> >
> > # Raw value for "a" = 61
> > a_raw <- charToRaw("a")
> > a_raw
> >
> > Why is capitalized "A" not less than lowercase "a"? Based on raw values
> > it should be. What am I missing here?
> >
> > Thanks
> > Kristjan