[R] Inconsistent alphabetisation issue
Stefano Conti
s.conti at gmx.co.uk
Fri May 23 13:00:05 CEST 2014
Dear R users community,
For some time now I have occasionally observed inconsistent behaviour
across identical (i.e. same 3.1.0 version and set-up / configuration) R
installations on separate Linux machines (all manufactured in the UK).
Specifically, after reading some data-frames (via 'read.table' or its
flavours) and then tabulating their factors, I notice that the levels of
some factors are by default alphabetised differently on different machines.
As an example, on 2 separate work machines I obtain from a given data-frame
(say 'tbl'), before applying any processing, the same output:
> tbl <- read.csv(path.expand("~/tmp/tbl.csv"), header=TRUE)
> levels(tbl$Ethnicity)
[1] "Black-African" "Black-Caribbean"
[3] "Black other" "Indian/Pakistani/Bangladeshi"
[5] "Not Known" "Other Asian/Oriental"
[7] "Other/Mixed" "White"
[9] "Black Other" "Not known"
whereas reproducing the same code and instructions on my personal laptop
yields the following:
> tbl <- read.csv(path.expand("~/tmp/tbl.csv"), header=TRUE)
> levels(tbl$Ethnicity)
[1] "Black other" "Black-African"
[3] "Black-Caribbean" "Indian/Pakistani/Bangladeshi"
[5] "Not known" "Other Asian/Oriental"
[7] "Other/Mixed" "White"
[9] "Black Other" "Not known"
I've tried looking up on the R mailing list, as well as in the R
documentation and on Stack Overflow, what could be the source of, and in
particular a solution to, this discrepant behaviour; unfortunately, apart
from some hints at localisation issues (which I can't see applying in my
case), I couldn't find anything pertinent.
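For reference, the collation setting of each session can be inspected along the following lines. This is only a minimal sketch, not a diagnosis: the three strings are copied from the output above, and forcing the "C" locale is an assumption made purely for illustration. It relies on the fact that 'factor' builds its default levels with 'sort', whose ordering of character strings follows the LC_COLLATE locale.

```r
# Three of the levels shown above; any character vector would do.
x <- c("Black-African", "Black-Caribbean", "Black other")

# Report the collation locale used by the current session.
old <- Sys.getlocale("LC_COLLATE")
print(old)

# Temporarily force byte-wise ("C") collation, sort, then restore the setting.
invisible(Sys.setlocale("LC_COLLATE", "C"))
c_order <- sort(x)
invisible(Sys.setlocale("LC_COLLATE", old))

# In the "C" locale a space (0x20) sorts before a hyphen (0x2D),
# so "Black other" comes first.
print(c_order)

# Passing the levels explicitly makes their order independent of the locale.
f <- factor(x, levels = c_order)
print(levels(f))
```

Running the first two statements on each machine would show whether the sessions collate under different locales.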
Many thanks in advance for any help / insight you may be able to provide on
this!
--
Dr Stefano Conti