[R] integer codes of factors

Mike R mike.rstat at gmail.com
Thu Jul 14 23:04:55 CEST 2005


  U = c("b", "b", "b", "c", "d", "e", "e")

  F1 = factor( U, levels=c("a", "b", "c", "d", "e") )

  as.numeric(F1) 
  [1] 2 2 2 3 4 5 5 

Here, the integer code of "b" in F1 is 2.

  K = factor( levels(F1) )
  as.numeric(K)
  [1] 1 2 3 4 5
  K
  [1] a b c d e
  Levels: a b c d e

And again, the integer code of "b" in K is 2. Great!

I am wondering how to modify that usage so that the correspondence between 
the two numeric vectors can be trusted.  For example, the correspondence 
can be corrupted by placing the "a" at the end:

  F2 = factor( U, levels=c("b", "c", "d", "e", "a") )
 
  as.numeric(F2) 
  [1] 1 1 1 2 3 4 4

Placing the "a" at the end changed the integer code of "b" in F2 to 1, which is 
not a problem. But ......

  K = factor( levels(F2) )
  as.numeric( K )
  [1] 2 3 4 5 1
  K
  [1] b c d e a
  Levels: a b c d e

But the integer code of "b" in K is now 2, which does not correspond to its code
in F2.
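
A short sketch of what seems to be going on, on the assumption that factor()
derives its levels by sorting the unique values when no levels argument is
given. The name lv is only introduced here for illustration:

```r
U  <- c("b", "b", "b", "c", "d", "e", "e")
F2 <- factor(U, levels = c("b", "c", "d", "e", "a"))

# levels(F2) is a plain character vector; the level order survives only as
# the order of its elements.
lv <- levels(F2)

# Without an explicit levels argument, factor() re-derives the levels from
# the values, which for character data puts them in sorted order.
K <- factor(lv)
levels(K)                       # sorted, so "a" comes first
identical(levels(K), sort(lv))  # TRUE if the assumption above holds
```

If that is right, the codes of K follow the sorted labels rather than the
level order of F2.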

One would think that ordered=TRUE ought to avoid the corruption, but it does not
seem to accomplish that:

  K = factor(  levels(F2), ordered=TRUE ) 
  as.numeric(K)
  [1] 2 3 4 5 1
  K
  [1] b c d e a
  Levels: a < b < c < d < e

But the integer code of "b" in K is still 2.

However, corruption can be avoided with this idiom:

  K = factor(  levels(F2), levels=levels(F2) )
  as.numeric(K)
  [1] 1 2 3 4 5
  K
  [1] b c d e a
  Levels: b c d e a

Now the integer code of "b" in K is 1, which, as desired, is in
correspondence with its code in F2.
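
For what it is worth, here is a minimal sketch of how the correspondence
could be checked rather than eyeballed. It uses as.integer() in place of
as.numeric(), and the names codes_K, pos and same_codes are made up for this
example:

```r
U  <- c("b", "b", "b", "c", "d", "e", "e")
F2 <- factor(U, levels = c("b", "c", "d", "e", "a"))
K  <- factor(levels(F2), levels = levels(F2))

# K has one element per level of F2; these are its integer codes.
codes_K <- as.integer(K)

# Look up each element of F2 in K by label, then compare the integer codes.
pos <- match(as.character(F2), as.character(K))
same_codes <- all(as.integer(F2) == codes_K[pos])
same_codes
```

The same check applied to K = factor( levels(F2) ) should come out FALSE,
since there "b" is coded 2 in K but 1 in F2.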



