[R] Bug in gmodels CrossTable()?

Marc Schwartz marc_schwartz at me.com
Sun May 31 15:41:05 CEST 2009


On May 31, 2009, at 7:51 AM, Jakson Alves de Aquino wrote:

> Is the code below showing a bug in CrossTable()? My expectation was that
> the values produced by xtabs were rounded instead of truncated:
>
> library(gmodels)
> abc <- c("a", "a", "b", "b", "c", "c")
> def <- c("d", "e", "f", "f", "d", "e")
> wgt <- c(0.8, 0.6, 0.4, 0.5, 1.4, 1.3)
>
> xtabs(wgt ~ abc + def)
>
> CrossTable(xtabs(wgt ~ abc + def), prop.r = F, prop.c = F,
>  prop.t = F, prop.chisq = F)


CrossTable() is designed to take either one or two vectors, which are
then [cross-]tabulated to yield integer counts, or a matrix of integer
counts. It is not designed for fractional values. In the matrix case, it
is presumed that the matrix is the result of a prior cross-tabulation,
such as a call to table().
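
To make the distinction concrete, here is a minimal base R sketch (it does not call CrossTable() itself) comparing the storage type of a table() result with that of the weighted xtabs() result from the original post. The object names counts and weighted are only illustrative:

```r
# Same vectors as in the original post.
abc <- c("a", "a", "b", "b", "c", "c")
def <- c("d", "e", "f", "f", "d", "e")
wgt <- c(0.8, 0.6, 0.4, 0.5, 1.4, 1.3)

counts   <- table(abc, def)          # unweighted tabulation: integer counts
weighted <- xtabs(wgt ~ abc + def)   # sums of wgt per cell: fractional

typeof(counts)     # "integer"
typeof(weighted)   # "double"

# TRUE only when every cell is a whole number; FALSE for the weighted table
all(weighted == round(weighted))
```

A check like the last line is one way to spot, before tabulating, that a matrix is not a table of counts.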

The output of xtabs() above is:

> xtabs(wgt ~ abc + def)
   def
abc   d   e   f
  a 0.8 0.6 0.0
  b 0.0 0.0 0.9
  c 1.4 1.3 0.0



The relevant output of CrossTable() in your example above shows:


             | def
         abc |         d |         e |         f | Row Total |
-------------|-----------|-----------|-----------|-----------|
           a |         0 |         0 |         0 |         1 |
-------------|-----------|-----------|-----------|-----------|
           b |         0 |         0 |         0 |         0 |
-------------|-----------|-----------|-----------|-----------|
           c |         1 |         1 |         0 |         2 |
-------------|-----------|-----------|-----------|-----------|
Column Total |         2 |         1 |         0 |         5 |
-------------|-----------|-----------|-----------|-----------|



The internal table object that would be generated here is effectively:

> addmargins(xtabs(wgt ~ abc + def))
     def
abc     d   e   f Sum
  a   0.8 0.6 0.0 1.4
  b   0.0 0.0 0.9 0.9
  c   1.4 1.3 0.0 2.7
  Sum 2.2 1.9 0.9 5.0



The textual output of CrossTable() is internally formatted using
formatC(..., format = "d"), which is an integer-based format:

> formatC(addmargins(xtabs(wgt ~ abc + def)), format = "d")
     def
abc   d e f Sum
  a   0 0 0 1
  b   0 0 0 0
  c   1 1 0 2
  Sum 2 1 0 5



In other words, you are getting the integer-coerced values of the
individual cells, and then the same for the column, row and table totals:

> matrix(as.integer(addmargins(xtabs(wgt ~ abc + def))), 4, 4)
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    1
[2,]    0    0    0    0
[3,]    1    1    0    2
[4,]    2    1    0    5
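
One consequence worth spelling out: the totals are truncated independently of the cells, so the displayed cells need not add up to the displayed totals (row 'a' shows 0, 0, 0 with a total of 1). A minimal base R sketch, which only mimics the truncation with trunc() and does not call CrossTable(); the names shown_totals and sum_of_shown are illustrative:

```r
abc <- c("a", "a", "b", "b", "c", "c")
def <- c("d", "e", "f", "f", "d", "e")
wgt <- c(0.8, 0.6, 0.4, 0.5, 1.4, 1.3)

tab <- xtabs(wgt ~ abc + def)

# Row totals computed on the fractional values, then truncated
# (this corresponds to the "Row Total" column in the display).
shown_totals <- trunc(rowSums(tab))

# Row totals you would get by adding up the truncated cells instead.
sum_of_shown <- rowSums(trunc(tab))

shown_totals   # a = 1, b = 0, c = 2
sum_of_shown   # a = 0, b = 0, c = 2
```

The two disagree for row 'a', which is why the printed table looks internally inconsistent.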



If you review ?as.integer, you will note the following in the 'Value'
section:

   Non-integral numeric values are truncated towards zero (i.e.,
   as.integer(x) equals trunc(x) there)
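
As a quick illustration of that difference between truncation and rounding, using a few values of the kind that appear in the table plus one negative value (the vector x is made up for the example):

```r
x <- c(0.8, 0.6, 1.4, 1.3, 2.7, -0.8)

as.integer(x)   # truncates towards zero:  0  0  1  1  2  0
trunc(x)        # same values, but returned as doubles
round(x)        # rounds to nearest:       1  1  1  1  3 -1
```

Note that truncation towards zero moves negative values up, so -0.8 becomes 0 rather than -1.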



The output is what the code is designed to produce, if confusing, but
you are using the function in a way that was not intended: it expects
integer counts, not fractional weights.
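
If rounded cell values are what you want to see, one possible approach (a sketch only, and whether rounded weighted sums are meaningful for your analysis is your call) is to round the xtabs() result yourself before handing it to CrossTable(). The CrossTable() call is left commented out so the snippet runs without gmodels installed:

```r
abc <- c("a", "a", "b", "b", "c", "c")
def <- c("d", "e", "f", "f", "d", "e")
wgt <- c(0.8, 0.6, 0.4, 0.5, 1.4, 1.3)

# Round each weighted cell to the nearest whole number first.
rounded <- round(xtabs(wgt ~ abc + def))
rounded

# With gmodels installed, the rounded table could then be passed along:
# library(gmodels)
# CrossTable(rounded, prop.r = FALSE, prop.c = FALSE,
#            prop.t = FALSE, prop.chisq = FALSE)
```

Keep in mind that the totals would then be sums of the rounded cells (row 'a' becomes 2), not rounded versions of the weighted totals (1.4).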

HTH,

Marc Schwartz



