[R] rowSums() and is.integer()

Mon Nov 12 09:51:47 CET 2007

On 10 Nov 2007, at 07:32, Prof Brian Ripley wrote:

> On Fri, 9 Nov 2007, Robin Hankin wrote:
>
>> Hi
>>
>> [R-2.6.0, macOSX 10.4.10].
>>
>> The helppage says that rowSums() and colSums()
>> are equivalent to 'apply' with  'FUN = sum'.
>>
>> But I came across this:
>>
>> > a <- matrix(1:30,5,6)
>> > is.integer(apply(a,1,sum))
>> [1] TRUE
>> > is.integer(rowSums(a))
>> [1] FALSE
>> >
>
> 'equivalent' does not mean 'identical': the wording was deliberate.
>
>> so rowSums() returns a float.
>
> And that is what the help page says it does (albeit more  
> accurately: there is no 'float' type, but there is numeric aka  
> double and the result could be complex).
>
>> Why is this?
>
> You seem to be asking why R works as documented!
>

Yes, that's exactly what I was asking [perhaps this should have been
R-devel?].  What is the thinking behind converting to double?
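
The distinction between 'equivalent' and 'identical' drawn above can be
checked directly.  A minimal sketch, using the same 5-by-6 matrix as in
the original post (the names by_apply and by_rowSums are illustrative
only):

```r
a <- matrix(1:30, 5, 6)

by_apply   <- apply(a, 1, sum)  # sum() keeps integer input as integer
by_rowSums <- rowSums(a)        # documented to return double

typeof(by_apply)
typeof(by_rowSums)

# Same values, different storage type
identical(by_apply, by_rowSums)          # FALSE: the types differ
isTRUE(all.equal(by_apply, by_rowSums))  # TRUE: numerically equal
```

So the two results agree in value and differ only in storage mode.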

I expect that part of the answer is speed:

# First define an integer matrix:
a <- matrix(as.integer(rpois(1e6,3)),1000,1000)

 > system.time(rowSums(a))
    user  system elapsed
   0.049   0.000   0.050
 > system.time(rowSums(a))
    user  system elapsed
   0.050   0.000   0.051
 > system.time(rowSums(a))
    user  system elapsed
   0.050   0.001   0.052
 > system.time(colSums(a))
    user  system elapsed
   0.043   0.001   0.046
 > system.time(colSums(a))
    user  system elapsed
   0.043   0.000   0.044

rowSums() and colSums() run at about the same speed.  Now use apply() to
see whether integer summation is faster than double summation for this
kind of problem:

 > system.time(ignore <- apply(a,1,sum))
    user  system elapsed
   0.085   0.009   0.094
 > system.time(ignore <- apply(a,1,sum))
    user  system elapsed
   0.086   0.010   0.095
 > system.time(ignore <- apply(a,1,sum))
    user  system elapsed
   0.089   0.010   0.104
 > system.time(ignore <- apply(a,2,sum))
    user  system elapsed
   0.071   0.008   0.078
 > system.time(ignore <- apply(a,2,sum))
    user  system elapsed
   0.069   0.007   0.076
 > system.time(ignore <- apply(a,2,sum))
    user  system elapsed
   0.070   0.008   0.081

# Now convert to double:

 > a <- a+0
 > system.time(ignore <- apply(a,1,sum))
    user  system elapsed
   0.127   0.019   0.151
 > system.time(ignore <- apply(a,1,sum))
    user  system elapsed
   0.121   0.017   0.139
 > system.time(ignore <- apply(a,1,sum))
    user  system elapsed
   0.130   0.022   0.175
 > system.time(ignore <- apply(a,2,sum))
    user  system elapsed
   0.084   0.015   0.098
 > system.time(ignore <- apply(a,2,sum))
    user  system elapsed
   0.085   0.015   0.105
 > system.time(ignore <- apply(a,2,sum))
    user  system elapsed
   0.087   0.016   0.107

[Can anyone comment on the difference between the first three (row-wise)
and the last three (column-wise) double precision summations?]
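
Single system.time() calls are noisy at this scale, so the comparison
above could be repeated more systematically.  The following is only a
sketch: time_median() is a hypothetical helper written for this
illustration, not an existing R function, and the numbers it prints will
depend on the machine and the R version.

```r
# Hypothetical helper: median elapsed time of f() over `reps` runs
time_median <- function(f, reps = 5) {
  elapsed <- vapply(seq_len(reps),
                    function(i) system.time(f())[["elapsed"]],
                    numeric(1))
  median(elapsed)
}

a_int <- matrix(as.integer(rpois(1e6, 3)), 1000, 1000)
a_dbl <- a_int + 0   # same values, stored as double

timings <- c(
  int_rows = time_median(function() apply(a_int, 1, sum)),
  int_cols = time_median(function() apply(a_int, 2, sum)),
  dbl_rows = time_median(function() apply(a_dbl, 1, sum)),
  dbl_cols = time_median(function() apply(a_dbl, 2, sum))
)
print(timings)
```

Taking the median of several runs makes the integer/double and
row/column comparisons less sensitive to one-off pauses such as garbage
collection.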

Integer summation is perhaps a little faster, but there's not much in
it.  So, why does rowSums() coerce to double (behaviour that is
undesirable for me)?
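
In the meantime, an integer result can be recovered in at least two
ways.  This is a sketch of possible workarounds rather than a
recommendation; it uses a small matrix, and the variable names are
illustrative only.

```r
# Small integer matrix, built the same way as in the timing example
set.seed(1)
a <- matrix(as.integer(rpois(20, 3)), 4, 5)

rs <- rowSums(a)   # documented to return double
typeof(rs)

# Option 1: coerce the result back to integer.  Safe only while every
# row sum fits in R's integer range (see .Machine$integer.max).
rs_int <- as.integer(rs)

# Option 2: sum each row with sum(), which keeps integer input integer
rs_apply <- apply(a, 1, sum)

identical(rs_int, rs_apply)
```

Option 1 keeps the speed of rowSums() at the cost of an extra coercion;
option 2 avoids the double intermediate but is slower, as the timings
above show.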

--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
  tel  023-8059-7743