[R] rowSums() and is.integer()
Robin Hankin
r.hankin at noc.soton.ac.uk
Mon Nov 12 09:51:47 CET 2007
On 10 Nov 2007, at 07:32, Prof Brian Ripley wrote:
> On Fri, 9 Nov 2007, Robin Hankin wrote:
>
>> Hi
>>
>> [R-2.6.0, macOSX 10.4.10].
>>
>> The help page says that rowSums() and colSums()
>> are equivalent to 'apply' with 'FUN = sum'.
>>
>> But I came across this:
>>
>> > a <- matrix(1:30,5,6)
>> > is.integer(apply(a,1,sum))
>> [1] TRUE
>> > is.integer(rowSums(a))
>> [1] FALSE
>> >
>
> 'equivalent' does not mean 'identical': the wording was deliberate.
>
>> so rowSums() returns a float.
>
> And that is what the help page says it does (albeit more
> accurately: there is no 'float' type, but there is numeric aka
> double and the result could be complex).
>
>> Why is this?
>
> You seem to be asking why R works as documented!
>
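To make the 'equivalent' versus 'identical' distinction concrete, here is a minimal sketch using the same small matrix as above. The variable names by_apply and by_rowSums are mine, chosen for illustration only:

```r
# Small integer matrix, as in the example quoted above
a <- matrix(1:30, 5, 6)

by_apply   <- apply(a, 1, sum)   # integer vector
by_rowSums <- rowSums(a)         # double vector

typeof(by_apply)
typeof(by_rowSums)

# The values agree, but the storage types differ:
all.equal(by_apply, by_rowSums)  # compares values (within a tolerance)
identical(by_apply, by_rowSums)  # also compares the storage type
```

Here all.equal() should report TRUE and identical() FALSE, which is the sense in which the two results are equivalent without being identical.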
Yes, that's exactly what I was asking [perhaps this should have been
R-devel?]. What is the thinking behind converting to double?
I expect that part of the answer is speed:
# First define an integer matrix:
> a <- matrix(as.integer(rpois(1e6,3)),1000,1000)
> system.time(rowSums(a))
   user  system elapsed
  0.049   0.000   0.050
> system.time(rowSums(a))
   user  system elapsed
  0.050   0.000   0.051
> system.time(rowSums(a))
   user  system elapsed
  0.050   0.001   0.052
> system.time(colSums(a))
   user  system elapsed
  0.043   0.001   0.046
> system.time(colSums(a))
   user  system elapsed
  0.043   0.000   0.044
rowSums() and colSums() run at about the same speed. Now use apply() to
see whether integer summation is faster than double summation for this
kind of problem:
> system.time(ignore <- apply(a,1,sum))
   user  system elapsed
  0.085   0.009   0.094
> system.time(ignore <- apply(a,1,sum))
   user  system elapsed
  0.086   0.010   0.095
> system.time(ignore <- apply(a,1,sum))
   user  system elapsed
  0.089   0.010   0.104
> system.time(ignore <- apply(a,2,sum))
   user  system elapsed
  0.071   0.008   0.078
> system.time(ignore <- apply(a,2,sum))
   user  system elapsed
  0.069   0.007   0.076
> system.time(ignore <- apply(a,2,sum))
   user  system elapsed
  0.070   0.008   0.081
# Now convert to double:
> a <- a+0
> system.time(ignore <- apply(a,1,sum))
   user  system elapsed
  0.127   0.019   0.151
> system.time(ignore <- apply(a,1,sum))
   user  system elapsed
  0.121   0.017   0.139
> system.time(ignore <- apply(a,1,sum))
   user  system elapsed
  0.130   0.022   0.175
> system.time(ignore <- apply(a,2,sum))
   user  system elapsed
  0.084   0.015   0.098
> system.time(ignore <- apply(a,2,sum))
   user  system elapsed
  0.085   0.015   0.105
> system.time(ignore <- apply(a,2,sum))
   user  system elapsed
  0.087   0.016   0.107
[Can anyone comment on the difference between the first three and the
last three double-precision summations, i.e. row-wise versus column-wise?]

Integer summation is perhaps a little faster, but there's not much in
it. So why does rowSums() coerce to double (behaviour that is
undesirable for me)?
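In the meantime, a minimal sketch of two ways to end up with an integer result, assuming every row sum fits in R's integer range (otherwise the coercion in the first option yields NA with a warning). The names rs and rs2 are mine, for illustration only:

```r
a <- matrix(1:30, 5, 6)

# Option 1: coerce the double result of rowSums() back to integer.
# Only safe if every sum is at most .Machine$integer.max in absolute value.
rs <- rowSums(a)
storage.mode(rs) <- "integer"   # unlike as.integer(), this keeps any names
is.integer(rs)

# Option 2: sum each row in integer arithmetic; vapply() also checks
# that every result really is a single integer.
rs2 <- vapply(seq_len(nrow(a)), function(i) sum(a[i, ]), integer(1))
identical(rs, rs2)
```

The first option keeps the speed of rowSums() at the cost of one extra coercion; the second never leaves integer arithmetic, but it loops at the R level like apply() does.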
--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743