[R] Somewhat disconcerting behavior of seq.int()

Bert Gunter bgunter.4567 at gmail.com
Tue May 3 05:02:16 CEST 2022


Thank you Andrew.

My response is: not quite, but you essentially explained it. Just
replacing "by = 1" with "by = 1L" was not sufficient (I had actually
tried that). What I really needed was an explicit cast of the sieve2
sequence to integer:

sieve2 <- function(m){
   if(m < 2) return(NULL)
   a <- floor(sqrt(m))
   pr <- Recall(a)
###################### explicit cast
   s <- as.integer(seq.int(2L, to = m, by =1)) ## Only difference here
#####################
   for( i in pr) s <- s[as.logical(s %% i)]
   c(pr,s)
}
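
The effect of the cast can be checked on a short sequence. This is only
a minimal sketch: the variable names s1, s2 and s3 are made up for
illustration, and the types noted for s1 and s2 are those observed in
this thread on R 4.2.0, which the documentation says should not be
relied upon. Only the type of s3 is guaranteed.

```r
# Types of s1 and s2 are as observed in this thread on R 4.2.0;
# the help page for seq.int says programmers should not rely on them.
s1 <- seq.int(2, to = 10)            # 'by' omitted (as in sieve1)
s2 <- seq.int(2, to = 10, by = 1)    # 'by' given as a double (as in the original sieve2)
s3 <- as.integer(s2)                 # explicit cast (as in the revised sieve2)

typeof(s1)
typeof(s2)
typeof(s3)   # always "integer", whatever seq.int() happened to return

# The values agree in every case; only the storage type can differ.
all(s1 == s2)
identical(s3, 2:10)
```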

> microbenchmark(l1 <- sieve1(1e5), times =50)
Unit: milliseconds
                expr      min       lq     mean   median       uq      max neval
 l1 <- sieve1(1e+05) 3.957168 4.001834 4.772071 4.018396 4.538917 8.135334    50
> microbenchmark(l2 <- sieve2(1e5), times =50)
Unit: milliseconds
                expr     min       lq     mean   median       uq   max neval
 l2 <- sieve2(1e+05) 3.98475 4.041709 4.805767 4.068167 4.446917 8.422    50
> identical(l1,l2)
[1] TRUE

So it is the %% generic that made the difference in timings in integer
vs. double. And the Help file does warn about this:

VALUE:
"seq.int and the default method of seq for numeric arguments return a
vector of type "integer" or "double": programmers should not rely on
which."

So I would have to say that it was my error (or failure to pay
attention, anyway).
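
For anyone who wants to check the %% effect in isolation, here is a
rough sketch using only base R. The vector length, the divisor 7 and the
repeat count are arbitrary choices for illustration, not taken from the
sieve code, and the timings will depend on the machine and R version, so
no particular ratio should be expected.

```r
# Hypothetical micro-comparison of %% on integer vs. double storage.
n <- 1e6L
x_int <- seq_len(n)          # integer vector 1..n
x_dbl <- as.double(x_int)    # same values stored as double

# Repeat each operation a few times so the elapsed time is measurable.
t_int <- system.time(for (k in 1:20) r_int <- x_int %% 7L)
t_dbl <- system.time(for (k in 1:20) r_dbl <- x_dbl %% 7)

# Elapsed seconds for each storage type
print(c(integer = t_int[["elapsed"]], double = t_dbl[["elapsed"]]))

# Same remainders, different storage type
typeof(r_int)
typeof(r_dbl)
all(r_int == r_dbl)
```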

Again, thanks for your help on this. It made the difference for me.

Bert

On Mon, May 2, 2022 at 7:00 PM Andrew Simmons <akwsimmo using gmail.com> wrote:
>
> A sequence where 'from' and 'to' are both integer valued (not necessarily class integer) will use R_compact_intrange; the return value is an integer vector and is stored with minimal space.
>
> In your case, you specified a 'from', 'to', and 'by'; if all are integer class, then the return value is also integer class. I think if 'from' and 'to' are integer valued and 'by' is integer class, the return value is integer class, might want to check that though. In your case, I think replacing 'by = 1' with 'by = 1L' will mean the sequences are identical, though it may still take longer than not specifying at all.
>
> On Mon, May 2, 2022, 21:46 Bert Gunter <bgunter.4567 using gmail.com> wrote:
>>
>> ** Disconcerting to me, anyway; perhaps not to others**
>> (Apologies if this has been discussed before. I was a bit nonplussed by
>> it, but maybe I'm just clueless.) Anyway:
>>
>> Here are two almost identical versions of the Sieve of Eratosthenes.
>> The difference between them is only in the call to seq.int() that is
>> highlighted
>>
>> sieve1 <- function(m){
>>    if(m < 2) return(NULL)
>>    a <- floor(sqrt(m))
>>    pr <- Recall(a)
>> ####################
>>    s <- seq.int(2, to = m) ## Only difference here
>> ######################
>>    for( i in pr) s <- s[as.logical(s %% i)]
>>    c(pr,s)
>> }
>>
>> sieve2 <- function(m){
>>    if(m < 2) return(NULL)
>>    a <- floor(sqrt(m))
>>    pr <- Recall(a)
>> ####################
>>    s <- seq.int(2, to = m, by =1) ## Only difference here
>> #######################
>>    for( i in pr) s <- s[as.logical(s %% i)]
>>    c(pr,s)
>> }
>>
>> However, execution time is *quite* different.
>>
>> library(microbenchmark)
>>
>> > microbenchmark(l1 <- sieve1(1e5), times =50)
>> Unit: milliseconds
>>                 expr      min       lq     mean  median       uq      max neval
>>  l1 <- sieve1(1e+05) 3.957084 3.997959 4.732045 4.01698 4.184918 7.627751    50
>>
>> > microbenchmark(l2 <- sieve2(1e5), times =50)
>> Unit: milliseconds
>>                 expr      min      lq     mean   median       uq      max neval
>>  l2 <- sieve2(1e+05) 681.6209 682.555 683.8279 682.9368 685.2253 687.9464    50
>>
>> Now note that:
>> > identical(l1, l2)
>> [1] FALSE
>>
>> ## Because:
>> > str(l1)
>>  int [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
>>
>> > str(l2)
>>  num [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
>>
>> I therefore assume that seq.int(), an internal generic, is dispatching
>> to a method that uses integer arithmetic for sieve1 and floating point
>> for sieve2. Is this correct? If not, what do I fail to understand? And
>> is this indeed the source of the large difference in execution time?
>>
>> Further, ?seq.int says:
>> "The interpretation of the unnamed arguments of seq and seq.int is not
>> standard, and it is recommended always to name the arguments when
>> programming."
>>
>> The above suggests that maybe this advice should be qualified, and/or
>> adding some comments to the Help file regarding this behavior might be
>> useful to naïfs like me.
>>
>> In case it makes a difference (and it might!):
>>
>> > sessionInfo()
>> R version 4.2.0 (2022-04-22)
>> Platform: x86_64-apple-darwin17.0 (64-bit)
>> Running under: macOS Monterey 12.3.1
>>
>> Matrix products: default
>> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] microbenchmark_1.4.9
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_4.2.0 tools_4.2.0
>>
>>
>> Thanks for any enlightenment and again apologies if I am plowing old ground.
>>
>> Best to all,
>>
>> Bert Gunter
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.