[R] Somewhat disconcerting behavior of seq.int()
Bert Gunter
bgunter.4567 using gmail.com
Tue May 3 05:02:16 CEST 2022
Thank you Andrew.
My response is: not quite, but you essentially explained it. Just
replacing "by = 1" with "by = 1L" was not sufficient (I had actually
tried that). What I really needed to do was add an explicit cast
of the sieve2 sequence to integer:
sieve2 <- function(m){
   if(m < 2) return(NULL)
   a <- floor(sqrt(m))
   pr <- Recall(a)
   ###################### explicit cast
   s <- as.integer(seq.int(2L, to = m, by = 1)) ## Only difference here
   #####################
   for(i in pr) s <- s[as.logical(s %% i)]
   c(pr, s)
}
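
As a quick sanity check on the function above, here is a small sketch that was not part of my original session. The name sieve_int is hypothetical; it is just a copy of sieve2 with the cast, run on a small input so the result can be inspected by eye.

```r
# Hypothetical copy of sieve2 with the explicit integer cast, for checking only.
sieve_int <- function(m) {
  if (m < 2) return(NULL)
  a <- floor(sqrt(m))
  pr <- Recall(a)                               # primes up to sqrt(m)
  s <- as.integer(seq.int(2L, to = m, by = 1))  # candidates, forced to integer
  # Drop multiples of i (including i itself; pr is prepended again below).
  for (i in pr) s <- s[as.logical(s %% i)]
  c(pr, s)
}

p <- sieve_int(30)
typeof(p)   # "integer"
p           # 2 3 5 7 11 13 17 19 23 29
```

With the cast in place, both the recursive result pr and the candidate vector s stay integer, so the %% inside the loop never sees a double.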
> microbenchmark(l1 <- sieve1(1e5), times =50)
Unit: milliseconds
                expr      min       lq     mean   median       uq      max neval
 l1 <- sieve1(1e+05) 3.957168 4.001834 4.772071 4.018396 4.538917 8.135334    50
> microbenchmark(l2 <- sieve2(1e5), times =50)
Unit: milliseconds
                expr     min       lq     mean   median       uq   max neval
 l2 <- sieve2(1e+05) 3.98475 4.041709 4.805767 4.068167 4.446917 8.422    50
> identical(l1,l2)
[1] TRUE
So it is the %% operator, applied to a double vector rather than an
integer one, that made the difference in timings. And the Help file
does warn about the return type:
VALUE:
"seq.int and the default method of seq for numeric arguments return a
vector of type "integer" or "double": programmers should not rely on
which."
So I would have to say that it was my error (or failure to pay
attention, anyway).
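
For anyone who wants to check this on their own setup, here is a minimal sketch (not from my original session) that looks at the type each call returns and gives a rough timing of %% on each. The variable names and the divisor 7 are arbitrary choices for illustration, and the timings will of course vary by machine and R version.

```r
n <- 1e5
s_plain <- seq.int(2, to = n)          # no 'by' argument
s_by    <- seq.int(2, to = n, by = 1)  # double 'by'
s_cast  <- as.integer(s_by)            # explicit cast, as in the revised sieve2

typeof(s_plain)  # "integer"
typeof(s_by)     # "double"
typeof(s_cast)   # "integer"

# Same values, different storage type.
all(s_plain == s_by)

# Rough comparison of %% on integer vs. double input (no output shown,
# since the numbers depend on the platform).
system.time(for (k in 1:50) s_plain %% 7L)
system.time(for (k in 1:50) s_by %% 7)
```

Given the Help file's warning, the safer habit seems to be to call as.integer() (or check with typeof()) whenever later code depends on the type, rather than trying to steer seq.int() through its arguments.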
Again, thanks for your help on this. It made the difference for me.
Bert
On Mon, May 2, 2022 at 7:00 PM Andrew Simmons <akwsimmo using gmail.com> wrote:
>
> A sequence where 'from' and 'to' are both integer valued (not necessarily class integer) will use R_compact_intrange; the return value is an integer vector and is stored with minimal space.
>
> In your case, you specified a 'from', 'to', and 'by'; if all are integer class, then the return value is also integer class. I think if 'from' and 'to' are integer valued and 'by' is integer class, the return value is integer class, might want to check that though. In your case, I think replacing 'by = 1' with 'by = 1L' will mean the sequences are identical, though it may still take longer than not specifying at all.
>
> On Mon, May 2, 2022, 21:46 Bert Gunter <bgunter.4567 using gmail.com> wrote:
>>
>> ** Disconcerting to me, anyway; perhaps not to others**
>> (Apologies if this has been discussed before. I was a bit nonplussed by
>> it, but maybe I'm just clueless.) Anyway:
>>
>> Here are two almost identical versions of the Sieve of Eratosthenes.
>> The difference between them is only in the call to seq.int() that is
>> highlighted
>>
>> sieve1 <- function(m){
>>    if(m < 2) return(NULL)
>>    a <- floor(sqrt(m))
>>    pr <- Recall(a)
>>    ####################
>>    s <- seq.int(2, to = m) ## Only difference here
>>    ######################
>>    for(i in pr) s <- s[as.logical(s %% i)]
>>    c(pr, s)
>> }
>>
>> sieve2 <- function(m){
>>    if(m < 2) return(NULL)
>>    a <- floor(sqrt(m))
>>    pr <- Recall(a)
>>    ####################
>>    s <- seq.int(2, to = m, by = 1) ## Only difference here
>>    #######################
>>    for(i in pr) s <- s[as.logical(s %% i)]
>>    c(pr, s)
>> }
>>
>> However, execution time is *quite* different.
>>
>> library(microbenchmark)
>>
>> > microbenchmark(l1 <- sieve1(1e5), times =50)
>> Unit: milliseconds
>>                 expr      min       lq     mean  median       uq      max neval
>>  l1 <- sieve1(1e+05) 3.957084 3.997959 4.732045 4.01698 4.184918 7.627751    50
>>
>> > microbenchmark(l2 <- sieve2(1e5), times =50)
>> Unit: milliseconds
>>                 expr      min      lq     mean   median       uq      max neval
>>  l2 <- sieve2(1e+05) 681.6209 682.555 683.8279 682.9368 685.2253 687.9464    50
>>
>> Now note that:
>> > identical(l1, l2)
>> [1] FALSE
>>
>> ## Because:
>> > str(l1)
>> int [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
>>
>> > str(l2)
>> num [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
>>
>> I therefore assume that seq.int(), an internal generic, is dispatching
>> to a method that uses integer arithmetic for sieve1 and floating point
>> for sieve2. Is this correct? If not, what do I fail to understand? And
>> is this indeed the source of the large difference in execution time?
>>
>> Further, ?seq.int says:
>> "The interpretation of the unnamed arguments of seq and seq.int is not
>> standard, and it is recommended always to name the arguments when
>> programming."
>>
>> The above suggests that maybe this advice should be qualified, and/or
>> adding some comments to the Help file regarding this behavior might be
>> useful to naïfs like me.
>>
>> In case it makes a difference (and it might!):
>>
>> > sessionInfo()
>> R version 4.2.0 (2022-04-22)
>> Platform: x86_64-apple-darwin17.0 (64-bit)
>> Running under: macOS Monterey 12.3.1
>>
>> Matrix products: default
>> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] microbenchmark_1.4.9
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_4.2.0 tools_4.2.0
>>
>>
>> Thanks for any enlightenment and again apologies if I am plowing old ground.
>>
>> Best to all,
>>
>> Bert Gunter
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.