[R] Possible Improvement to sapply
Doran, Harold
HDoran at air.org
Wed Mar 14 10:14:57 CET 2018
Well thanks, Martin, and glad to see there is some potential here. This
wasn't reported as a bug, but as you note really as a question originally
and with an invitation to critique my code.
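
For anyone skimming the quoted discussion below, here is a minimal sketch of how the new definitions differ from identical(). The names is_true_sketch and is_false_sketch are hypothetical local copies of the source Martin quotes, renamed so the example does not depend on R 3.5.0:

```r
# Local copies of the definitions quoted below, renamed (hypothetical names)
# so the sketch also runs on R versions without base::isFALSE().
is_true_sketch  <- function(x) is.logical(x) && length(x) == 1L && !is.na(x) && x
is_false_sketch <- function(x) is.logical(x) && length(x) == 1L && !is.na(x) && !x

named_false <- c(a = FALSE)

is_false_sketch(named_false)      # TRUE: the names attribute is ignored
identical(named_false, FALSE)     # FALSE: identical() also compares attributes
is_false_sketch("array")          # FALSE: not logical, so && stops at the first test
is_false_sketch(NA)               # FALSE: NA is neither TRUE nor FALSE
is_false_sketch(c(FALSE, FALSE))  # FALSE: length must be exactly 1
```

So swapping identical(x, FALSE) for isFALSE(x) only changes behaviour when x carries attributes such as names, which is the case Henrik flags.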
On 3/14/18, 5:11 AM, "Martin Maechler" <maechler at stat.math.ethz.ch> wrote:
>>>>>> Henrik Bengtsson <henrik.bengtsson at gmail.com>
>>>>>> on Tue, 13 Mar 2018 10:12:55 -0700 writes:
>
>> FYI, in R devel (to become 3.5.0), there's isFALSE() which will cut
>> some corners compared to identical():
>
>> > microbenchmark::microbenchmark(identical(FALSE, FALSE), isFALSE(FALSE))
>> Unit: nanoseconds
>>                    expr min   lq    mean median     uq   max neval
>> identical(FALSE, FALSE) 984 1138 1694.13 1218.0 1337.5 13584   100
>>          isFALSE(FALSE) 713  761 1133.53  809.5  871.5 18619   100
>
>> > microbenchmark::microbenchmark(identical(TRUE, FALSE), isFALSE(TRUE))
>> Unit: nanoseconds
>>                   expr  min     lq    mean median   uq   max neval
>> identical(TRUE, FALSE) 1009 1103.5 2228.20 1170.5 1357 14346   100
>>          isFALSE(TRUE)  718  760.0 1298.98  798.0  898 17782   100
>
>> > microbenchmark::microbenchmark(identical("array", FALSE), isFALSE("array"))
>> Unit: nanoseconds
>>                      expr min     lq    mean median     uq  max neval
>> identical("array", FALSE) 975 1058.5 1257.95 1119.5 1250.0 9299   100
>>          isFALSE("array") 409  433.5  658.76  446.0  476.5 9383   100
>
>Thank you Henrik!
>
>The speed of the new isTRUE() and isFALSE() is indeed amazing
>compared to identical() which was written to be fast itself.
>
>Note that the new code goes back to a proposal by Hervé Pagès
>(of Bioconductor fame) in a thread with R core in April 2017.
>The goal of the new code actually *was* to allow calls like
>
> isTRUE(c(a = TRUE))
>
>to become TRUE rather than improving speed.
>The new source code is at the end of R/src/library/base/R/identical.R
>
>## NB: is.logical(.) will never dispatch:
>## -- base::is.logical(x) <==> typeof(x) == "logical"
>isTRUE <- function(x) is.logical(x) && length(x) == 1L && !is.na(x) && x
>isFALSE <- function(x) is.logical(x) && length(x) == 1L && !is.na(x) && !x
>
>and one *reason* this is so fast is that all 6 functions which
>are called are primitives :
>
>> sapply(codetools::findGlobals(isTRUE), function(fn) is.primitive(get(fn)))
>          !         &&         == is.logical      is.na     length
>       TRUE       TRUE       TRUE       TRUE       TRUE       TRUE
>
>and a second reason is probably the many recent improvements to the
>byte compiler.
>
>
>> That could probably also be used in sapply(). The difference is that
>> isFALSE() is a bit more liberal than identical(x, FALSE), e.g.
>
>> > isFALSE(c(a = FALSE))
>> [1] TRUE
>> > identical(c(a = FALSE), FALSE)
>> [1] FALSE
>
>> Assuming the latter is not an issue, there are 69 places in base R
>> where isFALSE() could be used:
>
>> $ grep -E "identical[(][^,]+,[ ]*FALSE[)]" -r --include="*.R" | grep -F "/R/" | wc
>>      69     326    5472
>
>> and another 59 where isTRUE() can be used:
>
>> $ grep -E "identical[(][^,]+,[ ]*TRUE[)]" -r --include="*.R" | grep -F "/R/" | wc
>>      59     307    5021
>
>Beautiful use of 'grep' -- thank you for those above, as well.
>It does need a quick manual check, but if I use the above grep
>from Emacs (via 'M-x grep') or even better via a TAGS table
>and M-x tags-query-replace I should be able to do the changes
>pretty quickly... and will start looking into that later today.
>
>Interestingly and to my great pleasure, the first part of the
>'Subject' of this mailing list thread, "Possible Improvement",
>*has* become true after all --
>
>-- thanks to Henrik !
>
>Martin Maechler
>ETH Zurich
>
>
>
>> On Tue, Mar 13, 2018 at 9:21 AM, Doran, Harold <HDoran at air.org> wrote:
>> > Quite possibly, and I'll look into that. Aside from the work I was
>> > doing, however, I wonder if there is a way such that sapply could avoid
>> > the overhead of having to call the identical function to determine the
>> > conditional path.
>> >
>> >
>> >
>> > From: William Dunlap [mailto:wdunlap at tibco.com]
>> > Sent: Tuesday, March 13, 2018 12:14 PM
>> > To: Doran, Harold <HDoran at air.org>
>> > Cc: Martin Morgan <martin.morgan at roswellpark.org>; r-help at r-project.org
>> > Subject: Re: [R] Possible Improvement to sapply
>> >
>> > Could your code use vapply instead of sapply? vapply forces you to
>> > declare the type and dimensions of FUN's output and stops if any call
>> > to FUN does not match the declaration. It can use much less memory and
>> > time than sapply because it fills in the output array as it goes
>> > instead of calling lapply() and seeing how it could be simplified.
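
A minimal sketch of the vapply() suggestion above. The function score_item and the responses vector are invented for illustration and are not taken from the thread:

```r
# Hypothetical per-item scoring function; a stand-in for real scoring code.
score_item <- function(response) as.numeric(response == "correct")

responses <- c("correct", "wrong", "correct")

# sapply() calls lapply() and then inspects the results to decide
# how (and whether) to simplify them.
s1 <- sapply(responses, score_item, USE.NAMES = FALSE)

# vapply() is told up front that each call returns exactly one numeric
# value, so it can fill the result vector as it goes.
s2 <- vapply(responses, score_item, FUN.VALUE = numeric(1), USE.NAMES = FALSE)

identical(s1, s2)

# A result that does not match FUN.VALUE is an error, not a silent list.
bad <- tryCatch(
  vapply(responses, function(r) character(0), FUN.VALUE = numeric(1)),
  error = function(e) "error"
)
```

Whether this helps in practice depends on FUN returning a fixed type and length on every call, which is something only profiling on the real scoring code can confirm.
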
>> >
>> > Bill Dunlap
>> > TIBCO Software
>> > wdunlap tibco.com
>> >
>> > On Tue, Mar 13, 2018 at 7:06 AM, Doran, Harold <HDoran at air.org> wrote:
>> > Martin
>> >
>> > In terms of context of the actual problem, sapply is called millions
>> > of times because the work involves scoring individual students who took
>> > a test. A score for student A is generated and then student B and such
>> > and there are millions of students. The psychometric process of scoring
>> > students is complex and our code makes use of sapply many times for each
>> > student.
>> >
>> > The toy example used length just to illustrate; our actual code
>> > doesn't do that. But your point is well taken: there may be a very good
>> > counterexample showing why my proposal doesn't achieve the goal in a
>> > generalizable way.
>> >
>
>
>[.................]
>