[R] slowness when I use a list comprehension

Laurent Rhelp
Sun Jun 16 20:40:03 CEST 2024


Avi and Jeff,

Thank you very much for your answers. I did not expect such an
interesting answer when I asked my question.

In fact, I recently discovered list comprehensions while reading some
Python code, and I was seduced by the compact notation, so I decided to
try them on an example as an exercise.

Now I know why using comprehenr is slow (cf. Avi's answer),
and I was impressed by Jeff's function, which uses vectorization.

Unit: milliseconds
expr                                                      min        lq      mean   median        uq      max neval cld
S_diff2 <- dloop(N1, M2, ratio_sampling, vec1, vec2) 205.0905 212.86080 226.80683 221.3820 227.57695 297.9974    20   a
S_diff3 <- vloop(N1, M2, ratio_sampling, vec1, vec2)  49.8971  57.05555  64.25502  58.9455  63.15645 113.4106    20   b

I did not think of transforming the second loop with a vectorized
approach.

Hence, the right direction is to think more in terms of vectorization.

I will look for some exercises on the web.
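
As a first exercise in that direction, here is a minimal sketch of how the remaining loop over j in vloop could also be replaced by matrix operations. It reuses the mock data from the thread; the names n_out, idx, sub and S_diff4 are introduced here for illustration and do not come from Jeff's code. It holds an n_out x N2 matrix in memory, so whether it is actually faster than vloop would have to be measured.

```r
## Same mock data as in the thread
ratio_sampling <- 500
N1 <- 70000
N2 <- 100
set.seed(123)
vec1 <- rnorm(N1)
vec2 <- runif(N2)

n_out <- N1 - (N2 - 1) * ratio_sampling      # number of start positions j
k <- (seq_along(vec2) - 1) * ratio_sampling  # offsets into vec1, one per vec2 value

## Reference: the partially vectorized loop from vloop
ref <- numeric(n_out)
for (j in seq_len(n_out)) {
  ref[j] <- sum((vec1[j + k] - vec2)^2)
}

## No explicit loop: row j of idx holds the indices j + k
idx <- outer(seq_len(n_out), k, "+")       # n_out x N2 index matrix
sub <- matrix(vec1[idx], nrow = n_out)     # sub[j, i] is vec1[j + k[i]]
S_diff4 <- rowSums(sweep(sub, 2, vec2)^2)  # subtract vec2[i] from column i, square, sum rows

all.equal(ref, S_diff4)
```

The trade-off is memory for loop overhead: the index matrix and the extracted values each need n_out * N2 elements.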


On 16/06/2024 at 19:44, avi.e.gross using gmail.com wrote:
> I fully agree with Jeff that the best way to use ANY language is to evaluate
> the language in terms of not just the capabilities it offers but also the
> philosophy behind what it was created for and how people do things and just
> grok it and use it mostly in the way intended. I do that with all the
> languages I learn, whether for computers or humans.
>
> Bringing in something you like from another language often gets in the way
> of actually using what you have. But realistically, many languages that were
> designed for one purpose will then evolve to suit many other purposes and
> lose their direction and often their focus and even efficiency. S was
> designed for statistical computing of some sorts and that meant a vectorized
> approach could take you far. Python had other design goals and the original
> designers wanted elements of generality that a list provides more than a
> vector does. R has lists too, but note if you want to use the kind of
> dictionary or set used in python, which definitely can have advantages and
> disadvantages, you can find add-ons in R packages that give you something
> like that too. And, note, many, myself included, really appreciate alternate
> ways to do things and heavily use tidyverse packages that mostly are not
> base R but sort of a grafted-on other language. So what? Purists don't
> necessarily do well in the real world.
>
> On the topic at hand and speed, I went and looked at the comprehenr package
> and it is no wonder it is slower.
>
> Here is the code Laurent used in calling to_vec:
>
> > to_vec
> function (expr, recursive = TRUE, use.names = FALSE)
> {
>      res = eval.parent(substitute(comprehenr::to_list(expr)))
>      unlist(res, recursive = recursive, use.names = use.names)
> }
>
> It does a few things and then calls to_list() to do the actual work. This
> extra layer may slow it down a tad.
>
> So what does to_list() do?
>
> > to_list
> function (expr)
> {
>      expr = substitute(expr)
>      is_loop(expr) || stop(paste("argument should be expression with 'for', 'while' or 'repeat' but we have: ",
>          deparse(expr, width.cutoff = 500)[1]))
>      expr = expand_loop_variables(expr)
>      expr = add_assignment_to_final_loops(expr)
>      expr = substitute(local({
>          .___res <- list()
>          .___counter <- 0
>          expr
>          .___res
>      }))
>      eval.parent(expr)
> }
>
> I won't follow the entire chain, but it seems to take the code supplied,
> isolate the various parts needed and, in effect, build up some other code
> and evaluate it in the context of the parent.
>
> Obviously, had you written similar (or different using loops or whatever)
> code directly, it might execute faster.
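
A rough sketch of what the generated code presumably amounts to, judging only from the to_list() and to_vec() sources quoted above: grow a list one element at a time, then flatten it with unlist(). That each loop iteration appends one element to the result list is an assumption here, since add_assignment_to_final_loops() was not traced. The function names squares_via_list and squares_direct are hypothetical and not part of comprehenr.

```r
## Pattern suggested by the to_list()/to_vec() sources:
## start from an empty list and a counter, append inside the loop, then unlist.
## (Assumption: one element is appended per loop iteration.)
squares_via_list <- function(x) {
  res <- list()
  counter <- 0
  for (v in x) {
    counter <- counter + 1
    res[[counter]] <- v^2
  }
  unlist(res, recursive = TRUE, use.names = FALSE)
}

## What one would write directly in R: a single vectorized operation
squares_direct <- function(x) x^2

x <- c(1.5, 2, 3)
identical(squares_via_list(x), squares_direct(x))
```

In the nested example from the original post, the inner to_vec() call sits inside the outer loop, so this set-up (capturing the expression, rewriting it, building and flattening a 100-element list) is repeated for each of the N1-(N2-1)*ratio_sampling = 20500 values of j.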
>
> As I mentioned, this is largely syntactic sugar. A reasonable use of this is
> if you are given python code and asked to translate it into R code that does
> the same thing. You could spend time thinking and designing and come up with
> the kind of R code an R expert might have done, or skip that and just make
> slight changes needed for R and for the package being used and it should
> work, but not necessarily the way a native polished version works. Later, if
> time and finances permit, and you want it faster, rewrite it.
>
> I note the package, with a vignette here:
> https://cran.r-project.org/web//packages/comprehenr/vignettes/Introduction.html
>
> Does make some changes so translating is not trivial. For example, the
> python syntax such as:
>
> [ f(x) for x in iterable if condition]
>
> Is not able to be used in quite that order. It loosely translates to:
>
> to_vec(for (x in iterable) if (condition) f(x))
>
> with the result at the end rather than beginning. And, since R has not
> chosen to return multiple things from a function like python does and just
> unpack them, they had to come up with interesting workarounds like `x, y`
> and frankly, quite a few things I can do in python in this context are
> simply not supported by this code, nor can be expected to.
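
For comparison, a small sketch of how the same filter-then-map comprehension can be written in base R without the package. The names xs, is_even and f are placeholders chosen for this illustration; the comprehenr form is shown only as a comment so that the block runs without the package installed.

```r
xs <- 1:10
is_even <- function(x) x %% 2 == 0  # plays the role of "condition"
f <- function(x) x^2                # plays the role of "f(x)"

## Python:     [f(x) for x in xs if is_even(x)]
## comprehenr: to_vec(for (x in xs) if (is_even(x)) f(x))

## Vectorized base R: filter by logical indexing, then apply f to the whole vector
res_vectorized <- f(xs[is_even(xs)])

## Functional base R: closer in shape to the comprehension, element by element
res_functional <- vapply(Filter(is_even, xs), f, numeric(1))

identical(res_vectorized, res_functional)
```

The vectorized form only works because f and is_even accept whole vectors; the vapply/Filter form does not need that, but it calls f once per element.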
>
> I think if someone using python was used to using the extended version by
> loading modules like numpy and pandas and using them heavily, they might
> find it a tad easier to then port the code to R and use vectorized
> functionality better.
>
> So, are packages like comprehenr a crutch or are they helpful or even evil?
> My view is to not be a religious fanatic and assume any language was really
> designed perfectly. Some ideas and implementations can be a useful way to
> formulate a problem for a programmer who thinks in that way, at least until
> they learn to also think in another. An example would be the R way to do
> sets is probably not as useful as the python way. If I needed heavy duty
> usage, I might load a package that lets me think about it the way I want,
> and the same for a dictionary.
>
> But, if I am writing code for others to maintain and change later, the
> closer I stick to the main language or accepted packages, the better.
>
>   
>
> -----Original Message-----
> From: R-help<r-help-bounces using r-project.org>  On Behalf Of Jeff Newmiller via
> R-help
> Sent: Sunday, June 16, 2024 1:13 PM
> To:r-help using r-project.org
> Subject: Re: [R] slowness when I use a list comprehension
>
> I would put this advice more strongly: learn to think in R, rather than
> thinking in Python, when programming in R. R has atomic vectors... Python
> does not (until you import a package that implements them). I find that
> while it is possible to import R thinking into Python, Python programmers
> seem to object for stylistic reasons even though such thinking speeds up
> Python also.
>
> A key step in that direction is to stop using lists directly for numeric
> calculations... use them to manage numeric vectors. In some cases you can
> switch to matrices or arrays to remove even more list manipulations from the
> script.
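
A minimal sketch of that advice on made-up data (the list series and its contents are invented for this illustration): the list only organizes the numeric vectors, the arithmetic inside each element stays vectorized, and when all vectors have the same length a matrix removes the list handling entirely.

```r
## A list used to manage several numeric vectors ...
series <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

## ... while the arithmetic stays vectorized inside each element
sums_list <- vapply(series, function(v) sum(v^2), numeric(1))

## Equal-length vectors: a matrix removes the list manipulation altogether
m <- do.call(cbind, series)  # 3 x 3 matrix, one column per series
sums_matrix <- colSums(m^2)

all.equal(sums_list, sums_matrix)
```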
>
> library(microbenchmark)
>
> ratio_sampling <- 500
> ## size of the first series
> N1 <- 70000
> ## size of the second series
> N2 <- 100
> ## mock data
> set.seed(123)
> vec1 <- rnorm(N1)
> vec2 <- runif(N2)
>
> dloop <- function( N1, N2, ratio_sampling, vec1, vec2 ) {
>    S_diff2 <- numeric(
>      N1-(N2-1)*ratio_sampling
>    )
>    for( j in 1:length(S_diff2) ) {
>      sum_squares <- 0
>      for( i in 1:length(vec2)){
>        sum_squares <- (
>          sum_squares
>          + (
>            vec1[ (i-1)*ratio_sampling+j ]
>            - vec2[i]
>          )**2
>        )
>      }
>      S_diff2[j] <- sum_squares
>    }
>    S_diff2
> }
>
> vloop <- function( N1, N2, ratio_sampling, vec1, vec2 ) {
>    S_diff3 <- numeric(
>      N1-(N2-1)*ratio_sampling
>    )
>    i <- seq_along( vec2 )
>    k <- (i-1)*ratio_sampling
>    for( j in seq_along( S_diff3 ) ) {
>      S_diff3[j] <- sum(
>        (
>          vec1[ j + k ]
>          - vec2
>        )^2
>      )
>    }
>    S_diff3
> }
>
> microbenchmark(
>    S_diff2 <- dloop( N1, N2, ratio_sampling, vec1, vec2 )
>    , S_diff3 <- vloop( N1, N2, ratio_sampling, vec1, vec2 )
>    , times = 20
> )
>
> all.equal( S_diff2, S_diff3 )
>
>
> On June 16, 2024 9:33:54 AM PDT, avi.e.gross using gmail.com wrote:
>> Laurent,
>>
>> Thank you for introducing me to a package I did not know existed, as I use
>> features like list comprehension in python all the time and could see using
>> it in R now that I know it is available.
>>
>> As to why you see your example as slow, I see you used a fairly complex and
>> nested expression and wonder if it was the better way to go. As you are
>> dealing with an interpreter doing delayed evaluation, I can imagine reasons
>> it can be slow. But note the package comprehenr may not be designed to be
>> more efficient than loops or than the more built-in functional methods that
>> can be faster. The package is there perhaps more as a compatibility helper
>> that allows you to write closer to the python style and perhaps re-shapes
>> what you wrote into a set of instructions in more native R.
>>
>> Just for comparison, in python, things like comprehensions for lists,
>> dictionaries or sets often are syntactic sugar and the interpreter may
>> simply rewrite them more like the first program you typed and evaluate
>> that. The comprehensions are more designed for users who can think another
>> way and write things more compactly as one-liners. Depending on
>> implementations, they may be faster or slower on some examples.
>>
>> I am not saying there is nothing else that is slowing it down for you. I am
>> suggesting that using the feature as currently implemented may not be an
>> advantage except in your thought process. It may be it could be improved,
>> such as by moving more functionality out of R and into compiled languages,
>> as has been done for many packages.
>>
>> Avi
>>
>> -----Original Message-----
>> From: R-help<r-help-bounces using r-project.org>  On Behalf Of Laurent Rhelp
>> Sent: Sunday, June 16, 2024 11:28 AM
>> To:r-help using r-project.org
>> Subject: [R] slowness when I use a list comprehension
>>
>> Dear RHelp-list,
>>
>> I am trying to use the package comprehenr to replace a for loop with a list
>> comprehension.
>>
>> I wrote the code, but I am certainly missing something because it is much
>> slower than the for loops. Could you please explain to me why the
>> list comprehension is slower in my case?
>>
>> Here is my example. I calculate the sum of squared differences
>> between the values of two vectors vec1 and vec2; the sampling ratio
>> between vec1 and vec2 is equal to ratio_sampling. I have to use only every
>> 500th value of the first series before taking the difference with the
>> value of the second series (vec2).
>>
>> Thank you
>>
>> Best regards
>>
>> Laurent
>>
>> library(tictoc)
>> library(comprehenr)
>>
>> ratio_sampling <- 500
>> ## size of the first series
>> N1 <- 70000
>> ## size of the second series
>> N2 <- 100
>> ## mock data
>> set.seed(123)
>> vec1 <- rnorm(N1)
>> vec2 <- runif(N2)
>>
>>
>> ## 1. with the "for" loops
>>
>> ## the square differences will be stored in a vector
>> S_diff2 <- numeric((N1-(N2-1)*ratio_sampling))
>> tic()
>> for( j in 1:length(S_diff2)){
>>    sum_squares <- 0
>>    for( i in 1:length(vec2)){
>>      sum_squares = sum_squares + ((vec1[(i-1)*ratio_sampling+j] -
>> vec2[i])**2)
>>    }
>>    S_diff2[j] <- sum_squares
>> }
>> toc()
>> ## 0.22 sec elapsed
>> which.max(S_diff2)
>> ## 7857
>>
>> ## 2. with the list comprehension
>> tic()
>> S_diff2 <- to_vec(for( j in 1:length(S_diff2)) sum(to_vec(for( i in
>> 1:length(vec2)) ((vec1[(i-1)*ratio_sampling+j] - vec2[i])**2))))
>> toc()
>> ## 25.09 sec elapsed
>> which.max(S_diff2)
>> ## 7857
>>
>> ______________________________________________
>> R-help using r-project.org  mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.