[R] Peformance question
Thomas Mailund
thomas.mailund at gmail.com
Fri Apr 21 13:23:36 CEST 2017
I did try to profile it but I'll throw some more experiments at it. Right now I suspect it is mostly a problem of wrapping the data in objects which I do more for the purely functional version than the other two, but I'll experiment some more
Cheers
Thomas
On 21 Apr 2017, 13.20 +0200, Paul Johnson <pauljohn32 at gmail.com>, wrote:
> I dont understand your code. But I do have suggestion. Run the functions in the profiler, maybe differences will point at the enemy.
>
> Know what I mean?
>
> Rprof('check.out')
> #run code
> Rprof(NULL)
> summaryRprof('check.out')
>
> Do that for each method. That may be uninformative.
>
> I wondered if you tried to compile your functions? In some cases it helps erase differences like this. Norman Matloff has examples like that in Art of R Programming.
>
> I keep a list of things that are slow, if we can put finger on problem, I will add to list. I suspect slow here is in runtime object lookup. The environment ones have info located more quickly by the runtime, I expect. Also, passing info back and forth from the R runtime system using [ is a common cause of slow. It is why everybody yells 'vectorize' and 'use lapply' all the time. Then again, I'm guessing because I dont understand your code:)
>
> Good luck,
> PJ
>
>
>
>
> On Apr 11, 2017 7:44 PM, "Thomas Mailund" <thomas.mailund at gmail.com (mailto:thomas.mailund at gmail.com)> wrote:
> > Hi y’all,
> >
> > I’m working on a book on how to implement functional data structures in R, and in particular on a chapter on implementing queues. You get get the current version here https://www.dropbox.com/s/9c2yk3a67p1ypmr/book.pdf?dl=0 and the relevant pages are 50-59. I’ve implemented three versions of the same idea, implementing a queue using two linked lists. One list contains the elements you add to the end of a list, the other contains the elements at the front of the list, and when you try to get an element from a list and the front-list is empty you move elements from the back-list to the front. The asymptotic analysis is explained in this figure https://www.dropbox.com/s/tzi84zmyq16hdx0/queue-amortized-linear-bound.png?dl=0 and all my implementations do get a linear time complexity when I evaluate them on a linear number of operations. However, the two implementations that uses environments seem to be almost twice as fast as the implementation that gives me a persistent data structure (see https://www.dropbox.com/s/i9dyab9ordkm0xj/queue-comparisons.png?dl=0), and I cannot figure out why.
> >
> > The code below contains the implementation of all three versions of the queue plus the code I use to measure their performances. I’m sorry it is a little long, but it is a minimal implementation of all three variants, the comments just make it look longer than it really is.
> >
> > Since the three implementations are doing basically the same things, I am a little stumped about why the performance is so consistently different.
> >
> > Can anyone shed some light on this, or help me figure out how to explore this further?
> >
> > Cheers
> >
> > Thomas
> >
> >
> >
> > ## Implementations of queues ##################
> >
> > #' Test if a data structure is empty
> > #' @param x The data structure
> > #' @return TRUE if x is empty.
> > #' @export
> > is_empty <- function(x) UseMethod("is_empty")
> >
> > #' Add an element to a queue
> > #' @param x A queue
> > #' @param elm An element
> > #' @return an updated queue where the element has been added
> > #' @export
> > enqueue <- function(x, elm) UseMethod("enqueue")
> >
> > #' Get the front element of a queue
> > #' @param x A queue
> > #' @return the front element of the queue
> > #' @export
> > front <- function(x) UseMethod("front")
> >
> > #' Remove the front element of a queue
> > #' @param x The queue
> > #' @return The updated queue
> > #' @export
> > dequeue <- function(x) UseMethod("dequeue")
> >
> > ## Linked lists #########################
> >
> > #' Add a head item to a linked list.
> > #' @param elem The item to put at the head of the list.
> > #' @param lst The list -- it will become the tail of the new list.
> > #' @return a new linked list.
> > #' @export
> > list_cons <- function(elem, lst)
> > structure(list(head = elem, tail = lst), class = "linked_list")
> >
> > list_nil <- list_cons(NA, NULL)
> >
> > #' @method is_empty linked_list
> > #' @export
> > is_empty.linked_list <- function(x) identical(x, list_nil)
> >
> > #' Create an empty linked list.
> > #' @return an empty linked list.
> > #' @export
> > empty_list <- function() list_nil
> >
> >
> > #' Get the item at the head of a linked list.
> > #' @param lst The list
> > #' @return The element at the head of the list.
> > #' @export
> > list_head <- function(lst) lst$head
> >
> > #' Get the tail of a linked list.
> > #' @param lst The list
> > #' @return The tail of the list
> > #' @export
> > list_tail <- function(lst) lst$tail
> >
> > #' Reverse a list
> > #' @param lst A list
> > #' @return the reverse of lst
> > #' @export
> > list_reverse <- function(lst) {
> > acc <- empty_list()
> > while (!is_empty(lst)) {
> > acc <- list_cons(list_head(lst), acc)
> > lst <- list_tail(lst)
> > }
> > acc
> > }
> >
> >
> > ## Environment queues #################################################
> >
> > queue_environment <- function(front, back) {
> > e <- new.env(parent = emptyenv())
> > e$front <- front
> > e$back <- back
> > class(e) <- c("env_queue", "environment")
> > e
> > }
> >
> > #' Construct an empty closure based queue
> > #' @return an empty queue
> > #' @export
> > empty_env_queue <- function()
> > queue_environment(empty_list(), empty_list())
> >
> > #' @method is_empty env_queue
> > #' @export
> > is_empty.env_queue <- function(x)
> > is_empty(x$front) && is_empty(x$back)
> >
> > #' @method enqueue env_queue
> > #' @export
> > enqueue.env_queue <- function(x, elm) {
> > x$back <- list_cons(elm, x$back)
> > x
> > }
> >
> > #' @method front env_queue
> > #' @export
> > front.env_queue <- function(x) {
> > if (is_empty(x$front)) {
> > x$front <- list_reverse(x$back)
> > x$back <- empty_list()
> > }
> > list_head(x$front)
> > }
> >
> > #' @method dequeue env_queue
> > #' @export
> > dequeue.env_queue <- function(x) {
> > if (is_empty(x$front)) {
> > x$front <- list_reverse(x$back)
> > x$back <- empty_list()
> > }
> > x$front <- list_tail(x$front)
> > x
> > }
> >
> >
> >
> > ## Closure queues #####################################################
> >
> > queue <- function(front, back)
> > list(front = front, back = back)
> >
> > queue_closure <- function() {
> > q <- queue(empty_list(), empty_list())
> >
> > get_queue <- function() q
> >
> > queue_is_empty <- function() is_empty(q$front) && is_empty(q$back)
> >
> > enqueue <- function(elm) {
> > q <<- queue(q$front, list_cons(elm, q$back))
> > }
> >
> > front <- function() {
> > if (queue_is_empty()) stop("Taking the front of an empty list")
> > if (is_empty(q$front)) {
> > q <<- queue(list_reverse(q$back), empty_list())
> > }
> > list_head(q$front)
> > }
> >
> > dequeue <- function() {
> > if (queue_is_empty()) stop("Taking the front of an empty list")
> > if (is_empty(q$front)) {
> > q <<- queue(list_tail(list_reverse(q$back)), empty_list())
> > } else {
> > q <<- queue(list_tail(q$front), q$back)
> > }
> > }
> >
> > structure(list(is_empty = queue_is_empty,
> > get_queue = get_queue,
> > enqueue = enqueue,
> > front = front,
> > dequeue = dequeue),
> > class = "closure_queue")
> > }
> >
> > #' Construct an empty closure based queue
> > #' @return an empty queue
> > #' @export
> > empty_closure_queue <- function() queue_closure()
> >
> > #' @method is_empty closure_queue
> > #' @export
> > is_empty.closure_queue <- function(x) x$is_empty()
> >
> > #' @method enqueue closure_queue
> > #' @export
> > enqueue.closure_queue <- function(x, elm) {
> > x$enqueue(elm)
> > x
> > }
> >
> > #' @method front closure_queue
> > #' @export
> > front.closure_queue <- function(x) x$front()
> >
> > #' @method dequeue closure_queue
> > #' @export
> > dequeue.closure_queue <- function(x) {
> > x$dequeue()
> > x
> > }
> >
> > ## Extended (purely functional) queues ################################
> > queue_extended <- function(x, front, back)
> > structure(list(x = x, front = front, back = back),
> > class = "extended_queue")
> >
> >
> > #' Construct an empty extended queue
> > #'
> > #' This is just a queue that doesn't use a closure to be able to update
> > #' the data structure when front is called.
> > #'
> > #' @return an empty queue
> > #' @export
> > empty_extended_queue <- function() queue_extended(NA, empty_list(), empty_list())
> >
> > #' @method is_empty extended_queue
> > #' @export
> > is_empty.extended_queue <- function(x)
> > is_empty(x$front) && is_empty(x$back)
> >
> > #' @method enqueue extended_queue
> > #' @export
> > enqueue.extended_queue <- function(x, elm)
> > queue_extended(ifelse(is_empty(x$back), elm, x$x),
> > x$front, list_cons(elm, x$back))
> >
> > #' @method front extended_queue
> > #' @export
> > front.extended_queue <- function(x) {
> > if (is_empty(x)) stop("Taking the front of an empty list")
> > if (is_empty(x$front)) x$x
> > else list_head(x$front)
> > }
> >
> > #' @method dequeue extended_queue
> > #' @export
> > dequeue.extended_queue <- function(x) {
> > if (is_empty(x)) stop("Taking the front of an empty list")
> > if (is_empty(x$front))
> > x <- queue_extended(NA, list_reverse(x$back), empty_list())
> > queue_extended(x$x, list_tail(x$front), x$back)
> > }
> >
> > ## Performance experiments ######################
> >
> > library(microbenchmark)
> > library(tibble)
> > library(ggplot2)
> >
> > get_performance_n <- function(
> > algo
> > , n
> > , setup
> > , evaluate
> > , times
> > , ...) {
> >
> > config <- setup(n)
> > benchmarks <- microbenchmark(evaluate(n, config), times = times)
> > tibble(algo = algo, n = n, time = benchmarks$time / 1e9) # time in sec
> > }
> >
> > get_performance <- function(
> > algo
> > , ns
> > , setup
> > , evaluate
> > , times = 10
> > , ...) {
> > f <- function(n)
> > get_performance_n(algo, n, setup, evaluate, times = times, ...)
> > results <- Map(f, ns)
> > do.call('rbind', results)
> > }
> >
> >
> > setup <- function(n) n
> > evaluate <- function(empty) function(n, x) {
> > elements <- 1:n
> > queue <- empty
> > for (elm in elements) {
> > queue <- enqueue(queue, elm)
> > }
> > for (i in seq_along(elements)) {
> > queue <- dequeue(queue)
> > }
> > }
> >
> > ns <- seq(5000, 10000, by = 1000)
> > performance <- rbind(get_performance("explicity environment", ns, setup, evaluate(empty_env_queue())),
> > get_performance("closure environment", ns, setup, evaluate(empty_closure_queue())),
> > get_performance("functional queue", ns, setup, evaluate(empty_extended_queue())))
> >
> > ggplot(performance, aes(x = as.factor(n), y = time / n, fill = algo)) +
> > geom_boxplot() +
> > scale_fill_grey("Data structure") +
> > xlab(quote(n)) + ylab(expression(Time / n)) + theme_minimal()
> >
> >
> >
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org (mailto:R-help at r-project.org) mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list