[R] function pointers?
jim holtman
jholtman at gmail.com
Thu Nov 23 17:57:08 CET 2017
I am replying to the first part of the question about the size of the
object. It is probably best to use the "object_size" function in the
"pryr" package:
‘object_size’ works similarly to ‘object.size’, but counts more
accurately and includes the size of environments. ‘compare_size’
makes it easy to compare the output of ‘object_size’ and
‘object.size’.
Here is what you get from the same code:
> N <- 10000
> closureList <- vector("list", N)
> nsize = sample(x = 1:100, size = N, replace = TRUE)
> for (i in seq_along(nsize)){
+ closureList[[i]] <- list(func = rnorm, n = nsize[i])
+ }
> format(object.size(closureList), units = "Mb")
[1] "22.4 Mb"
> pryr::compare_size(closureList)
    base     pryr 
23520040  2241776 
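You will notice that pryr reports a size about 10X smaller, because it counts
space shared between the list elements (here, the one rnorm function they all
refer to) only once, while object.size() counts it again for every element.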
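The double counting can be checked with base R alone. The sketch below assumes only base R (no pryr); `noFuncList` and the other variable names are made up for the illustration. It builds the same list twice, once with and once without the function, and compares the difference against N times the size of rnorm:

```r
## Base R only: measure how much of object.size(closureList) is rnorm.
N <- 10000
nsize <- sample(x = 1:100, size = N, replace = TRUE)

## Same structure as closureList in the original post
closureList <- lapply(nsize, function(n) list(func = rnorm, n = n))

## Identical structure with the function slot left empty (NULL)
noFuncList <- lapply(nsize, function(n) list(func = NULL, n = n))

per_func     <- as.numeric(object.size(rnorm))
with_func    <- as.numeric(object.size(closureList))
without_func <- as.numeric(object.size(noFuncList))

## object.size() charges rnorm once per element, although every element
## refers to the same function object
c(extra = with_func - without_func, N_copies = N * per_func)
```

If the two numbers agree, the extra size object.size() reports is rnorm counted once per element, not extra copies actually held in memory.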
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Wed, Nov 22, 2017 at 11:29 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
> We have a project that calls for the creation of a list of many
> distribution objects. Distributions can be of various types, with
> various parameters, but we ran into some problems. I started testing
> on a simple list of rnorm-based objects.
>
> I was a little surprised at the RAM storage requirements, here's an
> example:
>
> N <- 10000
> closureList <- vector("list", N)
> nsize = sample(x = 1:100, size = N, replace = TRUE)
> for (i in seq_along(nsize)){
>   closureList[[i]] <- list(func = rnorm, n = nsize[i])
> }
> format(object.size(closureList), units = "Mb")
>
> Output says
> 22.4 MB
>
> I noticed that if I do not name the objects in the list, then the
> storage drops to 19.9 MB.
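The difference comes from the names attribute: each element of the named list carries its own names vector, and object.size() charges every element for it. A minimal base-R comparison of a single named and unnamed element (the variable names are only for illustration):

```r
## One element of closureList, with and without names
named   <- list(func = rnorm, n = 1L)
unnamed <- list(rnorm, 1L)

## Extra bytes object.size() attributes to the names of one element
per_element <- as.numeric(object.size(named)) - as.numeric(object.size(unnamed))
per_element

## Scaled up to the N = 10000 elements in the example
per_element * 10000
```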
>
> That seemed like a lot of storage for a function's name. Why so much?
> My colleagues think the RAM use is high because this is a closure
> (hence closureList). I can't even convince myself it actually is a
> closure. The R source has
>
> rnorm <- function(n, mean=0, sd=1) .Call(C_rnorm, n, mean, sd)
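On whether this is a closure: in R's terminology every function written in R is a closure (formals, a body and an enclosing environment), even one like rnorm that captures no state of its own. A quick base-R check, assuming the stats package is attached as it is by default:

```r
## Every function written in R has type "closure", whether or not it
## captures any state of its own
typeof(rnorm)                        # "closure"
is.primitive(rnorm)                  # FALSE
environmentName(environment(rnorm))  # "stats": its enclosing environment

## Primitives such as sum are the exception
typeof(sum)                          # "builtin"
```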
>
> The storage is holding 10000 copies of rnorm, but we really only need one,
> which all the objects could share.
>
> Thinking of this like C, I am looking to pass in a pointer to the
> function. I found my way to the idea of putting a function in an
> environment in order to pass it by reference:
>
> rnormPointer <- function(inputValue1, inputValue2){
>   object <- new.env(parent=globalenv())
>   object$distr <- inputValue1
>   object$n <- inputValue2
>   class(object) <- 'pointer'
>   object
> }
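The environment trick works because environments have reference semantics: assigning one or passing it to a function does not copy it. A minimal base-R sketch of that behaviour, contrasted with an ordinary list (the helpers `set_n` and `set_n_list` are hypothetical, written only for this illustration):

```r
## Environments are not copied on assignment or when passed to a function
e1 <- new.env()
e1$n <- 33
e2 <- e1                    # no copy: e2 and e1 are the same environment

set_n <- function(env, value) {
  env$n <- value            # modifies the caller's environment in place
  invisible(NULL)
}
set_n(e2, 50)
e1$n                        # 50

## Ordinary lists behave differently: the function changes a local copy
l1 <- list(n = 33)
set_n_list <- function(lst, value) { lst$n <- value; lst }
invisible(set_n_list(l1, 50))
l1$n                        # still 33
```

This is one thing to keep in mind for question 2 below: code that modifies `distr` or `n` through one reference changes it for every holder of that environment.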
>
> ## Experiment with that
> gg <- rnormPointer(rnorm, 33)
> gg$distr(gg$n)
>
> ptrList <- vector("list", N)
> for(i in seq_along(nsize)) {
>   ptrList[[i]] <- rnormPointer(rnorm, nsize[i])
> }
> format(object.size(ptrList), units = "Mb")
>
> The required storage is reduced to 2.6 Mb. That's 1/10 of the RAM
> required for closureList. This thing works the way I expect:
>
> ## can pass in the unnamed arguments for n, mean and sd here
> ptrList[[1]]$distr(33, 100, 10)
> ## Or the named arguments
> ptrList[[1]]$distr(1, sd = 100)
>
> This environment trick mostly works, so far as I can see, but I have
> these questions.
>
> 1. Is the object.size() return accurate for ptrList? Do I really
> reduce storage to that amount, or is the required storage someplace
> else (in the new environment) that is not included in object.size()?
>
> 2. Am I running with scissors here? Unexpected bad things await?
>
> 3. Why is the storage for closureList so great? It looks to me like
> rnorm is just this little thing:
>
> function (n, mean = 0, sd = 1)
> .Call(C_rnorm, n, mean, sd)
> <bytecode: 0x55cc9988cae0>
>
> 4. Could I learn (could you show me?) how to store the bytecode address
> as a thing and use it in the objects? I'd guess that is the fastest
> possible way. In an Objective-C problem in the olden days, we found
> the method lookup was a major slowdown, and one of the programmers
> showed us how to save the lookup and use it over and over.
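On question 1: ?object.size notes that associated space such as the environment of a function is not included, and object.size() does not descend into environment objects either, so for ptrList it does not count the `distr` and `n` stored inside each environment. A small base-R sketch of the effect (`env_contents_size` is a hypothetical helper, and summing object.size() over the contents is only a rough estimate):

```r
## object.size() charges only the environment itself, not what it holds
e <- new.env()
e$big <- numeric(1e6)            # about 8 MB of doubles

object.size(e)                   # a few dozen bytes
object.size(e$big)               # roughly 8 MB

## Hypothetical helper: roughly sum the sizes of an environment's contents
env_contents_size <- function(env) {
  sum(vapply(ls(env, all.names = TRUE),
             function(nm) as.numeric(object.size(get(nm, envir = env))),
             numeric(1)))
}
env_contents_size(e)
```

Summing this helper over ptrList would again count rnorm once per environment, so neither figure is the true footprint on its own.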
>
> pj
>
> --
> Paul E. Johnson http://pj.freefaculty.org
> Director, Center for Research Methods and Data Analysis
> http://crmda.ku.edu
>
> To write to me directly, please address me at pauljohn at ku.edu.
>