[Rd] Pre-allocating serialization memory buffers
Florian Rupprecht
floruppr sending from gmail.com
Mon Oct 10 16:00:59 CEST 2022
Hi all,
While investigating the performance of different hashing algorithms of the
"digest" package, I found that serialization to memory buffers via
serialize(obj, connection=NULL) was suspiciously slow for large objects.
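For reference, a minimal way to reproduce the slow path (the object size
below is illustrative; runif, serialize and system.time are all base R):

    ## ~1 GB atomic vector serialized to an in-memory buffer
    obj <- runif(2^27)
    system.time(buf <- serialize(obj, connection = NULL))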
After looking into the R source I found that the memory buffer grows
approximately as n -> 2(n+1) but is not pre-allocated in any way. I then created a
minimal demo package (https://github.com/nx10/serialize_prealloc) with
different modified versions of the serialization mechanisms that let me
trace memory allocations and pre-allocate the buffer using
object.size(obj).
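To make the growth rule concrete, here is a small sketch (count_grows is
my own helper, not from the R sources) of how many reallocations
n -> 2(n+1) implies when starting from an empty buffer:

    ## Each growth step reallocates the buffer and copies its contents.
    count_grows <- function(target) {
      n <- 0; k <- 0
      while (n < target) {
        n <- 2 * (n + 1)
        k <- k + 1
      }
      k
    }
    count_grows(1e9)  # about 30 reallocations for a ~1 GB result

Pre-allocating the buffer up front avoids all of those intermediate
copies at the cost of a single allocation.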
Benchmarking this shows no apparent performance decrease for small or
deeply nested objects, and approximately logarithmic gains for bigger
objects (more than 3 times faster on my machine for objects of ~1 GB).
Benchmarks are included in the README of the demo package.
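The reason object.size(obj) works as a pre-allocation hint can be checked
directly, at least for simple atomic objects (the two numbers only agree
approximately, and less so for nested objects):

    ## In-memory footprint vs. serialized size for an atomic vector
    obj <- runif(1e6)
    object.size(obj)               # roughly 8 MB
    length(serialize(obj, NULL))   # similar size, plus a small header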
I have not done any tests with other kinds of streams such as file
connections, as I am not sure we can make assumptions about the
implementation of streams that are created elsewhere.
I would be happy to provide a patch for review if this is something you
consider worth investigating, but I would need some pointers on how you
would want this to be implemented, as object.size lives in the utils
package while serialize lives in src/main/. (E.g. copy a non-error version
of object.size into src/main/.)
Best,
Florian