[R] Overnight Cluster (Whitepaper)?
Ben Bolker
bbolker at gmail.com
Wed Apr 30 16:22:26 CEST 2025
HTCondor has been around for a long time (originally as "Condor",
started in 1988!)
https://github.com/htcondor/htcondor
https://htcondor.org/
https://en.wikipedia.org/wiki/HTCondor
I have no idea how difficult it would be to set up. The developers do
offer contract support <https://htcondor.org/uw-support/>.
On 2025-04-30 10:12 a.m., Ivan Krylov via R-help wrote:
> Dear Ivo Welch,
>
> Sorry for not answering the question you asked (I don't know such a
> vendor), but here are a few comments that may help:
>
> On Tue, 29 Apr 2025 17:20:25 -0700
> ivo welch <ivo.welch at ucla.edu> wrote:
>
>> These computers are mostly idle overnight. We have no interest in
>> bitcoin mining, and SETI@home doesn't seem very active anymore,
>> either. Alas, it's 2025 now, so maybe there is something better we
>> could do with all this idle compute power for our own statistical
>> analyses. Maybe we could cluster them overnight.
>
> The state of the art in volunteer computing is still BOINC, the same
> system that powers most of the "@home" projects. It lets the user
> control when jobs run and when they stop (e.g. run jobs overnight,
> but only while the system is not under load from something else), and
> it doesn't require the job submitter to log in to the worker nodes or
> even rely on the nodes being able to accept incoming connections.
>
> It's possible to run a BOINC server yourself [1], although the server
> side takes some work to set up, and the jobs need to be specially
> packaged. In theory, one could package R as a BOINC app and arrange
> for it to run jobs serialized into *.rds files, but it's a lot of
> infrastructure work to get all the moving parts into the right
> positions (package versions alone are a serious problem with no easy
> solution).
>
>> Ideally, we would then have a frontend R (controller) that could run
>> `mclapply` statements on this Franken-computer, and be smart enough
>> about how to distribute the load.
>
> One problem with parLapply() is that it expects the cluster object to
> be a list containing a fixed number of node objects. I've run into a
> similar problem: I needed to distribute jobs across my colleagues'
> workstations whenever they could spare some CPU power, letting
> computers leave and rejoin the cluster at will. In the end, I had to
> pretend that my 'parallel' cluster always contained an excessive
> number of nodes (128) and distribute a larger number of smaller
> sub-tasks dynamically, roughly as sketched below.
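>
> This is not my actual code, but the dynamic-distribution part looks
> roughly like the following, with a local PSOCK cluster standing in
> for the volunteer machines (it does not show nodes leaving and
> rejoining):
>
>   library(parallel)
>   cl <- makeCluster(8)  # placeholder; an excessive count such as 128
>   ## split the work into many small sub-tasks; clusterApplyLB() hands
>   ## each node a new sub-task as soon as it finishes the previous one
>   chunks <- split(1:1000, rep(1:100, length.out = 1000))
>   res <- clusterApplyLB(cl, chunks, function(ix) sum(sqrt(ix)))
>   stopCluster(cl)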
>
> A general-purpose interface to a volunteer cluster will probably not
> work as a drop-in replacement for mclapply(). You might be able to
> achieve part of what you want using 'mirai', telling every worker
> node to connect to the client node for tasks, as sketched below.
> BOINC can set memory and CPU core limits, but it might be unable to
> save you from inefficient job plans. See 'future.batchtools' for an
> example of an R interface to cluster job submission systems.
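>
> A rough sketch of that worker-dials-in pattern with 'mirai' (the
> address is a placeholder, and you should check the package
> documentation for the exact daemons()/daemon() arguments):
>
>   ## on the controller, listen for workers at a routable address
>   library(mirai)
>   daemons(url = "tcp://192.0.2.1:5555")
>
>   ## on each volunteer node, e.g. started from a nightly cron job:
>   ##   Rscript -e 'mirai::daemon("tcp://192.0.2.1:5555")'
>
>   ## back on the controller: submit a task; any connected worker
>   ## picks it up and returns the result
>   m <- mirai(Sys.info()[["nodename"]])
>   call_mirai(m)$data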
>
--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
> E-mail is sent at my convenience; I don't expect replies outside of
> working hours.