[R] Non-reproducible LDA results across machines

Fri Oct 3 14:50:41 CEST 2025

Also, it is a bad idea to make your randomized analyses depend on bit-for-bit reproducibility. First, different computer architectures handle floating-point intermediate calculations differently. Second, the whole point of a randomized method is that it converges to some quantifiable mean result regardless of the path taken to get there.
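To make both points concrete, here is a minimal base-R sketch. It does not call topicmodels or quanteda, and the variable names and the helper estimate_mean() are illustrative only. The first part shows that merely regrouping a floating-point sum changes the result in the last bits. An optimized BLAS or a different CPU may regroup sums like this internally. The second part shows that two differently seeded random runs disagree in detail yet both land near the same expected value. That is why results are better compared within a tolerance (all.equal()) than with identical().

```r
# Minimal sketch, base R only; assumes nothing about the installed BLAS.

# 1. Floating-point addition is not associative: regrouping the same
#    three terms (as a vectorised or multithreaded kernel might) changes
#    the last bits of the result.
left_first  <- (0.1 + 0.2) + 0.3
right_first <- 0.1 + (0.2 + 0.3)
identical(left_first, right_first)           # FALSE: bit-for-bit check fails
isTRUE(all.equal(left_first, right_first))   # TRUE: equal within tolerance

# 2. A randomized estimate converges to the same quantity whichever
#    random path is taken. Hypothetical example: Monte Carlo estimate of
#    E[U] = 0.5 for U ~ Uniform(0, 1), under two different seeds.
estimate_mean <- function(seed, n = 1e5) {
  set.seed(seed)
  mean(runif(n))
}
est_a <- estimate_mean(1234)
est_b <- estimate_mean(5678)

# Standard error of the mean: sd of Uniform(0, 1) is 1 / sqrt(12).
se <- 1 / sqrt(12 * 1e5)

# The two estimates differ in detail, but each should lie within a few
# standard errors of the true value 0.5.
abs(est_a - 0.5) < 5 * se
abs(est_b - 0.5) < 5 * se
```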

On October 3, 2025 2:57:46 AM PDT, Jeanne Moreau <moreaujeanne02 using gmail.com> wrote:
>Good Morning,
>
>I am working with LDA models in R (using both topicmodels::LDA and
>quanteda::textmodel_lda) and noticed that the results differ slightly
>across different machines, even when I use set.seed(1234) and the same
>dataset.
>
>So, I have a few questions:
>1. Is this expected due to BLAS/LAPACK or low-level random number generation
>   differences?
>2. Is there a recommended way to enforce bit-for-bit reproducibility of LDA
>   results across machines in R?
>3. Would you recommend always saving fitted models with saveRDS() to ensure
>   reproducible outputs instead of re-fitting?
>
>Thanks a lot for your guidance.
>
>Best regards,
>
>Jeanne Moreau
>
>______________________________________________
>R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.