[R] non-reproducible eigen() output with MKL
Viechtbauer, Wolfgang (NP)
wo||g@ng@v|echtb@uer @end|ng |rom m@@@tr|chtun|ver@|ty@n|
Mon May 5 11:25:05 CEST 2025
A relevant thread from a few years ago where this was discussed:
https://www.stat.math.ethz.ch/pipermail/r-help/2023-August/477904.html
I generally use:
export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1
since in my experience the biggest performance gains come from switching to OpenBLAS / MKL in the first place. Using their multithreading capabilities tends to yield diminishing gains in comparison.
For MKL, you could try:
export MKL_THREADING_LAYER=GNU
or
export MKL_THREADING_LAYER=sequential
when using multiple threads to see if this avoids the issue.
But if you are using explicit parallelization (as I often do), then you would want to avoid the multithreading in the math libs anyway.
Best,
Wolfgang
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Martin Maechler
> Sent: Monday, May 5, 2025 10:50
> To: smallepsilon <smallepsilon using proton.me>
> Cc: r-help using r-project.org; peter dalgaard <pdalgd using gmail.com>
> Subject: Re: [R] non-reproducible eigen() output with MKL
>
> >>>>> smallepsilon via R-help
> >>>>> on Sun, 04 May 2025 18:11:57 +0000 writes:
>
> > Peter, The eigenvalues are not identical(), but are
> > all.equal(). When n is 20, the crossproduct is
> > (numerically) a diagonal matrix with +-1 on the
> > diagonal. When n is 50, this is not the case, but that
> > could be an issue of nearly identical eigenvalues.
>
> > Is there no way within R to require that the sequence of
> > operations be the same for identical calls?
>
> As Peter Dalgaard mentioned, you have the problem of underlying
> system libraries that try to be fast and hence parallelize
> computations, notably in this (and man similar calls), MKL uses
> parallelized BLAS and/or LAPACK .. and that's what eigen() in R
> (and Julia, python, matlab, ..) all use.
>
> And parallelization is a killer of (strict / bit-level)
> reproducibility, as you have just experienced.
>
> If you know how to tell your OS / that you want only one
> parallel "thread" / process / ...
> you get back to reproducible (and slower) computations.
>
> > The problem arose originally in a package test in which I wanted to
> > verify that two ways of specifying something led to the
> > execution of exactly the same calculations. The best proxy
> > for this seemed to be to use identical() on the outputs,
> > but if the same line of code (that should not involve an
> > RNG) can lead to different results, this approach is
> > doomed, yes? It is not absolutely critical that the
> > outputs be identical(), but it would be much more
> > reassuring than all.equal().
>
> I understand and agree.
>
> When I first became aware of the irreprodicibility of parallel
> computations, only a few years ago, I was quite shocked and deillusionized..
>
> > Thanks, Jesse
>
> Martin Maechler
> ETH Zurich and R Core team
>
> > On Sunday, May 4th, 2025 at 12:27 PM, peter dalgaard
> > <pdalgd using gmail.com> wrote:
> >>
> >>
> >> Have you looked more closely at the differences?
> >> Eigenvectors are only determined up to a sign change, so
> >> different platforms often give results that differ by
> >> sign. If you use a multitasking numerical library, the
> >> same can happen within platform because the exact
> >> sequence of computations differs between calls.
> >>
> >> You could check
> >>
> >> a) that e1$values and e2$values are the same b) that the
> >> crossprod(e1$vectors, e2$vectors) is a diagonal matrix
> >> with 1 or -1 in the diagonal. (This might fail if you
> >> have eigenvalues that are almost identical, though.)
> >>
> >> -pd
> >>
> >> > On 4 May 2025, at 17.57, smallepsilon via R-help
> >> r-help using r-project.org wrote:
> >> >
> >> > I am using MKL with R 4.5.0 on Linux, and eigen() is
> >> producing different results with identical
> >> calls. Specifically, when I run the code below, I get
> >> "FALSE" from identical(). Just in case it had something
> >> to do with a random number generator, I put identical
> >> set.seed() calls immediately before each eigen() call,
> >> but that did not help. Has anyone seen this behavior
> >> before?
> >> >
> >> > (When n is 5, the identical() call almost always
> >> returns "TRUE". As n increases, the proportion of FALSE
> >> results increases, and it is nearly always FALSE when n
> >> is 50.)
> >> >
> >> > Jesse
> >> >
> >> > ***
> >> >
> >> > n <- 50 > set.seed(20250504) > Sigma <- rWishart(1, df
> >> = n, Sigma = diag(n))[,,1] > e1 <- eigen(Sigma) > e2 <-
> >> eigen(Sigma) > identical(e1, e2)
> >>
> >> --
> >> Peter Dalgaard, Professor, Center for Statistics,
> >> Copenhagen Business SchoolSolbjerg Plads 3, 2000
> >> Frederiksberg, Denmark Phone: (+45)38153501 Office: A
> >> 4.23 Email: pd.mes using cbs.dk Priv: PDalgd using gmail.com
More information about the R-help
mailing list