[Rd] A bug in the R Mersenne Twister (RNG) code?

Martin Maechler maechler at stat.math.ethz.ch
Tue Sep 6 08:49:06 CEST 2016


>>>>> Gabriel Becker <gmbecker at ucdavis.edu>
>>>>>     on Thu, 1 Sep 2016 08:34:31 -0700 writes:

    > I wonder how useful a (set of?) "time machine" functions
    > which look up / infer things like this based on a date
    > would be. Could ease the pain of changes generally, though
    > not remove it completely.

Such a set (possibly of size one) may be quite useful, notably
if it had an intuitive interface.
I'd recommend partly following options() here, i.e., a call such as
  oc <- compatibilityR("2000-02-29")

would set random number generators (and other changeable
defaults) to those that were in effect when R 1.0.0 was released,
*and* a later call

  compatibilityR (oc)  # reset to previous state

would do what the comment says.
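
A minimal sketch of how such an interface might look, restricted to
the RNG defaults.  compatibilityR() is hypothetical and does not exist
in R; the release table inside it is an illustrative, incomplete
assumption.  Only RNGkind() and RNGversion() are existing base R
functions.

```r
## Hypothetical sketch: compatibilityR() is not part of R.
## It follows the options() convention: change the state and
## invisibly return the previous one so that it can be restored.
compatibilityR <- function(x)
{
    ## remember the current RNG settings
    old <- structure(list(RNGkind = RNGkind()), class = "compatibilityR")

    if (inherits(x, "compatibilityR")) {
        ## restore the settings saved by an earlier call
        do.call(RNGkind, as.list(unname(x$RNGkind)))
    } else {
        ## illustrative and incomplete table (an assumption of this sketch):
        ## releases that changed RNG defaults, sorted by release date
        releases <- c("1.0.0" = "2000-02-29", "1.7.0" = "2003-04-16")
        d <- as.Date(x)
        known <- names(releases)[as.Date(releases) <= d]
        if (length(known) == 0L)
            stop("no release known on or before ", format(d))
        ## the last matching entry is the release in effect at date 'd'
        RNGversion(known[length(known)])
    }
    invisible(old)
}

oc <- compatibilityR("2000-02-29")  # RNG defaults as of R 1.0.0
RNGkind()                           # inspect what is now in effect
compatibilityR(oc)                  # reset to previous state
```

Recent versions of R may warn when the old generators are selected.
A real implementation would need a complete release table and would
cover the other changeable defaults as well, not just the RNG.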


    > On Wed, Aug 31, 2016 at 5:45 PM, Paul Gilbert
    > <pgilbert902 at gmail.com> wrote:

    >> 
    >> 
    >> On 08/30/2016 06:29 PM, Duncan Murdoch wrote:
    >> 
    >>> I don't see evidence of a bug.  There have been several
    >>> versions of the MT; we may be using a different version
    >>> than you are.  Ours is the 1999/10/28 version; the web
    >>> page you cite uses one from 2002.
    >>> 
    >>> Perhaps the newer version fixes some problems, and then
    >>> it would be worth considering a change.  But changing
    >>> the default RNG definitely introduces problems in
    >>> reproducibility,
    >>> 
    >> 
    >> Well "problems in reproducibility" is a bit
    >> vague. Results would always be reproducible by specifying
    >> kind="Mersenne-Twister" or, for older results,
    >> normal.kind="Buggy Kinderman-Ramage", so there is no problem reproducing
    >> results. The only problem is that users expecting to
    >> reproduce results twenty years later will need to know
    >> what random generator they used. (BTW, they may also need
    >> to record information about the normal or other
    >> generator, as well as the seed.) Of course, these changes
    >> are recorded pretty well for R, so the history of
    >> "default" can always be found.
    >> 
    >> I think it is a mistake to encourage users into thinking
    >> they do not need to keep track of some information if
    >> they want reproducibility. Perhaps the default should be
    >> changed more often in order to encourage better user
    >> habits.
    >> 
    >> More seriously, I think "default" should continue to be
    >> something that is currently considered to be good. So, if
    >> there really is a known problem, then I think "default"
    >> should be changed.
    >> 
    >> (And, no I did not get burned by the R 1.7.0 change in
    >> the default generator. I got burned by a much earlier,
    >> unadvertised, and more subtle change in the Splus
    >> generator.)
    >> 
    >> Paul Gilbert
    >> 
    >> 
    >>> so it's not obvious that we would do it.
    >>> 
    >>> Duncan Murdoch
    >>> 
    >>> 
    >>> On 30/08/2016 5:45 PM, Mark Roberts wrote:
    >>> 
    >>>> Whomever,
    >>>> 
    >>>> I recently sent the "bug report" below
    >>>> to R-core at r-project.org and have just been asked to
    >>>> instead submit it to you.
    >>>> 
    >>>> Although I am basically not an R user, I have installed
    >>>> version 3.3.1 and am also the author of a statistics
    >>>> program written in Visual Basic that contains a
    >>>> component which correctly implements the Mersenne
    >>>> Twister (MT) algorithm.  I believe that it is not
    >>>> possible to generate the correct stream of pseudorandom
    >>>> numbers using the MT default random number generator in
    >>>> R, and am not the first person to notice this.  Here is
    >>>> a posted 2013 entry
    >>>> (www.r-bloggers.com/reproducibility-and-randomness/) on
    >>>> an R website that asserts that the SAS computer program
    >>>> implementation of the MT algorithm produces different
    >>>> numbers than R does when using the same starting seed
    >>>> number.  The author of this post didn’t get anyone to
    >>>> respond to his query about the reason for this SAS
    >>>> vs. R discrepancy.
    >>>> 
    >>>> There are two ways of initializing the original MT
    >>>> computer program (written in C) so that an identical
    >>>> stream of numbers can be repeatedly generated: 1) with
    >>>> a particular integer seed number, and 2) with a
    >>>> particular array of integers.  In the 'compilation and
    >>>> usage' section of this webpage
    >>>> (https://github.com/cslarsen/mersenne-twister) there is
    >>>> a listing of the first 200 random numbers the MT
    >>>> algorithm should produce for seed number = 1.  The
    >>>> inventors of the Mersenne Twister random number
    >>>> generator provided two different sets of the first 1000
    >>>> numbers produced by a correctly coded 32-bit
    >>>> implementation of the MT algorithm when initializing it
    >>>> with a particular array of integers at:
    >>>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.out.
    >>>> [There is a link to this output at:
    >>>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.]
    >>>> 
    >>>> My statistics program obtains exactly those 200 numbers
    >>>> from the first site mentioned in the previous paragraph
    >>>> and also reproduces the reference values from the second
    >>>> website (though I didn't check all 2000 values).
    >>>> Assuming that the MT code within R uses the 32-bit MT
    >>>> algorithm, I suspect that the current version of R
    >>>> can't do that.  If you (i.e., anyone who might
    >>>> knowledgeably respond to this report) are able to
    >>>> duplicate those reference test-values, then please send
    >>>> me the R code to initialize the MT code within R to
    >>>> successfully do that, and I apologize for having wasted
    >>>> your time. If you (collectively) can't do that, then R
    >>>> is very likely using incorrectly implemented MT code.
    >>>> And if this latter possibility is true, it seems to me
    >>>> that this is something that should be fixed.
    >>>> 
    >>>> Mark Roberts, Ph.D.
    >>>> 
    >>>> ______________________________________________
    >>>> R-devel at r-project.org mailing list
    >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>>> 
    >>>> 



    > -- 
    > Gabriel Becker, PhD Associate Scientist (Bioinformatics)
    > Genentech Research
