[Rd] A bug in the R Mersenne Twister (RNG) code?
Martin Maechler
maechler at stat.math.ethz.ch
Tue Sep 6 08:49:06 CEST 2016
>>>>> Gabriel Becker <gmbecker at ucdavis.edu>
>>>>> on Thu, 1 Sep 2016 08:34:31 -0700 writes:
> I wonder how useful a (set of?) "time machine" functions
> which look up/infer things like this based on a date
> would be. Could ease the pain of changes generally, though
> not remove it completely.
Such a set (possibly of size one) may be quite useful, notably
if it got an intuitive interface.
I'd recommend partly following options() here, i.e.,

    oc <- compatibilityR("2000-02-29")

would set random number generators (and other changeable
defaults) to those that were in effect when R 1.0.0 was released,
*and* a later call

    compatibilityR(oc) # reset to previous state

would do what the comment says.
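To make the options()-style contract concrete (set by date, get the previous state back, pass that state in again to restore), here is a small language-neutral sketch written in Python. Everything in it is hypothetical: the names compatibility() and current_settings(), the settings table, and in particular the 2003-04-16 date I assume for R 1.7.0 and the pre-1.7.0 generator names, which are my recollection and not taken from this thread.

```python
from datetime import date

# Hypothetical table of (date the defaults took effect, settings).
# The dates and generator names are assumptions made for illustration.
_DEFAULTS = [
    (date(2000, 2, 29), {"kind": "Marsaglia-Multicarry",
                         "normal.kind": "Buggy Kinderman-Ramage"}),
    (date(2003, 4, 16), {"kind": "Mersenne-Twister",
                         "normal.kind": "Inversion"}),
]

# The settings currently in effect; starts at the newest defaults.
_current = dict(_DEFAULTS[-1][1])


def current_settings():
    """Return a copy of the settings currently in effect."""
    return dict(_current)


def compatibility(when):
    """Switch defaults and return the previous settings.

    `when` is either an ISO date string ("YYYY-MM-DD") or a settings
    dict previously returned by this function (to restore it).
    """
    global _current
    previous = dict(_current)
    if isinstance(when, dict):
        _current = dict(when)          # restore a saved state
        return previous
    target = date.fromisoformat(when)
    chosen = None
    for start, settings in _DEFAULTS:  # table is sorted by date
        if start <= target:
            chosen = settings
    if chosen is None:
        raise ValueError("no defaults recorded for " + when)
    _current = dict(chosen)
    return previous


oc = compatibility("2000-02-29")  # switch to the defaults of that date
compatibility(oc)                 # reset to previous state
```

Returning the old state from the setter is what lets a single function serve both directions, exactly as with options().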
> On Wed, Aug 31, 2016 at 5:45 PM, Paul Gilbert
> <pgilbert902 at gmail.com> wrote:
>>
>>
>> On 08/30/2016 06:29 PM, Duncan Murdoch wrote:
>>
>>> I don't see evidence of a bug. There have been several
>>> versions of the MT; we may be using a different version
>>> than you are. Ours is the 1999/10/28 version; the web
>>> page you cite uses one from 2002.
>>>
>>> Perhaps the newer version fixes some problems, and then
>>> it would be worth considering a change. But changing
>>> the default RNG definitely introduces problems in
>>> reproducibility,
>>>
>>
>> Well "problems in reproducibility" is a bit
>> vague. Results would always be reproducible by specifying
>> kind="Mersenne-Twister" or kind="Buggy Kinderman-Ramage"
>> for older results, so there is no problem reproducing
>> results. The only problem is that users expecting to
>> reproduce results twenty years later will need to know
>> what random generator they used. (BTW, they may also need
>> to record information about the normal or other
>> generator, as well as the seed.) Of course, these changes
>> are recorded pretty well for R, so the history of
>> "default" can always be found.
>>
>> I think it is a mistake to encourage users into thinking
>> they do not need to keep track of some information if
>> they want reproducibility. Perhaps the default should be
>> changed more often in order to encourage better user
>> habits.
>>
>> More seriously, I think "default" should continue to be
>> something that is currently considered to be good. So, if
>> there really is a known problem, then I think "default"
>> should be changed.
>>
>> (And, no I did not get burned by the R 1.7.0 change in
>> the default generator. I got burned by a much earlier,
>> unadvertised, and more subtle change in the Splus
>> generator.)
>>
>> Paul Gilbert
>>
>>
>>> so it's not obvious that we would do it.
>>>
>>> Duncan Murdoch
>>>
>>>
>>> On 30/08/2016 5:45 PM, Mark Roberts wrote:
>>>
>>>> Whomever,
>>>>
>>>> I recently sent the "bug report" below
>>>> to R-core at r-project.org and have just been asked to
>>>> instead submit it to you.
>>>>
>>>> Although I am basically not an R user, I have installed
>>>> version 3.3.1 and am also the author of a statistics
>>>> program written in Visual Basic that contains a
>>>> component which correctly implements the Mersenne
>>>> Twister (MT) algorithm. I believe that it is not
>>>> possible to generate the correct stream of pseudorandom
>>>> numbers using the MT default random number generator in
>>>> R, and am not the first person to notice this. Here is
>>>> a posted 2013 entry
>>>> (www.r-bloggers.com/reproducibility-and-randomness/) on
>>>> an R website that asserts that the SAS computer program
>>>> implementation of the MT algorithm produces different
>>>> numbers than R does when using the same starting seed
>>>> number. The author of this post didn’t get anyone to
>>>> respond to his query about the reason for this SAS
>>>> vs. R discrepancy.
>>>>
>>>> There are two ways of initializing the original MT
>>>> computer program (written in C) so that an identical
>>>> stream of numbers can be repeatedly generated: 1) with
>>>> a particular integer seed number, and 2) with a
>>>> particular array of integers. In the 'compilation and
>>>> usage' section of this webpage
>>>> (https://github.com/cslarsen/mersenne-twister) there is
>>>> a listing of the first 200 random numbers the MT
>>>> algorithm should produce for seed number = 1. The
>>>> inventors of the Mersenne Twister random number
>>>> generator provided two different sets of the first 1000
>>>> numbers produced by a correctly coded 32-bit
>>>> implementation of the MT algorithm when initializing it
>>>> with a particular array of integers at:
>>>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.out.
>>>> [There is a link to this output at:
>>>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.]
>>>>
>>>> My statistics program obtains exactly those 200 numbers
>>>> from the first site mentioned in the previous paragraph
>>>> and also obtains those same numbers from the second
>>>> website (though I didn't check all 2000 values).
>>>> Assuming that the MT code within R uses the 32-bit MT
>>>> algorithm, I suspect that the current version of R
>>>> can't do that. If you (i.e., anyone who might
>>>> knowledgeably respond to this report) are able to
>>>> duplicate those reference test-values, then please send
>>>> me the R code to initialize the MT code within R to
>>>> successfully do that, and I apologize for having wasted
>>>> your time. If you (collectively) can't do that, then R
>>>> is very likely using incorrectly implemented MT code.
>>>> And if this latter possibility is true, it seems to me
>>>> that this is something that should be fixed.
>>>>
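For readers who want to check the reference values discussed above without relying on any particular statistics package, the following is a minimal pure-Python sketch of the 32-bit MT19937 generator. It follows the 2002 reference C code (mt19937ar.c) as I understand it, including both initialization routes: by a single integer seed and by an array of integers. The class name MT19937 is my own; the method names mirror the C functions. It says nothing about how R's set.seed() maps a seed onto the generator state.

```python
N, M = 624, 397
MATRIX_A = 0x9908B0DF
UPPER_MASK = 0x80000000   # most significant bit
LOWER_MASK = 0x7FFFFFFF   # least significant 31 bits
MASK32 = 0xFFFFFFFF


class MT19937:
    def __init__(self, seed=5489):
        # 5489 is the default seed of the reference C code.
        self.mt = [0] * N
        self.init_genrand(seed)

    def init_genrand(self, s):
        """Initialize the state from one 32-bit integer seed."""
        mt = self.mt
        mt[0] = s & MASK32
        for i in range(1, N):
            prev = mt[i - 1]
            mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & MASK32
        self.mti = N  # forces a twist before the first output

    def init_by_array(self, key):
        """Initialize the state from a list of 32-bit integers."""
        self.init_genrand(19650218)
        mt = self.mt
        i, j = 1, 0
        for _ in range(max(N, len(key))):
            prev = mt[i - 1]
            mt[i] = ((mt[i] ^ ((prev ^ (prev >> 30)) * 1664525))
                     + key[j] + j) & MASK32
            i += 1
            j += 1
            if i >= N:
                mt[0] = mt[N - 1]
                i = 1
            if j >= len(key):
                j = 0
        for _ in range(N - 1):
            prev = mt[i - 1]
            mt[i] = ((mt[i] ^ ((prev ^ (prev >> 30)) * 1566083941))
                     - i) & MASK32
            i += 1
            if i >= N:
                mt[0] = mt[N - 1]
                i = 1
        mt[0] = 0x80000000  # guarantees a non-zero initial state

    def _twist(self):
        """Regenerate all N state words in place."""
        mt = self.mt
        for k in range(N):
            y = (mt[k] & UPPER_MASK) | (mt[(k + 1) % N] & LOWER_MASK)
            mt[k] = mt[(k + M) % N] ^ (y >> 1) ^ (MATRIX_A if y & 1 else 0)
        self.mti = 0

    def genrand_int32(self):
        """Return the next 32-bit unsigned output."""
        if self.mti >= N:
            self._twist()
        y = self.mt[self.mti]
        self.mti += 1
        # Tempering
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & MASK32


by_seed = MT19937(1)
print([by_seed.genrand_int32() for _ in range(3)])  # starts with 1791095845

by_array = MT19937()
by_array.init_by_array([0x123, 0x234, 0x345, 0x456])
print([by_array.genrand_int32() for _ in range(3)])  # starts with 1067595299
```

The array {0x123, 0x234, 0x345, 0x456} is the key used in the main() of the reference C code, so the second printout should agree with the start of mt19937ar.out. A generator that disagrees with these values for the same initialization is either a different variant of the algorithm or transforms the user-supplied seed before initializing the state.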
>>>> Mark Roberts, Ph.D.
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> --
> Gabriel Becker, PhD Associate Scientist (Bioinformatics)
> Genentech Research