[R] Wichmann-Hill Random Number Generator and the Birthday Problem

Shengqiao Li shli at stat.wvu.edu
Tue Aug 19 14:59:11 CEST 2008


In fact, that is what I saw in your RNG help in R 2.7.1:

  '"Wichmann-Hill"' The seed, '.Random.seed[-1] == r[1:3]' is an
           integer vector of length 3, where each 'r[i]' is in '1:(p[i]
           - 1)', where 'p' is the length 3 vector of primes, 'p =
           (30269, 30307, 30323)'. The Wichmann-Hill generator has a
           cycle length of 6.9536e12 (= 'prod(p-1)/4', see _Applied
           Statistics_ (1984) *33*, 123 which corrects the original
           article).

You have to admit that Wichmann-Hill's correction appeared here! If it's correct, then 2.8e13, 
which is claimed by Duncan Murdoch in previous email, is wrong!


Shengqiao

On Tue, 19 Aug 2008, Prof Brian Ripley wrote:

> Please don't lie!  You falsely claimed:
>
>> In R version 2.7.1, Brian Ripley adopted Wichmann's 1984 correction in RNG 
>> document.
>
> The period in the help has been unchanged since at least Jan 2000 (since 
> Random.Rd has not had any changes to those lines since then).
>
> Your coincidence calculations may be correct for _independent_ draws from a 
> discrete distribution on M values, but independence is not satisfied.
> Yet again, you are trying to do things that any good text on simulation would 
> warn you against, and which (in a thread on R-devel) you have already been 
> told a good way to do.
>
> On Sun, 17 Aug 2008, Shengqiao Li wrote:
>
>> 
>> On Sun, 17 Aug 2008, Duncan Murdoch wrote:
>> 
>>> Shengqiao Li wrote:
>>>> Dear all,
>>>> 
>>>> Recently I am generating large random samples (10M) and any duplicated 
>>>> numbers are not desired.
>>>> We tried several RNGs in R and found Wichmann-Hill did not produce 
>>>> duplications.
>>>> 
>>>> The duplication problem is the interesting birthday problem. If there are 
>>>> M possible numbers, randomly draw N numbers from them,
>>>> the average number of dupilcations D = N(N-1)/2/M.
>>>> 
>>>> For Knuth-TAOCP and Knuth-TAOCP-2002, M=2^30, since this modulus is used. 
>>>> D = 46566.12 for N=10M samples.
>>>> 
>>>> For Marsaglia-Multicarry, Super-Duper and Mersene-Twister, M=2^32. D = 
>>>> 11641.53 for N = 10M samples.
>>>> 
>>>> My testing results (see below) agree with above analysis. But for 
>>>> Wichmann-Hill, it wasn't. Wichmann-Hill's cycle is 6.9536e12 (refer to 
>>>> RNG help by ?RNG and Whichmann's correction in 1984). Thus M <= 
>>>> 6.9536e12. D 
>>>>> = 7.19052 when N=1e7 and D>= 179.763 for N=5e7. 
>>>> But in my tests, duplications were surprisingly not observed.
>>>> 
>>>> It seems that Wichmann-Hill has a much larger cycle than the one 
>>>> documented!
>>>> 
>>>> Anybody can solve this puzzle?
>>>> 
>>> 
>>> As I told you, look at the code. The cycle length of Wichmann-Hill in R 
>>> appears to be 2.8e13, and that also appears to be M in your notation. 
>>> Your birthday problem calculation does not apply here.
>>> You could probably get a good approximation to it by rounding the 
>>> Wichmann-Hill output (e.g. look at round(2^30 * runif()), or maybe more 
>>> severe rounding).
>>> 
>>> M is larger than what was previously documented, but Brian Ripley has 
>>> corrected the documentation.
>> 
>> Wichmann claimed 2.78X10^13 in his 1982 original paper. He made correction 
>> in 1984, "The period of the generator was incorrectly given as 2.78 X 
>> 10^13. In fact its period is only a quarter of that value: 6.95X10^12." In 
>> R version 2.7.1, Brian Ripley adopted Wichmann's 1984 correction in RNG 
>> document. That is the currently documented cycle is 6.95X10^12 not 
>> 2.78X10^13 nor your approxmation 2.8e13.
>> 
>> So, is the cycle claimed in 1982 correct or the one claimed in 1984?
>> 
>> Even if the larger one 2.78e13 claimed in 1982 is correct,  around 45 
>> duplications were expected for 50M samples. But I got 0.  My testing method 
>> is based on the example code in the last third line in RNG help.
>> 
>> Shengqiao Li
>> 
>>> 
>>> Duncan Murdoch
>>>> Regards,
>>>> 
>>>> Shengqiao Li
>>>> 
>>>> Department of Statistics
>>>> West Virgina Unversity
>>>> 
>>>> ==============Testing===================
>>>> 
>>>> 
>>>>> RNGkind(kind="Knuth-TAOCP");
>>>>> RNGkind();
>>>>> 
>>>> [1] "Knuth-TAOCP" "Inversion"
>>>> 
>>>>> sum(duplicated(runif(1e7)));
>>>>> 
>>>> [1] 46216
>>>> 
>>>>> RNGkind(kind="Knuth-TAOCP-2002");
>>>>> RNGkind();
>>>>> 
>>>> [1] "Knuth-TAOCP-2002" "Inversion"
>>>> 
>>>>> sum(duplicated(runif(1e7)));
>>>>> 
>>>> [1] 46373
>>>> 
>>>> 
>>>>> RNGkind(kind="Marsaglia-Multicarry");
>>>>> RNGkind();
>>>>> 
>>>> [1] "Marsaglia-Multicarry" "Inversion"
>>>> 
>>>>> sum(duplicated(runif(1e7)));
>>>>> 
>>>> [1] 11671
>>>> 
>>>>> RNGkind(kind="Super-Duper");
>>>>> RNGkind();
>>>>> 
>>>> [1] "Super-Duper" "Inversion"
>>>> 
>>>>> sum(duplicated(runif(1e7)));
>>>>> 
>>>> [1] 11704
>>>> 
>>>>> RNGkind(kind="Mersenne-Twister");
>>>>> RNGkind();
>>>>> 
>>>> [1] "Mersenne-Twister" "Inversion"
>>>> 
>>>>> sum(duplicated(runif(1e7)));
>>>>> 
>>>> [1] 11508
>>>> 
>>>>> RNGkind(kind="Wichmann-Hill");
>>>>> RNGkind();
>>>>> 
>>>> [1] "Wichmann-Hill" "Inversion"
>>>> 
>>>>> sum(duplicated(runif(1e7)));
>>>>> 
>>>> [1] 0
>>>> 
>>>> 
>>>>> gc()
>>>>> 
>>>>            used (Mb) gc trigger  (Mb)  max used   (Mb)
>>>> Ncells  199157  5.4     646480  17.3  10149018  271.1
>>>> Vcells 4497151 34.4  108714629 829.5 169585390 1293.9
>>>> 
>>>> 
>>>>> sum(duplicated(runif(5e7)))
>>>>> 
>>>> [1] 0
>>>> 
>>>> 
>>>> 
>>>>> sessionInfo()
>>>>> 
>>>> R version 2.7.1 (2008-06-23)
>>>> i386-pc-mingw32
>>>> 
>>>> locale:
>>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
>>>> States.1252;LC_MONETARY=English_United 
>>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>> 
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>> 
>>>> loaded via a namespace (and not attached):
>>>> [1] tools_2.7.1
>>>> 
>>>> ==============End of Testing===================
>>>> 
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>> 
>>> 
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>



More information about the R-help mailing list