[R] Wichmann-Hill Random Number Generator and the Birthday Problem
Shengqiao Li
shli at stat.wvu.edu
Tue Aug 19 14:59:11 CEST 2008
In fact, that is what I saw in your RNG help in R 2.7.1:
'"Wichmann-Hill"' The seed, '.Random.seed[-1] == r[1:3]' is an
integer vector of length 3, where each 'r[i]' is in '1:(p[i]
- 1)', where 'p' is the length 3 vector of primes, 'p =
(30269, 30307, 30323)'. The Wichmann-Hill generator has a
cycle length of 6.9536e12 (= 'prod(p-1)/4', see _Applied
Statistics_ (1984) *33*, 123 which corrects the original
article).
You have to admit that Wichmann-Hill's correction appeared here! If it's correct, then 2.8e13,
which is claimed by Duncan Murdoch in previous email, is wrong!
Shengqiao
On Tue, 19 Aug 2008, Prof Brian Ripley wrote:
> Please don't lie! You falsely claimed:
>
>> In R version 2.7.1, Brian Ripley adopted Wichmann's 1984 correction in RNG
>> document.
>
> The period in the help has been unchanged since at least Jan 2000 (since
> Random.Rd has not had any changes to those lines since then).
>
> Your coincidence calculations may be correct for _independent_ draws from a
> discrete distribution on M values, but independence is not satisfied.
> Yet again, you are trying to do things that any good text on simulation would
> warn you against, and which (in a thread on R-devel) you have already been
> told a good way to do.
>
> On Sun, 17 Aug 2008, Shengqiao Li wrote:
>
>>
>> On Sun, 17 Aug 2008, Duncan Murdoch wrote:
>>
>>> Shengqiao Li wrote:
>>>> Dear all,
>>>>
>>>> Recently I am generating large random samples (10M) and any duplicated
>>>> numbers are not desired.
>>>> We tried several RNGs in R and found Wichmann-Hill did not produce
>>>> duplications.
>>>>
>>>> The duplication problem is the interesting birthday problem. If there are
>>>> M possible numbers, randomly draw N numbers from them,
>>>> the average number of dupilcations D = N(N-1)/2/M.
>>>>
>>>> For Knuth-TAOCP and Knuth-TAOCP-2002, M=2^30, since this modulus is used.
>>>> D = 46566.12 for N=10M samples.
>>>>
>>>> For Marsaglia-Multicarry, Super-Duper and Mersene-Twister, M=2^32. D =
>>>> 11641.53 for N = 10M samples.
>>>>
>>>> My testing results (see below) agree with above analysis. But for
>>>> Wichmann-Hill, it wasn't. Wichmann-Hill's cycle is 6.9536e12 (refer to
>>>> RNG help by ?RNG and Whichmann's correction in 1984). Thus M <=
>>>> 6.9536e12. D
>>>>> = 7.19052 when N=1e7 and D>= 179.763 for N=5e7.
>>>> But in my tests, duplications were surprisingly not observed.
>>>>
>>>> It seems that Wichmann-Hill has a much larger cycle than the one
>>>> documented!
>>>>
>>>> Anybody can solve this puzzle?
>>>>
>>>
>>> As I told you, look at the code. The cycle length of Wichmann-Hill in R
>>> appears to be 2.8e13, and that also appears to be M in your notation.
>>> Your birthday problem calculation does not apply here.
>>> You could probably get a good approximation to it by rounding the
>>> Wichmann-Hill output (e.g. look at round(2^30 * runif()), or maybe more
>>> severe rounding).
>>>
>>> M is larger than what was previously documented, but Brian Ripley has
>>> corrected the documentation.
>>
>> Wichmann claimed 2.78X10^13 in his 1982 original paper. He made correction
>> in 1984, "The period of the generator was incorrectly given as 2.78 X
>> 10^13. In fact its period is only a quarter of that value: 6.95X10^12." In
>> R version 2.7.1, Brian Ripley adopted Wichmann's 1984 correction in RNG
>> document. That is the currently documented cycle is 6.95X10^12 not
>> 2.78X10^13 nor your approxmation 2.8e13.
>>
>> So, is the cycle claimed in 1982 correct or the one claimed in 1984?
>>
>> Even if the larger one 2.78e13 claimed in 1982 is correct, around 45
>> duplications were expected for 50M samples. But I got 0. My testing method
>> is based on the example code in the last third line in RNG help.
>>
>> Shengqiao Li
>>
>>>
>>> Duncan Murdoch
>>>> Regards,
>>>>
>>>> Shengqiao Li
>>>>
>>>> Department of Statistics
>>>> West Virgina Unversity
>>>>
>>>> ==============Testing===================
>>>>
>>>>
>>>>> RNGkind(kind="Knuth-TAOCP");
>>>>> RNGkind();
>>>>>
>>>> [1] "Knuth-TAOCP" "Inversion"
>>>>
>>>>> sum(duplicated(runif(1e7)));
>>>>>
>>>> [1] 46216
>>>>
>>>>> RNGkind(kind="Knuth-TAOCP-2002");
>>>>> RNGkind();
>>>>>
>>>> [1] "Knuth-TAOCP-2002" "Inversion"
>>>>
>>>>> sum(duplicated(runif(1e7)));
>>>>>
>>>> [1] 46373
>>>>
>>>>
>>>>> RNGkind(kind="Marsaglia-Multicarry");
>>>>> RNGkind();
>>>>>
>>>> [1] "Marsaglia-Multicarry" "Inversion"
>>>>
>>>>> sum(duplicated(runif(1e7)));
>>>>>
>>>> [1] 11671
>>>>
>>>>> RNGkind(kind="Super-Duper");
>>>>> RNGkind();
>>>>>
>>>> [1] "Super-Duper" "Inversion"
>>>>
>>>>> sum(duplicated(runif(1e7)));
>>>>>
>>>> [1] 11704
>>>>
>>>>> RNGkind(kind="Mersenne-Twister");
>>>>> RNGkind();
>>>>>
>>>> [1] "Mersenne-Twister" "Inversion"
>>>>
>>>>> sum(duplicated(runif(1e7)));
>>>>>
>>>> [1] 11508
>>>>
>>>>> RNGkind(kind="Wichmann-Hill");
>>>>> RNGkind();
>>>>>
>>>> [1] "Wichmann-Hill" "Inversion"
>>>>
>>>>> sum(duplicated(runif(1e7)));
>>>>>
>>>> [1] 0
>>>>
>>>>
>>>>> gc()
>>>>>
>>>> used (Mb) gc trigger (Mb) max used (Mb)
>>>> Ncells 199157 5.4 646480 17.3 10149018 271.1
>>>> Vcells 4497151 34.4 108714629 829.5 169585390 1293.9
>>>>
>>>>
>>>>> sum(duplicated(runif(5e7)))
>>>>>
>>>> [1] 0
>>>>
>>>>
>>>>
>>>>> sessionInfo()
>>>>>
>>>> R version 2.7.1 (2008-06-23)
>>>> i386-pc-mingw32
>>>>
>>>> locale:
>>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>>>> States.1252;LC_MONETARY=English_United
>>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] tools_2.7.1
>>>>
>>>> ==============End of Testing===================
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
More information about the R-help
mailing list