[R] Simulating mid-points from a defined range
Bert Gunter
bgunter@4567 @end|ng |rom gm@||@com
Fri Jun 6 07:09:01 CEST 2025
... and here is a a simple 2-liner without a sort that I think is linear in
time and space (but please correct if this is wrong):
x <- cumsum(runif(10))
x/x[10] * runif(1, 0, 55) + seq.int(0, 45,5)
Question: Does this give the same distribution as Peter's method using the
order statistics? I believe yes, but someone more statistically competent
than me needs to verify or correct this.
Cheers,
Bert
On Thu, Jun 5, 2025 at 5:19 AM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>
> Richard:
>
> "The "use an upper bound of 100 - (n+1)*5" and then "add back
> cumsum(rep(5,n)) at the end" (or equivalent) trick ensures the gaps
> are right but does nothing about the distribution.."
>
> If I understand you correctly, I think the above is wrong. Here is a
> one-line version of Peter's code for the original problem:
>
> sort(runif(10, 0, 55)) + seq.int(0, 45, 5)
>
> AFAICS this is a bijection between the order statistics of runif(10, 0,
> 55) and those of runif(10, 0, 100) with diffs >5. Please correct if I am
> wrong or I have misunderstood.
>
> Your other remarks about space and time complexity are correct, of course.
>
> Cheers,
> Bert
>
>
>
>
> On Wed, Jun 4, 2025 at 11:04 PM Richard O'Keefe <raoknz using gmail.com> wrote:
>
>> The Bentley and Saxe paper answers the questions
>> (1) How can I draw a sample of n numbers in the range 0..1 such that
>> the numbers are returned in increasing order WITHOUT sorting and so
>> that all such sequences are equally likely in LINEAR time. That's the
>> code I showed in R.
>> (2) How can I do this in a single pass?
>> (3) How can I return the elements one at a time, generating each one
>> on demand (O(1) space).
>>
>> The cumsum(runif(n))/n trick gets a sequence in linear time but the
>> distribution is different.
>> The sort(runif(n)) trick gets a sequence in O(n.log n) time but the
>> distribution is different again.
>>
>> The "use an upper bound of 100 - (n+1)*5" and then "add back
>> cumsum(rep(5,n)) at the end" (or equivalent) trick ensures the gaps
>> are right but does nothing about the distribution..
>>
>> By the way, since you want numbers with 2 decimal places for some
>> reason I don't understand,
>> perhaps the simplest trick of all is
>>
>> n <- 10 # how many numbers you want
>> L <- 0 # lower bound of the range
>> U <- 100 # upper bound of the range
>> G <- 5 # gap size
>> V <- U - G*(n+1) # reduced upper bound
>> x <- sort(sample((L*100):(V*100), size = n)) + cumsum(rep(G,
>> times=n))/100
>>
>> I hope this is clear.
>>
>>
>>
>> On Thu, 5 Jun 2025 at 00:56, Brian Smith <briansmith199312 using gmail.com>
>> wrote:
>> >
>> > Okay, I think I found the reason. This is due to accumulation of nine
>> > 5s in the cumsum. Thanks again for the elegant solution.
>> >
>> > But I wonder, if the solution is simple then what is the significance
>> > of the Research paper by Bentley and Saxe naming “Generating sorted
>> > lists of random numbers” which Richard mentioned?
>> >
>> > On Wed, 4 Jun 2025 at 17:54, Brian Smith <briansmith199312 using gmail.com>
>> wrote:
>> > >
>> > > Hi Peter,
>> > >
>> > > Could you please help me to understand what is the basis of choosing
>> > > 55 in runif(10,0,55))?
>> > >
>> > > Thank you!
>> > >
>> > > On Wed, 4 Jun 2025 at 02:45, peter dalgaard <pdalgd using gmail.com> wrote:
>> > > >
>> > > > Can't you just generate 10 values in (0,55), sort them, generate
>> the distances, add 5 and cumulate?
>> > > >
>> > > > > x <- sort(runif(10,0,55))
>> > > > > d <- diff(x)+5
>> > > > > cumsum(c(x[1],d))
>> > > > [1] 12.27815 21.21060 26.37856 36.03812 41.97237 57.02945 67.86113
>> > > > [8] 75.74085 81.28533 98.30792
>> > > >
>> > > >
>> > > > > On 3 Jun 2025, at 09.21, Brian Smith <briansmith199312 using gmail.com>
>> wrote:
>> > > > >
>> > > > > Hi Richard,
>> > > > >
>> > > > > Thanks for your insight.
>> > > > >
>> > > > > As I mentioned in one of my earlier emails to the group, I
>> imposed a
>> > > > > constraint of accuracy up to two decimal places in order to
>> obtain a
>> > > > > finite set of possible values. For instance, if I were to round
>> values
>> > > > > to zero decimal places, the number of unique sequences that could
>> be
>> > > > > generated would be strictly finite and quite limited. Therefore, I
>> > > > > chose a precision of two decimal places to allow for a larger but
>> > > > > still finite number of possibilities.
>> > > > >
>> > > > >
>> > > > > Now, my question is: how can this accuracy constraint be imposed
>> effectively?
>> > > > >
>> > > > > Is the only practical method to generate samples, round each to
>> two
>> > > > > decimal places, and then check for duplicates to ensure
>> uniqueness? If
>> > > > > so, I’m concerned this might be inefficient, as many samples
>> could be
>> > > > > discarded, making the process time-consuming.
>> > > > >
>> > > > > Is there a better or more efficient way to directly enforce this
>> > > > > constraint while generating the values?
>> > > > >
>> > > > > ________________________________
>> > > > >
>> > > > > Additionally, could you please elaborate on your suggestion
>> regarding
>> > > > > imposing minimum gap constraints by subtracting and then adding
>> back
>> > > > > certain gaps?
>> > > > >
>> > > > >
>> > > > > For example, based on your earlier guidance, one possible
>> sequence I
>> > > > > obtained is:
>> > > > >
>> > > > >
>> > > > > 10.07181, 14.49839, 14.74435, 18.75167, 42.70361, 55.79623,
>> 63.40264,
>> > > > > 68.62261, 92.49899, 98.29308
>> > > > >
>> > > > >
>> > > > > Now, I’d like to post-process this sequence to enforce a minimum
>> > > > > difference constraint of, say, 5 units between values (including
>> both
>> > > > > lower and upper bounds).
>> > > > >
>> > > > > What would be the appropriate way to modify the sequence to impose
>> > > > > this kind of constraint?
>> > > > >
>> > > > >
>> > > > > Many thanks for your time and insight.
>> > > > >
>> > > > > On Tue, 3 Jun 2025 at 10:42, Richard O'Keefe <raoknz using gmail.com>
>> wrote:
>> > > > >>
>> > > > >> PS I forgot about the weird gaps requirement.
>> > > > >> What you do is subtract the gaps off and then add them back. I
>> hope that is clear.
>> > > > >>
>> > > > >> On Sun, 1 Jun 2025 at 6:52 AM, Brian Smith <
>> briansmith199312 using gmail.com> wrote:
>> > > > >>>
>> > > > >>> Hi,
>> > > > >>>
>> > > > >>> Let say I have a range [0, 100]
>> > > > >>>
>> > > > >>> Now I need to simulate 1000 10 mid-points within the range with
>> > > > >>> accuracy upto second decimal number.
>> > > > >>>
>> > > > >>> Let say, one simulated set is
>> > > > >>>
>> > > > >>> X1, X2, ..., X10
>> > > > >>>
>> > > > >>> Ofcourrse
>> > > > >>>
>> > > > >>> X1 < X2 < ... <X10
>> > > > >>>
>> > > > >>> I have one more constraint that the difference between any 2
>> > > > >>> consecutive mid-points shall be at-least 5.00.
>> > > > >>>
>> > > > >>> I wonder if there is any Statistical theory available to
>> support this
>> > > > >>> kind of simulation.
>> > > > >>>
>> > > > >>> Alternately, is there any way in R to implement this?
>> > > > >>>
>> > > > >>> ______________________________________________
>> > > > >>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
>> see
>> > > > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> > > > >>> PLEASE do read the posting guide
>> https://www.R-project.org/posting-guide.html
>> > > > >>> and provide commented, minimal, self-contained, reproducible
>> code.
>> > > > >
>> > > > > ______________________________________________
>> > > > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > > > PLEASE do read the posting guide
>> https://www.R-project.org/posting-guide.html
>> > > > > and provide commented, minimal, self-contained, reproducible code.
>> > > >
>> > > > --
>> > > > Peter Dalgaard, Professor,
>> > > > Center for Statistics, Copenhagen Business SchoolSolbjerg Plads 3,
>> 2000 Frederiksberg, Denmark
>> > > > Phone: (+45)38153501
>> > > > Office: A 4.23
>> > > > Email: pd.mes using cbs.dk Priv: PDalgd using gmail.com
>> > > >
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> https://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list