[R] Drawing a sample based on certain condition

Duncan Murdoch
Mon Apr 14 17:36:54 CEST 2025


On 2025-04-14 7:26 a.m., Brian Smith wrote:
> Hi,
> 
> For my analytical work, I need to draw a sample of a certain size
> from a defined population, where population members are marked by
> non-negative integers, such that the sum of the sample members is fixed.
> For example,
> 
> Population = 0:100
> Sample_size = 10
> Sample_Sum = 20
> 
> Under this setup, if my sample members are X1, X2, ..., X10, then I
> should have X1+X2+...+X10 = 20.
> 
> The sampling scheme may be with or without replacement.
> 
> Is there any R function to achieve this? One possibility is to employ
> a naive trial-and-error approach, but this doesn't seem practical, as it
> would take a long time to get a final sample with the desired properties.
> 
> Any pointer would be greatly appreciated.
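To put a number on the trial-and-error concern in the with-replacement case (each member drawn uniformly from 0:100), one can count the ordered samples whose sum is exactly 20 and divide by the number of all ordered samples. The helper `count_fixed_sum` below is hypothetical, written only for this illustration; it is a plain dynamic-programming count and assumes non-negative integer values.

```python
from math import comb


def count_fixed_sum(values, n, total):
    """Number of ordered n-tuples drawn (with replacement) from `values` that sum to `total`.

    Assumes all values are non-negative integers, so partial sums above `total`
    can be discarded.
    """
    ways = [1] + [0] * total            # ways[s] = tuples built so far with sum s
    for _ in range(n):
        nxt = [0] * (total + 1)
        for s, w in enumerate(ways):
            if w:
                for v in values:
                    if s + v <= total:
                        nxt[s + v] += w
        ways = nxt
    return ways[total]


hits = count_fixed_sum(range(101), 10, 20)
print(hits, hits == comb(29, 9))        # 10015005 True (stars and bars: C(20 + 9, 9))
print(hits / 101 ** 10)                 # acceptance probability, roughly 9.07e-14
```

So a rejection sampler would need on the order of 1.1e13 draws of ten values per accepted sample, which supports the concern in the question.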

One general way to think of this problem is that you are defining a 
distribution on the space of all possible samples of size 10, in which 
every sample whose sum is 20 has the same positive probability and every 
other sample has probability zero, and you want to sample from this 
distribution.

There's probably a slick method to do that for your example, but if 
you've got a general population instead of that special one, I doubt it.
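For the special case in the question, sampled with replacement, one direct method (our addition, not something proposed in the thread) is the following. Drawing members independently and uniformly from 0:100 and conditioning on the sum makes every ordered sample with sum 20 equally likely, and since 20 <= 100 no member can exceed the top of the population. The problem therefore reduces to drawing a uniformly random composition of 20 into 10 non-negative parts, which the stars-and-bars bijection does: choose 9 "bar" positions out of 29 slots uniformly at random and read off the gap lengths. The function name is hypothetical, and the method only applies when the population is 0:M with M at least the target sum.

```python
import random
from collections import Counter


def sample_fixed_sum_uniform(n, total, rng=None):
    """Uniformly random ordered n-tuple (n >= 1) of non-negative integers summing to `total`.

    Stars and bars: place n - 1 bars among total + n - 1 slots; the numbers of
    stars before, between and after the bars are the n parts.
    """
    if rng is None:
        rng = random.Random()
    slots = total + n - 1
    bars = sorted(rng.sample(range(slots), n - 1))
    parts, prev = [], -1
    for b in bars:
        parts.append(b - prev - 1)      # stars between the previous bar and this one
        prev = b
    parts.append(slots - prev - 1)      # stars after the last bar
    return parts


rng = random.Random(42)
print(sample_fixed_sum_uniform(10, 20, rng))    # ten values in 0..20 summing to 20

# Sanity check on a tiny case: (0, 2), (1, 1), (2, 0) should be about equally frequent.
freq = Counter(tuple(sample_fixed_sum_uniform(2, 2, rng)) for _ in range(30_000))
print(freq)
```

Each call costs only one `random.sample` of n - 1 positions, so there is no rejection step at all.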

What I would do is the following:

Define another distribution on samples that has probabilities that 
depend on the sum of the sample, with the highest probabilities attached 
to ones with the correct sum, and probabilities for other sums declining 
with distance from the target sum.  For example, maybe

  P(sum) = Y/(1 + abs(sum - 20))

for some constant Y.
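Since P(sum) is a weight on each individual sample, the probability this distribution gives to a particular sum is that weight multiplied by the number of samples having that sum. The sketch below (with-replacement case, population 0:100; the helpers `sum_counts` and `mass_at_target` are hypothetical) computes how much probability lands exactly on sum 20 under the 1/(1 + |sum - 20|) weight, and, for comparison, under exp(-|sum - 20|), an alternative we chose ourselves rather than one taken from the thread.

```python
import math


def sum_counts(values, n):
    """counts[s] = number of ordered n-tuples from `values` (with replacement) with sum s."""
    counts = {0: 1}
    for _ in range(n):
        nxt = {}
        for s, c in counts.items():
            for v in values:
                nxt[s + v] = nxt.get(s + v, 0) + c
        counts = nxt
    return counts


def mass_at_target(counts, target, weight):
    """Probability of sum == target when each sample gets weight(sum of that sample)."""
    z = sum(c * weight(s) for s, c in counts.items())   # normalising constant, i.e. 1/Y
    return counts.get(target, 0) * weight(target) / z


def harmonic(s):
    return 1 / (1 + abs(s - 20))       # the example weight from the post


def expo(s):
    return math.exp(-abs(s - 20))       # assumed, sharper alternative


counts = sum_counts(range(101), 10)
print(mass_at_target(counts, 20, harmonic))   # tiny: sums near 500 vastly outnumber sum 20
print(mass_at_target(counts, 20, expo))       # roughly 0.4
```

With the harmonic weight, about 1e7 samples have sum 20 against roughly 1e20 in total, and a factor of at most a few hundred in weight cannot make up that gap.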

You can use MCMC to sample from that distribution and then only keep the 
samples where the sum is exactly equal to the target sum.  If you do 
that, you don't need to care about the value of Y, but you do need to
think about how proposed moves are made, and you probably need to use a 
different function than the example above for acceptable efficiency.
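Here is a minimal Python sketch of that recipe. The specific choices are our assumptions, not details from the thread. It is a Metropolis sampler whose proposal replaces one randomly chosen member by a unit drawn uniformly from the population (or, without replacement, from the units not currently in the sample). Both proposals are symmetric, so the acceptance ratio is just the ratio of weights. The weight is exp(-lam * |sum - target|) rather than 1/(1 + |sum - 20|), with `lam` a tuning constant we introduced. The name `mcmc_fixed_sum` and its parameters are hypothetical.

```python
import math
import random


def mcmc_fixed_sum(population, n, target, steps=20_000, burn_in=1_000,
                   lam=1.0, replace=True, seed=None):
    """Metropolis sketch: return the visited samples whose sum equals `target`.

    Hypothetical helper. Each sample gets weight exp(-lam * |sum - target|).
    Units are tracked by index, so repeated values in `population` count as
    distinct units when sampling without replacement.
    """
    rng = random.Random(seed)
    N = len(population)
    if not replace and n >= N:
        raise ValueError("without replacement, n must be smaller than the population")

    def log_w(s):
        return -lam * abs(s - target)

    # Start from an ordinary random sample of unit indices.
    idx = [rng.randrange(N) for _ in range(n)] if replace else rng.sample(range(N), n)
    s = sum(population[j] for j in idx)

    kept = []
    for t in range(steps):
        i = rng.randrange(n)                          # which member to change
        if replace:
            new = rng.randrange(N)                    # any unit: symmetric proposal
        else:
            current = set(idx)                        # any unit not in the sample:
            new = rng.choice([j for j in range(N) if j not in current])  # also symmetric
        s_new = s - population[idx[i]] + population[new]
        # Metropolis step: accept with probability min(1, w(new) / w(old)).
        if rng.random() < math.exp(min(0.0, log_w(s_new) - log_w(s))):
            idx[i] = new
            s = s_new
        if t >= burn_in and s == target:              # keep only exact-sum states
            kept.append([population[j] for j in idx])
    return kept


samples = mcmc_fixed_sum(list(range(101)), n=10, target=20, seed=1)
print(len(samples), samples[:3])
```

The retained samples are correlated and include repeats whenever a proposal is rejected, as is usual for MCMC output; keeping every k-th one reduces the autocorrelation. Note that without replacement from 0:100, ten distinct values sum to at least 0 + 1 + ... + 9 = 45, so `replace=False` with a target of 20 can only return an empty list.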

Duncan Murdoch


