[R] sample consecutive integers efficiently
Chris Oldmeadow
c.oldmeadow at student.qut.edu.au
Fri Aug 29 00:35:40 CEST 2008
Charles C. Berry wrote:
> On Thu, 28 Aug 2008, Chris Oldmeadow wrote:
>
>> Hi all,
>>
>> I have some rough code to sample consecutive integers with length
>> according to a vector of lengths
>>
>> #sample space (representing positions)
>> pos<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
>>
>> #sample lengths
>> lengths<-c(2,3,2)
>>
>> From these two vectors I need a vector of sampled positions.
>>
>>
>> the sampling is without replacement, making things tough as the
>> sampled integers need to be consecutive. Im hoping somebody knows a
>> faster way of doing it than I have. ATM its way to slow on large
>> vectors.
>>
>>
>> samplePos<-function(l){
>> start.pos<-sample(pos,1)
>> end.pos<-start.pos+l-1
>> posies<-start.pos:end.pos
>> posies
>> }
>>
>> s.start<-c()
>>
>>
>> newPos<-function(a){
>> rp<-samplePos(a)
>> #test sampled range is consecutive, if not resample
>> if (length(rp) != rp[a]+1 -rp[1]){rp<-samplePos(a)}
>> pos<-setdiff(pos,rp)
>> rp[1]
>> }
>>
>> newps<-c()
>> newps<-unlist(lapply(lengths,newPos))
>>
>> I think the bottleneck may be on the setdiff() function - the sample
>> space is quite large so I dont think there would be too many rejections.
>
>
> The bottleneck is in the formulation of the sampling scheme.
>
> This is a simple combinatorics problem. There are 3360 possible values
> ( prod(16:14) ) for the start positions of the three elements, and you
> can form a bijection between 1:3360 and the individual samples. If the
> number of possible sample is small enough, it would be most efficient
> to sample from the corresponding integer vector and then translate it
> to the corresponding sample. For larger values where the number of
> possible samples become a challenge for 32-bit integer arithmetic, I
> expect this approach would be preferred:
>
> Permute length ( pos ) - sum ( lengths ) + length( lengths ) distinct
> (consecutively labelled) elements:
>
> elz <- sample( length ( pos ) - sum ( lengths ) + length( lengths ) )
>
>
> Take the lengths of the original objects to be
>
> z.lens <- rep( 1, length( elz ) )
> z.lens[ seq(along = lengths ) ] <- lengths
>
> (i.e. objects longer than 1 appear first)
>
> Determine the start positions of the objects as if they were laid down
> consecutively according to the permutation:
>
> start <- head( cumsum( c(0, z.lens[ elz ]) ) + 1 , -1 )
>
> Find the start positions of just those with lengths greater than 1
>
> gt.1 <- match( seq(along=lengths) , elz )
>
> Report the start positions
>
> start[ gt.1 ]
>
> ---
>
> If length( pos ) is large, you can rewrite the above to simply sample
> the positions (in the ordering) of the objects with lengths greater
> than 1. You will have to revise the calculation of start and gt.1 in
> that case.
>
> HTH,
>
> Chuck
>
>
> >
>>
>>
>> Many thanks,
>> Chris
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> Charles C. Berry (858) 534-2098
> Dept of Family/Preventive
> Medicine
> E mailto:cberry at tajo.ucsd.edu UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego
> 92093-0901
>
>
>
Thanks! that worked perfectly.
Chris
More information about the R-help
mailing list