[R] elimination duplicate elements sampling!
Brian Diggs
diggsb at ohsu.edu
Tue Jul 12 19:25:30 CEST 2011
On 7/7/2011 3:23 PM, elephann wrote:
> Hi everyone!
> I have a data frame with 1112 time series, and I am going to randomly
> sample r series, z times, to compose portfolios of different sizes (r-security
> portfolios). For r=2 and z=10000, that is:
> z=10000
> A=seq(1:1112)
> x1=sample(A,z,replace =TRUE)
> x2=sample(A,z,replace =TRUE)
> M=cbind(x1,x2) # combination of 2 series
> Because a portfolio with x1[i]=x2[i] (i=1,2,...,10000) is a 1-security
> portfolio, not a 2-security one, it should be eliminated and
> resampled. As r increases, for example r=k, how do I efficiently
> eliminate all such portfolios with x1[i]=x2[i]=...=xk[i]?
Why not sample the r securities for each portfolio without replacement,
and replicate that z times?
z <- 10000 # number of replicates
r <- 2 # number in each replicate
A <- 1:1112 # space to sample from
M <- t(replicate(z, sample(A, r)))
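As a quick sanity check on the lines above (a sketch only; the seed and the name has_repeat are mine, chosen for illustration), each row of M should hold r distinct securities:

```r
set.seed(1)  # arbitrary seed, only to make the check reproducible
z <- 10000   # number of replicates
r <- 2       # number in each replicate
A <- 1:1112  # space to sample from
M <- t(replicate(z, sample(A, r)))

# TRUE for any row that contains the same security more than once
has_repeat <- apply(M, 1, anyDuplicated) > 0
dim(M)           # z rows, r columns
any(has_repeat)  # FALSE: sample() without replacement never repeats within one draw
```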
> Besides, any r-security portfolios with the same combination of securities
> are the same portfolio (given the same weights, as here), e.g.
> M(x1[i],x5[i],x7[i],x1000[i]) and M(x5[i],x7[i],x1[i],x1000[i]) or
> M(x1[i],x7[i],x5[i],x1000[i]) are the same. How do I efficiently eliminate
> these possibilities?
Do you mean you don't want any of the replicates to be the same? You
can sort each sample, so that order no longer matters, and then
eliminate duplicate rows:
M <- t(replicate(z, sort(sample(A, r))))
M <- M[!duplicated(M),]
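To see why the sort matters, here is a tiny illustration with made-up portfolios (P, P_sorted and P_unique are hypothetical names, not part of the code above). Rows 1 and 2 hold the same securities in a different order and only collapse into one after sorting within rows:

```r
# Three portfolios; rows 1 and 2 contain the same securities in a different order
P <- rbind(c(5, 7, 1, 1000),
           c(1, 7, 5, 1000),
           c(2, 3, 4, 6))

nrow(P[!duplicated(P), , drop = FALSE])  # 3: without sorting, nothing is dropped

P_sorted <- t(apply(P, 1, sort))         # sort within each row
P_unique <- P_sorted[!duplicated(P_sorted), , drop = FALSE]
nrow(P_unique)                           # 2: the reordered copy is gone
```

The drop = FALSE is an addition of mine; it keeps the result a matrix even if only one row survives.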
Or you can create all possible portfolios of size r, and sample z from
that without replacement to do it in one pass.
cmb <- t(combn(A, r))
M <- cmb[sample(nrow(cmb), z),]
Note this is not practical for r > 2. combn(A, r) returns a matrix of
size r by choose(length(A), r) (which is 2 x 617716 in this case), and
cmb is its transpose. In fact, for r > 3, this won't even work with the
1112 sample space (choose(1112, 4) is about 6.3e10 columns). For r = 3,
the matrix is 3 x 228554920. But for the three-security case, the
probability of getting a duplicate portfolio is small.
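To put a rough number on "small": assuming every r-security portfolio is equally likely on each draw, the expected number of coinciding pairs among z draws is choose(z, 2) / choose(length(A), r). This is a back-of-the-envelope sketch, and the variable names are only illustrative:

```r
z <- 10000
n <- 1112
r_values <- 2:3

N <- choose(n, r_values)             # number of distinct portfolios of each size
expected_pairs <- choose(z, 2) / N   # expected number of coinciding pairs among z draws

data.frame(r = r_values, portfolios = N,
           expected_pairs = round(expected_pairs, 2))
```

This comes to roughly 81 coinciding pairs for r = 2 but only about 0.2 for r = 3, so duplicates are common with two securities and rare with three.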
Better is to sample a few extra so that you still have sufficient after
throwing out duplicates
M <- t(replicate(ceiling(1.01*z), sort(sample(A, r)))) # whole number of draws
M <- M[!duplicated(M),][1:z,]
The 1.01 multiplier may not be big enough; there is no multiplier that
will guarantee that you will have z samples when you are done. However,
the second line will throw an error if fewer than z unique samples
remain, so a shortfall is easy to notice.
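An alternative that avoids guessing a multiplier is to keep topping up until z unique rows are in hand. The following is only a sketch; sample_unique_portfolios is a hypothetical helper, not something from base R or from the code above:

```r
# Hypothetical helper: keep drawing until z distinct portfolios are collected
sample_unique_portfolios <- function(A, r, z) {
  # With more requested portfolios than exist, the loop could never finish
  stopifnot(z <= choose(length(A), r))
  M <- matrix(A[0], nrow = 0, ncol = r)
  while (nrow(M) < z) {
    need <- z - nrow(M)
    # One sorted draw per row; matrix(..., byrow = TRUE) also copes with need = 1
    extra <- matrix(replicate(need, sort(sample(A, r))), ncol = r, byrow = TRUE)
    M <- rbind(M, extra)
    M <- M[!duplicated(M), , drop = FALSE]
  }
  M
}

M <- sample_unique_portfolios(1:1112, r = 2, z = 10000)
dim(M)  # z rows, r columns, all rows distinct
```

Each pass only draws as many portfolios as are still missing, so the loop normally finishes in a few iterations when z is small relative to choose(length(A), r).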
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University