[R] Generate groups with random size but given total sample size
Arne Schulz
arne.schulz at student.uni-kassel.de
Thu Jul 15 10:42:56 CEST 2010
Hi,
thanks a lot! That did it!
Regards,
Arne Schulz
> -----Original Message-----
> From: Greg Snow [mailto:Greg.Snow at imail.org]
> Sent: Tuesday, July 13, 2010 6:17 PM
> To: Arne Schulz; r-help at r-project.org
> Subject: RE: [R] Generate groups with random size but given total sample size
>
> For one definition of random:
>
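> ss <- rexp(100)                 # 100 random positive weights
> ss <- ss/sum(ss)                # normalise so the weights sum to 1
>
> ss <- 5 + round( ss*9500 )      # at least 5 per group; total is roughly 10000
> ss <- pmin( ss, 500 )           # enforce the upper bound before the first check
>
> cnt <- 0
> while( ( d <- sum(ss) - 10000 ) != 0 ) {
>
>   tmpid <- sample.int(100,1)    # move the whole discrepancy into one random group
>   ss[tmpid] <- ss[tmpid] - d
>
>   ss[ ss > 500 ] <- 500         # clamp back into the range 5..500
>   ss[ ss < 5 ] <- 5
>
>   cnt <- cnt + 1
>   if (cnt > 100) {
>     cat('problems finding a solution, stopping after 100 iterations\n')
>     break
>   }
> }
>
> group <- rep( 1:100, ss )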
>
>
> Hope this helps,
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> > On Behalf Of Arne Schulz
> > Sent: Tuesday, July 13, 2010 7:10 AM
> > To: r-help at r-project.org
> > Subject: [R] Generate groups with random size but given total sample size
> >
> > Dear list,
> > I am currently doing some simulation studies where I want to compare
> > different scenarios.
> > In particular, two scenarios should be compared: 10,000 cases in 100
> > groups with 100 cases per group and 10,000 cases in 100 groups with
> > random group size (ranging from 5 to 500).
> >
> > The first part is no problem:
> > > id <- seq(1,10000)
> > > group <- sort(rep(seq(1,100),100))
> >
> > But I am stuck on the second scenario. Using sample does give
> > me 100 groups with random sizes, but generates more than 10,000 cases:
> > > set.seed(13)
> > > sum(sample(5:500, 100))
> > [1] 24583
> >
> > Another way could be generating one sample at a time and summing the cases.
> > But this would end up in trial & error to fit the 10,000 cases. Maybe
> > it would break rules of probability, too.
> >
> > I'm convinced that there should be another (and even better) way to
> > handle this problem in R... :-)
> >
> >
> > Best regards,
> > Arne Schulz
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
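
For reuse across simulation runs, the rescale-and-repair idea from Greg Snow's reply could be wrapped in a function that fails loudly if it cannot hit the target total. This is only a sketch: the function name random_group_sizes() and all of its argument names are made up for illustration and are not part of the original reply, and the upper bound is clamped once before the repair loop starts.

```r
# Sketch: random group sizes with a fixed total, following the
# rescale-and-repair idea quoted above.
# random_group_sizes() and its argument names are hypothetical.
random_group_sizes <- function(n_groups = 100, total = 10000,
                               min_size = 5, max_size = 500,
                               max_iter = 1000) {
  # The target must be reachable within the size bounds
  stopifnot(n_groups * min_size <= total, total <= n_groups * max_size)

  # Random positive weights, normalised to sum to 1
  w <- rexp(n_groups)
  w <- w / sum(w)

  # Give every group min_size cases, spread the rest by weight,
  # then clamp to the upper bound
  sizes <- min_size + round(w * (total - n_groups * min_size))
  sizes <- pmin(sizes, max_size)

  for (i in seq_len(max_iter)) {
    d <- sum(sizes) - total
    if (d == 0) return(as.integer(sizes))
    # Push the whole discrepancy into one random group, then clamp again
    j <- sample.int(n_groups, 1)
    sizes[j] <- min(max(sizes[j] - d, min_size), max_size)
  }
  stop("no solution found after ", max_iter, " iterations")
}

set.seed(13)
ss <- random_group_sizes()
group <- rep(seq_along(ss), ss)   # group id for each of the 10000 cases
```

As in the reply, this is random "for one definition of random": the sizes come from rescaled exponential draws plus a repair step, so they are not uniformly distributed over all admissible size vectors.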