[R] Binning question (binning rows of a data.frame according to a variable)
Gabor Grothendieck
ggrothendieck at gmail.com
Fri Mar 17 20:03:15 CET 2006
If you are just looking for something simple that may be good enough,
then assign the largest item to group 1, the second largest to group 2,
..., the 8th largest to group 8, and then start over again with group 1,
and so on.
# test data
set.seed(1)
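x <- sample(100, 100, replace = TRUE)
xs <- sort(x) # ascending, so the smallest value lands in group 1
g <- gl(8, 1, length(xs)) # 8 groups, dealt out cyclically: 1, 2, ..., 8, 1, 2, ...
# g now holds the group that corresponds to each element of xs.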
tapply(xs, g, sum) # 659 671 687 701 612 622 629 646
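
Since the question was about rows of a data.frame, here is a rough sketch of the same round-robin idea applied to rows rather than to a bare vector. The data frame `df` and its columns `id` and `size` are made up for illustration; substitute your own object and size column. It deals the groups out largest-first, as in the description above, and uses order() so the group labels end up attached to the original rows.

```r
# Hypothetical data frame; 'size' stands in for the real size column.
set.seed(1)
df <- data.frame(id = 1:100, size = sample(100, 100, replace = TRUE))

k <- 8                                    # number of groups
ord <- order(df$size, decreasing = TRUE)  # row indices, largest size first

# Deal the labels 1..k out cyclically along that ordering, writing each
# label back to the row it belongs to.
df$group <- NA_integer_
df$group[ord] <- rep_len(seq_len(k), nrow(df))

tapply(df$size, df$group, sum)  # total size per group
table(df$group)                 # number of rows per group
```

Note that this gives every group nearly the same number of rows, so it only balances the totals well when the sizes are not too skewed.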
On 3/17/06, Dan Bolser <dmb at mrc-dunn.cam.ac.uk> wrote:
> Dan Bolser wrote:
> > Hi,
> >
> > I have tuples of data in rows of a data.frame, each column is a variable
> > for the 'items' (one per row).
> >
> > One of the variables is the 'size' of the item (row).
> >
> > I would like to cut my data.frame into groups such that each group has
> > the same *total size*. So, assuming that we order by size, some groups
> > should have several small items while other groups have a few large
> > items. All the groups should have approximately the same total size.
> >
> > I have tried various combinations of cut, quantile, and ecdf, and I just
> > can't work out how to do this!
> >
> > Any help is greatly appreciated!
> >
> > All the best,
> > Dan.
> >
>
> Perhaps there is a clever way, but I just wrote this in desperation...
>
>
> my.groups <- 8
>
> ## The 'size' variable in my data.frame
> my.total <- sum(my.res.1$TOT)
>
> my.approx.size <- my.total / my.groups
>
> my.j <- 1
> my.roll <- 0
> my.factor <- numeric()
>
> for (i in sort(my.res.1$TOT)) {
>   my.roll <- my.roll + i
>
>   if (my.roll > my.approx.size * my.j)
>     my.j <- my.j + 1
>
>   my.factor <- append(my.factor, my.j)
> }
>
> my.factor <- as.factor(my.factor)
>
>
>
> Then...
>
> > tapply(my.factor,my.factor,length)
> 1 2 3 4 5 6 7 8
> 152 62 45 34 25 21 14 8
>
>
> And...
>
> > tapply(sort(my.res.1$TOT),my.factor,sum)
> 1 2 3 4 5 6 7 8
> 2880 2848 2912 2893 2832 2906 2776 3029
>
> Which isn't bad.
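
The loop above can probably be written without the loop, using cumsum(). This is only a sketch of roughly the same idea, not a guaranteed line-for-line equivalent: it can differ from the loop when a single item is larger than the per-group target. The vector `size` is invented test data standing in for my.res.1$TOT, which is not shown in the thread, and `n.groups`, `target` and `grp` are names chosen here.

```r
# Hypothetical sizes standing in for my.res.1$TOT.
set.seed(42)
size <- rexp(361, rate = 1/64)

n.groups <- 8
s <- sort(size)
target <- sum(s) / n.groups   # approximate total size per group

# The group index is the multiple of 'target' that the running total
# has reached once the current item is added.
grp <- ceiling(cumsum(s) / target)
grp <- pmin(pmax(grp, 1), n.groups)  # guard against rounding at the ends
grp <- factor(grp, levels = seq_len(n.groups))

table(grp)           # number of items per group
tapply(s, grp, sum)  # total size per group
```

As with the loop, the groups are contiguous runs of the sorted sizes, so the early groups hold many small items and the late groups hold a few large ones.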
>
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html