[R] a more elegant way to get percentages?

Thu Mar 13 14:54:11 CET 2008

Monica,

You can try the following:

> x.tot <- aggregate(x$val, by=list(total=x$locat), 'sum')
> x.tot
  total  x
1     a  5
2     b 20
3     c 40
4     d 30
> cbind(x, perc=x$val/rep(x.tot$x, table(x$locat)) * 100)
   locat val      perc
1      a   5 100.00000
2      b   5  25.00000
3      b  15  75.00000
4      c   5  12.50000
5      c  20  50.00000
6      c   5  12.50000
7      c  10  25.00000
8      d   5  16.66667
9      d  15  50.00000
10     d  10  33.33333
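
A caveat on the rep()/table() step above: it lines up with the rows only when the data frame is already grouped by locat in sorted order, as it is here. If that cannot be assumed, one alternative is ave(), which returns the group total aligned with each row. The following is only a sketch; the data frame is re-typed from the printout in the original message, and the shuffled copy y is a hypothetical addition to show that row order does not matter:

```r
# Example data re-typed from the thread's printout
x <- data.frame(
  locat = c("a", "b", "b", "c", "c", "c", "c", "d", "d", "d"),
  val   = c(5, 5, 15, 5, 20, 5, 10, 5, 15, 10)
)

# ave() applies FUN within each level of locat and returns a vector
# of the same length as val, so each row gets its own group's total
x$perc <- 100 * x$val / ave(x$val, x$locat, FUN = sum)

# Hypothetical shuffled copy: the same expression works on unsorted rows
set.seed(1)
y <- x[sample(nrow(x)), c("locat", "val")]
y$perc <- 100 * y$val / ave(y$val, y$locat, FUN = sum)
```

Wrapping the result in round(..., 2) would reproduce the two-decimal format of the original percent1 column.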

-Christos

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Monica Pisica
> Sent: Thursday, March 13, 2008 9:36 AM
> To: r-help at r-project.org
> Subject: [R] a more elegant way to get percentages?
> 
> 
> Hi,
> 
> I am trying to get percentages in a more elegant way. I have 
> a data.frame with locations and values (counts) of species at 
> that location. Each location is repeated for each species I 
> have values for, and I would like to get the percentage of each 
> species at that location. I am not sure my explanation is 
> clear, so I will paste my code below:
> 
> #####################
> 
> > x
>    locat val
> 1      a   5
> 2      b   5
> 3      b  15
> 4      c   5
> 5      c  20
> 6      c   5
> 7      c  10
> 8      d   5
> 9      d  15
> 10     d  10
> > loc1 <- x$locat
> > n <- length(loc1)
> > locuniq1 <- unique(loc1)
> > m <- length(locuniq1)
> > counts <- seq(1:m)
> > 
> > for (i in 1:m) {
> + count <- 0
> + for (j in 1:n) {
> + if (loc1[j]==locuniq1[i]) count <- count+1
> + counts[i] <- count
> + }
> + }
> > 
> > percent1 <- rep(0,n)
> > j <- 0
> > for (i in 1:m) {
> + 
> + b <- x[(j+1):(j+counts[i]),]
> + total <- sum(b$val)
> + percent1[(j+1):(j+counts[i])] <- round(apply(as.matrix(b$val), 1, 
> + function(x) {x*100/total}),2)
> + j = j+counts[i]
> + }
> > x1 <- cbind(x, percent1)    # this is the result I want
> > x1
>    locat val percent1
> 1      a   5   100.00
> 2      b   5    25.00
> 3      b  15    75.00
> 4      c   5    12.50
> 5      c  20    50.00
> 6      c   5    12.50
> 7      c  10    25.00
> 8      d   5    16.67
> 9      d  15    50.00
> 10     d  10    33.33
> > 
> ################
> 
> I am wondering if there is a way to do it more efficiently, 
> especially since the first loop, which counts how many times 
> each location appears in the data.frame, is slow when the 
> data.frame is larger than these 10 rows.
> 
> Thanks for any input and sorry if the email is on the long side,
> 
> Monica
> 
> 