[R] Computing an ordering on subsets of a data frame

Fri Apr 20 04:54:05 CEST 2007

Hi Lukas,

Using by() or its cousins (tapply() etc.) is tricky,
because you then need to merge the results back into X
so that each result lands on the right row.

One way to do that is to add a key ID variable to X
and carry that key along in the calls to by() etc.,
though I haven't tested that approach.
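
A minimal sketch of the key-ID idea, under some assumptions: the
column name id is made up for illustration, the pieces returned by
by() are stacked with do.call(rbind, ...), and ties.method = "first"
replaces "random" only so that the result is reproducible.

```r
# Example data from the question
X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))

# Hypothetical key column: one unique ID per row
X$id <- seq_len(nrow(X))

# by() passes each subset of X to the function; return the key
# alongside the rank of B computed within that subset
pieces <- by(X, X$A, function(x) {
  data.frame(id = x$id, C = rank(x$B, ties.method = "first"))
})
ranks <- do.call(rbind, pieces)

# Match on the key so each rank is written to the row it came from,
# whatever order the rows of X are in
X$C <- ranks$C[match(X$id, ranks$id)]
X$id <- NULL
```

Because the merge goes through match() on the key, the same code
should also work when the rows of X are shuffled.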

You can also create a new column in X to hold the
results, and then rank within each subset of X in a
for() loop.

> X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
> X
  A B
1 1 2
2 1 3
3 1 4
4 2 3
5 2 1
6 2 1
7 3 2
8 3 1
9 3 3
> 
> X$C <- rep(as.numeric(NA), nrow(X))
> 
> sortLevels <- unique(X$A)
> 
> for(i in seq_along(sortLevels)) {
+   sortIdxp <- X$A == sortLevels[i]
+   X$C[sortIdxp] <- rank(X$B[sortIdxp], ties.method = "random")
+ }
> X
  A B C
1 1 2 1
2 1 3 2
3 1 4 3
4 2 3 3
5 2 1 1
6 2 1 2
7 3 2 2
8 3 1 1
9 3 3 3
> 

Merging results back in after using
tapply() or by() is harder if your
data frame is in random order, but the
for() loop approach with indexing
still works fine, because the logical
index writes each rank back to the
rows it was computed from.

> set.seed(123)
> Y <- X[sample(9), ]
> Y
  A B C
3 1 4 3
7 3 2 2
9 3 3 3
6 2 1 2
5 2 1 1
1 1 2 1
2 1 3 2
8 3 1 1
4 2 3 3
> Y$C <- rep(as.numeric(NA), nrow(Y))
> 
> sortLevels <- unique(Y$A)
## You can also use levels() instead of unique() if Y$A is a factor.
> 
> for(i in seq_along(sortLevels)) {
+   sortIdxp <- Y$A == sortLevels[i]
+   Y$C[sortIdxp] <- rank(Y$B[sortIdxp], ties.method = "random")
+ }
> Y
  A B C
3 1 4 3
7 3 2 2
9 3 3 3
6 2 1 2
5 2 1 1
1 1 2 1
2 1 3 2
8 3 1 1
4 2 3 3
> oY <- order(Y$A)
> Y[oY,]
  A B C
3 1 4 3
1 1 2 1
2 1 3 2
6 2 1 2
5 2 1 1
4 2 3 3
7 3 2 2
9 3 3 3
8 3 1 1
>
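
As an alternative to the explicit loop (not the method used above),
base R's ave() applies a function within groups and returns the
results in the original row order, so no merge step is needed. This
is only a sketch: Y is rebuilt here by hand in the shuffled order
shown above, and ties.method = "first" replaces "random" so that the
result is reproducible.

```r
# Shuffled data, typed in to match the row order of Y above
Y <- data.frame(A = c(1,3,3,2,2,1,1,3,2),
                B = c(4,2,3,1,1,2,3,1,3))

# Rank B separately within each value of A; ave() puts each
# group's ranks back in the positions the group's rows occupy
Y$C <- ave(Y$B, Y$A, FUN = function(b) rank(b, ties.method = "first"))
Y
```

With "first", tied values of B are ranked in the order they appear,
so the ranks of the two tied rows in group 2 may differ from the
"random" output shown above.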

HTH

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney at bccrc.ca
tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3

Canada

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-
> bounces at stat.math.ethz.ch] On Behalf Of Lukas Biewald
> Sent: Wednesday, April 18, 2007 2:49 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Computing an ordering on subsets of a data frame
> 
> If I have a data frame X that looks like this:
> 
> A B
> - -
> 1 2
> 1 3
> 1 4
> 2 3
> 2 1
> 2 1
> 3 2
> 3 1
> 3 3
> 
> and I want to make another column which has the rank of B computed
> separately for each value of A.
> 
> I.e. something like:
> 
> A B C
> - - -
> 1 2 1
> 1 3 2
> 1 4 3
> 2 3 3
> 2 1 1
> 2 1 2
> 3 2 2
> 3 1 1
> 3 3 3
> 
> by(X, X[,1], function(x) { rank(x[,1], ties.method="random") } )
> almost seems to work, but the data is not in a frame, and I can't
> figure out how to merge it back into X properly.
> 
> Thanks,
> Lukas
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.