[R] Computing an ordering on subsets of a data frame
Steven McKinney
smckinney at bccrc.ca
Fri Apr 20 04:54:05 CEST 2007
Hi Lukas,
Using by() or its cousins (tapply() etc.) is tricky,
because you need to merge the results back into X correctly.
You can do that by adding a key ID variable to X
and carrying that key ID variable along in calls
to by() etc., though I haven't tested such a method.
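A minimal sketch of the key ID idea, for illustration only. The
column name id is a hypothetical choice (it is not in the original
data), and ties.method = "first" is used in place of "random" so that
the result is reproducible.

```r
# Sketch only: "id" is a hypothetical key column added for the merge.
X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
X$id <- seq_len(nrow(X))

# Rank B within each level of A, carrying the key along with the ranks.
pieces <- by(X, X$A, function(d) {
  data.frame(id = d$id, C = rank(d$B, ties.method = "first"))
})
ranks <- do.call(rbind, pieces)

# Merge on the key, then restore the original row order.
X2 <- merge(X, ranks, by = "id")
X2 <- X2[order(X2$id), ]
```

Because the ranks are matched on id rather than on position, the merge
does not depend on the order in which by() returns the groups.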
You can also create a new column in X to hold the
results, and then sort the subsections of X in a
for() loop.
> X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
> X
  A B
1 1 2
2 1 3
3 1 4
4 2 3
5 2 1
6 2 1
7 3 2
8 3 1
9 3 3
>
> X$C <- rep(as.numeric(NA), nrow(X))
>
> sortLevels <- unique(X$A)
>
> for(i in seq(along = sortLevels)) {
+ sortIdxp <- X$A == sortLevels[i]
+ X$C[sortIdxp] <- rank(X$B[sortIdxp], ties.method = "random")
+ }
> X
  A B C
1 1 2 1
2 1 3 2
3 1 4 3
4 2 3 3
5 2 1 1
6 2 1 2
7 3 2 2
8 3 1 1
9 3 3 3
>
Merging results back in after using
tapply() or by() is harder if your
data frame is in random order, but the
for() loop approach with indexing
still works fine.
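A small sketch of why the merge is harder for shuffled rows, and one
way around it. It assumes ties.method = "first" instead of "random" to
keep the result deterministic; the variable names r and naive are
illustrative only.

```r
set.seed(123)
X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
Y <- X[sample(9), ]

# tapply() returns one vector of ranks per level of A ...
r <- tapply(Y$B, Y$A, rank, ties.method = "first")

# ... so flattening it gives values in group order (all of A == 1,
# then A == 2, ...), which need not match the row order of Y.
naive <- unlist(r, use.names = FALSE)

# unsplit() puts each group's values back at the rows they came from.
Y$C <- unsplit(r, Y$A)
```

Assigning naive directly to Y$C would only be correct when the rows of
Y are already grouped by A in level order.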
> set.seed(123)
> Y <- X[sample(9), ]
> Y
  A B C
3 1 4 3
7 3 2 2
9 3 3 3
6 2 1 2
5 2 1 1
1 1 2 1
2 1 3 2
8 3 1 1
4 2 3 3
> Y$C <- rep(as.numeric(NA), nrow(Y))
>
> sortLevels <- unique(Y$A)
## You can also use levels() instead of unique() if Y$A is a factor.
>
> for(i in seq(along = sortLevels)) {
+ sortIdxp <- Y$A == sortLevels[i]
+ Y$C[sortIdxp] <- rank(Y$B[sortIdxp], ties.method = "random")
+ }
> Y
  A B C
3 1 4 3
7 3 2 2
9 3 3 3
6 2 1 2
5 2 1 1
1 1 2 1
2 1 3 2
8 3 1 1
4 2 3 3
> oY <- order(Y$A)
> Y[oY,]
  A B C
3 1 4 3
1 1 2 1
2 1 3 2
6 2 1 2
5 2 1 1
4 2 3 3
7 3 2 2
9 3 3 3
8 3 1 1
>
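For comparison, base R's ave() applies a function within groups and
returns the values already aligned with the original rows, so it may
serve as a one-line alternative to the loop. This is a sketch, again
using ties.method = "first" rather than "random" so the result is
reproducible.

```r
X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))

# Rank B within each level of A; the result is in the row order of X.
X$C <- ave(X$B, X$A, FUN = function(b) rank(b, ties.method = "first"))
```

The same call should work unchanged on a data frame whose rows are in
random order, since ave() writes each group's result back by position.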
HTH
Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
email: smckinney at bccrc.ca
tel: 604-675-8000 x7561
BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3
Canada
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Lukas Biewald
> Sent: Wednesday, April 18, 2007 2:49 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Computing an ordering on subsets of a data frame
>
> If I have a data frame X that looks like this:
>
> A B
> - -
> 1 2
> 1 3
> 1 4
> 2 3
> 2 1
> 2 1
> 3 2
> 3 1
> 3 3
>
> and I want to make another column which has the rank of B computed
> separately for each value of A.
>
> I.e. something like:
>
> A B C
> - - -
> 1 2 1
> 1 3 2
> 1 4 3
> 2 3 3
> 2 1 1
> 2 1 2
> 3 2 2
> 3 1 1
> 3 3 3
>
> by(X, X[,1], function(x) { rank(x[,1], ties.method="random") } ) almost
> seems to work, but the data is not in a frame, and I can't figure out how
> to merge it back into X properly.
>
> Thanks,
> Lukas
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.