[R] Ranking within factor subgroups

Fri Feb 24 01:46:00 CET 2006

Hi Peter,

That did the trick. Thank you very much.

Regards,

Maneesh
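
For readers of the archive, the split/unsplit approach referred to here
might be sketched as follows. This is only an illustration under stated
assumptions: bucket() is copied from the original question quoted further
down, while the toy data frame, its three dates, and the generated symbol
names are invented for the example and are not the poster's data.

```r
# bucket() as posted in the original question: decile rank within a vector.
bucket <- function(data, nBuckets) {
  q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
  q[1] <- q[1] - 0.1  # widen the lowest break so the minimum is not left out
  cut(data, q, include.lowest = TRUE, labels = FALSE)
}

# Hypothetical toy data; rows are deliberately NOT sorted by Date.
set.seed(1)
x <- data.frame(
  Date   = sample(rep(c(20041201, 20050101, 20050201), each = 50)),
  Symbol = paste0("S", 1:150),
  A = rnorm(150), B = rnorm(150), C = rnorm(150)
)

# For each property: split by Date, bucket each piece, then let unsplit()
# put every result back at the row it came from.
for (colName in c("A", "B", "C")) {
  x[[paste0("bucket", colName)]] <-
    unsplit(lapply(split(x[[colName]], x$Date), bucket, 10), x$Date)
}
```

Because unsplit() restores the original row positions, the data frame does
not need to be ordered by Date beforehand.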

>From: Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>To: "maneesh deshpande" <dmaneesh at hotmail.com>
>CC: ramasamy at cancer.org.uk, r-help at stat.math.ethz.ch
>Subject: Re: [R] Ranking within factor subgroups
>Date: 23 Feb 2006 07:28:13 +0100
>
>"maneesh deshpande" <dmaneesh at hotmail.com> writes:
>
> > Hi Adai,
> >
> > I think your solution only works if the rows of the data frame are
> > ordered by "date" and the ordering function is the same as the one used
> > to order the levels of factor(df$date)?
> > It turns out (as I implied in my question) that my data is indeed
> > organized in this manner, so my current problem is solved.
> > In the general case, I suppose, one could always order the data frame
> > by date before proceeding?
> >
> > Thanks,
> >
> > Maneesh
>
>You might prefer to look at split/unsplit/split<-, i.e. the z-scores
>by group line:
>
>      z <- unsplit(lapply(split(x, g), scale), g)
>
>with "scale" suitably replaced. Presumably (meaning: I didn't quite
>read your code closely enough)
>
>     z <- unsplit(lapply(split(x, g), bucket, 10), g)
>
>could do it.
>
> >
> > >From: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
> > >Reply-To: ramasamy at cancer.org.uk
> > >To: maneesh deshpande <dmaneesh at hotmail.com>
> > >CC: r-help at stat.math.ethz.ch
> > >Subject: Re: [R]  Ranking within factor subgroups
> > >Date: Wed, 22 Feb 2006 03:44:45 +0000
> > >
> > >It might help to give a simple reproducible example in the future. For
> > >example
> > >
> > >  df <- cbind.data.frame( date=rep( 1:5, each=100 ), A=rpois(500, 100),
> > >                          B=rpois(500, 50), C=rpois(500, 30) )
> > >
> > >might generate something like
> > >
> > >	    date   A  B  C
> > >	  1    1  93 51 32
> > >	  2    1  95 51 30
> > >	  3    1 102 59 28
> > >	  4    1 105 52 32
> > >	  5    1 105 53 26
> > >	  6    1  99 59 37
> > >	...    . ... .. ..
> > >	495    5 100 57 19
> > >	496    5  96 47 44
> > >	497    5 111 56 35
> > >	498    5 105 49 23
> > >	499    5 105 61 30
> > >	500    5  92 53 32
> > >
> > >Here is my proposed solution. Can you double-check it against your
> > >existing functions to see whether the results agree?
> > >
> > >    decile.fn <- function(x, nbreaks=10){
> > >      br     <- quantile( x, seq(0, 1, len=nbreaks+1), na.rm=TRUE )
> > >      br[1]  <- -Inf   # so the minimum falls inside the first bin
> > >      return( cut(x, br, labels=FALSE) )
> > >    }
> > >
> > >    out <- apply( df[ ,c("A", "B", "C")], 2,
> > >                  function(v) unlist( tapply( v, df$date, decile.fn ) ) )
> > >
> > >    rownames(out) <- rownames(df)
> > >    out <- cbind(df$date, out)
> > >
> > >Regards, Adai
> > >
> > >
> > >
> > >On Tue, 2006-02-21 at 21:44 -0500, maneesh deshpande wrote:
> > > > Hi,
> > > >
> > > > I have a dataframe, x of the following form:
> > > >
> > > > Date      Symbol   A   B   C
> > > > 20041201  ABC     10  12  15
> > > > 20041201  DEF      9   5   4
> > > > ...
> > > > 20050101  ABC      5   3   1
> > > > 20050101  GHM     12   4   2
> > > > ...
> > > >
> > > > Here A, B, C are properties of a set of symbols recorded for a given
> > > > date. I want to decile the symbols for each date and property and
> > > > create another set of columns "bucketA", "bucketB", "bucketC"
> > > > containing the decile rank for each symbol. The following
> > > > non-vectorized code does what I want:
> > > >
> > > > bucket <- function(data,nBuckets) {
> > > >      q <- quantile(data,seq(0,1,len=nBuckets+1),na.rm=T)
> > > >      q[1] <- q[1] - 0.1 # need to do this to ensure there are no extra NAs
> > > >      cut(data,q,include.lowest=T,labels=F)
> > > > }
> > > >
> > > > calcDeciles <- function(x,colNames) {
> > > >   nBuckets <- 10
> > > >   dates <- unique(x$Date)
> > > >   for (date in dates) {
> > > >     iVec <- x$Date == date
> > > >     xx <- x[iVec,]
> > > >     for (colName in colNames) {
> > > >       data <- xx[,colName]
> > > >       bColName <- paste("bucket",colName,sep="")
> > > >       x[iVec,bColName] <- bucket(data,nBuckets)
> > > >     }
> > > >   }
> > > >   x
> > > > }
> > > >
> > > > x <- calcDeciles(x,c("A","B","C"))
> > > >
> > > >
> > > > I was wondering if it is possible to vectorize the above function to
> > > > make it more efficient.
> > > > I tried,
> > > > rlist <- tapply(x$A,x$Date,bucket,10)
> > > > but I am not sure how to assign the contents of "rlist" to their
> > > > appropriate slots in the original dataframe.
> > > >
> > > > Thanks,
> > > >
> > > > Maneesh
> > > >
> > > > ______________________________________________
> > > > R-help at stat.math.ethz.ch mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> > > >
> > >
> >
>
>--
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
>~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
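
The ordering question raised in the thread, and the original question of how
to put tapply() results back into their rows, can be illustrated with a small
sketch. It assumes base R's ave() as an alternative to split/unsplit; the
data frame below is invented for the example, and bucket() is the function
from the original post.

```r
# bucket() as in the original question.
bucket <- function(data, nBuckets) {
  q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
  q[1] <- q[1] - 0.1
  cut(data, q, include.lowest = TRUE, labels = FALSE)
}

# Hypothetical data with shuffled rows, so Date is not in sorted order.
set.seed(42)
x <- data.frame(Date = sample(rep(1:3, each = 40)), A = rnorm(120))

# unlist(tapply(...)) concatenates the results group by group (in factor
# level order), so it lines up with the rows only if x is sorted by Date.
byGroup <- unlist(tapply(x$A, x$Date, bucket, 10), use.names = FALSE)

# ave() applies FUN within each group and returns the values in row order.
inPlace <- ave(x$A, x$Date, FUN = function(v) bucket(v, 10))

# Sorting the rows by Date first makes the two approaches agree.
o <- order(x$Date)
stopifnot(all(byGroup == inPlace[o]))

x$bucketA <- inPlace
```

Either ave() or unsplit() avoids having to sort the data frame first; sorting
by date before using unlist(tapply(...)) is the workaround suggested in the
thread.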