[R] by-group processing

William Dunlap wdunlap at tibco.com
Thu May 7 21:42:24 CEST 2009


Max,

Since the rows for each ID are contiguous, and ordered by N within
each ID, the following should do it and do it quickly.  It grabs the
last row of each run of IDs: the rows just before ID changes, plus the
final row of the dataset.

> with(data, data[ c(ID[-1] != ID[-length(ID)], TRUE),, drop=FALSE])
      ID Type N
7  45900    I 7
10 49270    E 3
24 46550    I 7
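
For anyone who wants to run this from scratch, here is a minimal sketch.
It rebuilds the example data from the message quoted below and then
applies the same comparison of neighbouring IDs.  The names is_last,
last_rows and max_rows are mine, purely for illustration.  The second
approach, using ave(), is an alternative I am adding as an assumption
about what might be wanted when the rows are not already grouped and
ordered; it is not part of the one-liner above.

```r
# Rebuild the example data from the quoted message (row names as shown there).
data <- data.frame(
  ID   = rep(c(45900, 49270, 46550), times = c(7, 3, 7)),
  Type = c("A", "B", "C", "D", "E", "F", "I",
           "A", "B", "E",
           "A", "B", "C", "D", "E", "F", "I"),
  N    = c(1:7, 1:3, 1:7),
  row.names = c(1:10, 18:24)
)

# TRUE where the next row has a different ID; the last row always ends a run.
# This relies on each ID's rows being contiguous and ordered by N.
is_last   <- with(data, c(ID[-1] != ID[-length(ID)], TRUE))
last_rows <- data[is_last, , drop = FALSE]

# Order-independent variant: keep rows whose N equals the maximum N for
# their ID.  ave() returns the per-ID maximum repeated for every row.
max_rows <- data[data$N == ave(data$N, data$ID, FUN = max), , drop = FALSE]
```

On this data both last_rows and max_rows contain rows 7, 10 and 24.  The
ave() version does not care about row order, but it would return several
rows for an ID if the maximum N were tied, which cannot happen here
because N is a running count.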

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Max Webber
> Sent: Wednesday, May 06, 2009 3:09 PM
> To: r-help at r-project.org
> Subject: [R] by-group processing
> 
> Given a dataframe like
> 
>   > data
>         ID Type N
>   1  45900    A 1
>   2  45900    B 2
>   3  45900    C 3
>   4  45900    D 4
>   5  45900    E 5
>   6  45900    F 6
>   7  45900    I 7
>   8  49270    A 1
>   9  49270    B 2
>   10 49270    E 3
>   18 46550    A 1
>   19 46550    B 2
>   20 46550    C 3
>   21 46550    D 4
>   22 46550    E 5
>   23 46550    F 6
>   24 46550    I 7
>   >
> 
> containing an identifier (ID), a variable type code (Type), and
> a running count of the number of records per ID (N), how can I
> return a dataframe of only those records with the maximum value
> of N for each ID? For instance,
> 
>   > data
>         ID Type N
>   7  45900    I 7
>   10 49270    E 3
>   24 46550    I 7
> 
> I know that I can use
> 
>    > tapply(data$N, data$ID, max)
>    45900 46550 49270
>        7     7     3
>    >
> 
> to get the values of the maximum N for each ID, but how is it
> that I can find the index of these values to subsequently use to
> subscript data?
> 
> 
> --
> maxine-webber
> 



