[R] by-group processing
David Freedman
3.14david at gmail.com
Fri May 8 01:44:32 CEST 2009
How about sorting by ID and then by descending N, and keeping the first row for each ID:
d <- data[order(data$ID, -data$N), ]
d[!duplicated(d$ID), ]
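For anyone who wants to try this, here is a minimal sketch that rebuilds the example data from the question and runs the sort-and-deduplicate idea. It assumes the sort key is meant to be the count N, since unary minus does not work on a factor or character Type column. The names by_sort, is_max and by_ave are only illustrative. The second approach is an alternative using ave(), which gives a logical index into the original data frame, which is closer to what was asked.

```r
# Rebuild the example data from the question (row names kept as in the post).
data <- data.frame(
  ID   = c(rep(45900, 7), rep(49270, 3), rep(46550, 7)),
  Type = c("A", "B", "C", "D", "E", "F", "I",
           "A", "B", "E",
           "A", "B", "C", "D", "E", "F", "I"),
  N    = c(1:7, 1:3, 1:7),
  row.names = c(1:10, 18:24)
)

# Approach 1: sort by ID, then by descending N, and keep the first row per ID.
d <- data[order(data$ID, -data$N), ]
by_sort <- d[!duplicated(d$ID), ]

# Approach 2: build a logical index that is TRUE where N equals the
# maximum of N within its ID group, then subscript the original data.
is_max <- data$N == ave(data$N, data$ID, FUN = max)
by_ave <- data[is_max, ]
```

Both give rows 7, 10 and 24. The first returns them sorted by ID, while the second keeps the original row order and would return every tied row if a group had more than one row at its maximum.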
Max Webber wrote:
>
> Given a dataframe like
>
> > data
>       ID Type N
> 1  45900    A 1
> 2  45900    B 2
> 3  45900    C 3
> 4  45900    D 4
> 5  45900    E 5
> 6  45900    F 6
> 7  45900    I 7
> 8  49270    A 1
> 9  49270    B 2
> 10 49270    E 3
> 18 46550    A 1
> 19 46550    B 2
> 20 46550    C 3
> 21 46550    D 4
> 22 46550    E 5
> 23 46550    F 6
> 24 46550    I 7
>
> containing an identifier (ID), a variable type code (Type), and
> a running count of the number of records per ID (N), how can I
> return a dataframe of only those records with the maximum value
> of N for each ID? For instance,
>
> > data
>       ID Type N
> 7  45900    I 7
> 10 49270    E 3
> 24 46550    I 7
>
> I know that I can use
>
> > tapply(data$N, data$ID, max)
> 45900 46550 49270
>     7     7     3
>
> to get the maximum value of N for each ID, but how can I find
> the row indices of these values so that I can use them to
> subscript data?
>
>
> --
> maxine-webber
>