[R] by-group processing
Max Webber
ubbermax at gmail.com
Thu May 7 00:09:12 CEST 2009
Given a dataframe like
> data
      ID Type N
1  45900    A 1
2  45900    B 2
3  45900    C 3
4  45900    D 4
5  45900    E 5
6  45900    F 6
7  45900    I 7
8  49270    A 1
9  49270    B 2
10 49270    E 3
18 46550    A 1
19 46550    B 2
20 46550    C 3
21 46550    D 4
22 46550    E 5
23 46550    F 6
24 46550    I 7
>
containing an identifier (ID), a variable type code (Type), and
a running count of the number of records per ID (N), how can I
return a dataframe of only those records with the maximum value
of N for each ID? For instance,
> data
      ID Type N
7  45900    I 7
10 49270    E 3
24 46550    I 7
I know that I can use
> tapply(data$N, data$ID, max)
45900 46550 49270
    7     7     3
>
to get the maximum value of N for each ID, but how can I find the
row indices of those values so that I can use them to subscript
data?
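
To make the goal concrete, here is a minimal sketch of two ways this
might be done in base R. It is only one possible approach, not
necessarily the recommended one. The data frame is rebuilt from the
listing above, and the names is_max, idx, result and result2 are
placeholders chosen for illustration. The first variant uses ave() to
attach each group's maximum to every row; the second stays with
tapply() but applies it to row positions rather than to N.

```r
# Rebuild the example data frame from the listing (row names as printed).
data <- data.frame(
  ID   = c(rep(45900, 7), rep(49270, 3), rep(46550, 7)),
  Type = c("A", "B", "C", "D", "E", "F", "I",
           "A", "B", "E",
           "A", "B", "C", "D", "E", "F", "I"),
  N    = c(1:7, 1:3, 1:7),
  row.names = c(1:10, 18:24)
)

# Variant 1: ave() returns, for every row, the maximum N within that
# row's ID group, so the comparison is a logical index over the rows.
# If several rows in a group tie for the maximum, all of them are kept.
is_max <- data$N == ave(data$N, data$ID, FUN = max)
result <- data[is_max, ]

# Variant 2: run tapply() over row positions and pick, within each ID,
# the position holding the largest N (the first one in case of ties).
idx <- tapply(seq_len(nrow(data)), data$ID,
              function(i) i[which.max(data$N[i])])
# tapply() orders its result by ID; sorting restores the original row order.
result2 <- data[sort(as.vector(idx)), ]

print(result)
```

The two variants differ only when a group has ties for the maximum:
the ave() version keeps every tied row, while the which.max() version
keeps just the first.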
--
maxine-webber