[R] get maximum 3 rows by column elements in data frame

jim holtman jholtman at gmail.com
Mon Nov 9 14:34:18 CET 2015


It is not entirely clear what you are asking for.  Can you provide a sample
of the output that you want from the data?  Here is the data split by
Measure_id, but I am not sure what to do with it:

> split(x, x$Measure_id)
$`1`
   Measure_id i j value      rank
1           1 2 1   1.5 0.7500000
6           1 2 3   2.0 1.0000000
11          1 2 4   1.0 0.5000000
16          1 4 3   2.0 1.0000000
21          1 5 1   2.0 1.0000000
26          1 5 2   1.5 0.7500000
31          1 7 3   1.5 1.0000000
36          1 7 5   1.0 0.6666667

$`2`
   Measure_id i j value rank
2           2 2 1   2.0 1.00
7           2 2 3   1.5 0.75
12          2 2 4   2.0 1.00
17          2 4 3   0.5 1.00
22          2 5 1   1.5 0.60
27          2 5 2   2.5 1.00
32          2 7 3   1.0 0.50
37          2 7 5   2.0 1.00

$`3`
   Measure_id i j value rank
3           3 2 1     1    1
8           3 2 3     0    0
13          3 2 4     0    0
18          3 4 3     1    1
23          3 5 1     0  NaN
28          3 5 2     0  NaN
33          3 7 3     0  NaN
38          3 7 5     0  NaN

$`4`
   Measure_id i j value rank
4           4 2 1     0    0
9           4 2 3     0    0
14          4 2 4     1    1
19          4 4 3     0  NaN
24          4 5 1     0    0
29          4 5 2     1    1
34          4 7 3     0  NaN
39          4 7 5     0  NaN

$`5`
   Measure_id i j value rank
5           5 2 1     2  1.0
10          5 2 3     1  0.5
15          5 2 4     2  1.0
20          5 4 3     2  1.0
25          5 5 1     1  0.5
30          5 5 2     2  1.0
35          5 7 3     1  1.0
40          5 7 5     1  1.0

>
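
If the goal is the three rows with the highest rank within each Measure_id,
one possible reading of the question can be sketched as below. This is only
an illustration under assumptions: x is taken to be a copy of the posted
df_all_nodes (rebuilt here with rep() so the snippet is self-contained), the
helper name top_n_by_rank is made up for this example, rows with NaN rank are
sorted last, and ties are broken by whatever order order() returns.

```r
# Rebuild the posted data; the values match the dput() output in the question.
x <- data.frame(
  Measure_id = rep(1:5, times = 8),
  i = rep(c(2, 4, 5, 7), times = c(15, 5, 10, 10)),
  j = rep(c(1, 3, 4, 3, 1, 2, 3, 5), each = 5),
  value = c(1.5, 2, 1, 0, 2,  2, 1.5, 0, 0, 1,  1, 2, 0, 1, 2,
            2, 0.5, 1, 0, 2,  2, 1.5, 0, 0, 1,  1.5, 2.5, 0, 1, 2,
            1.5, 1, 0, 0, 1,  1, 2, 0, 0, 1),
  rank = c(0.75, 1, 1, 0, 1,  1, 0.75, 0, 0, 0.5,  0.5, 1, 0, 1, 1,
           1, 1, 1, NaN, 1,  1, 0.6, NaN, 0, 0.5,  0.75, 1, NaN, 1, 1,
           1, 0.5, NaN, NaN, 1,  0.666666666666667, 1, NaN, NaN, 1)
)

# Hypothetical helper: keep the n rows with the largest rank in one group.
# order() places NA/NaN last by default, so rows with an undefined rank are
# only returned when fewer than n rows have a defined rank.
top_n_by_rank <- function(d, n = 3) {
  d <- d[order(d$rank, decreasing = TRUE), ]
  head(d, n)
}

# Apply the helper to each Measure_id group and stack the results.
top3 <- do.call(rbind, lapply(split(x, x$Measure_id), top_n_by_rank))
print(top3)
```

Several groups have more than three rows tied at rank 1, so which of the tied
rows are returned is arbitrary here; a second key (for example value) could
be added to the order() call if a specific tie-break is wanted.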



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Mon, Nov 9, 2015 at 5:55 AM, Ragia Ibrahim <ragia11 at hotmail.com> wrote:

> Dear group,
>
> I have the following data frame
>
> dput(df_all_nodes)
>
> structure(list(Measure_id = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1,
> 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2,
> 3, 4, 5, 1, 2, 3, 4, 5), i = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
> 2, 2, 2, 2, 2, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 7,
> 7, 7, 7, 7, 7, 7, 7, 7, 7), j = c(1, 1, 1, 1, 1, 3, 3, 3, 3,
> 3, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
> 3, 3, 3, 3, 3, 5, 5, 5, 5, 5), value = c(1.5, 2, 1, 0, 2, 2,
> 1.5, 0, 0, 1, 1, 2, 0, 1, 2, 2, 0.5, 1, 0, 2, 2, 1.5, 0, 0, 1,
> 1.5, 2.5, 0, 1, 2, 1.5, 1, 0, 0, 1, 1, 2, 0, 0, 1), rank = c(0.75,
> 1, 1, 0, 1, 1, 0.75, 0, 0, 0.5, 0.5, 1, 0, 1, 1, 1, 1, 1, NaN,
> 1, 1, 0.6, NaN, 0, 0.5, 0.75, 1, NaN, 1, 1, 1, 0.5, NaN, NaN,
> 1, 0.666666666666667, 1, NaN, NaN, 1)), .Names = c("Measure_id",
> "i", "j", "value", "rank"), row.names = c(NA, 40L), class = "data.frame")
>
>
> I want to get the maximum 3 rows in each group of Measure_id, e.g. for
> Measure_id 1 get the max ranks (select the max for each measure depending
> on the rank column).
>
> How can I do that?
> Best regards,
> Ragia
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
