[R] How to delete duplicate cases?
Marc Schwartz
marc_schwartz at comcast.net
Thu Jul 24 16:34:03 CEST 2008
on 07/24/2008 09:00 AM Daniel Wagner wrote:
> Dear R users,
>
> I have a data frame with a lot of duplicate cases, and I want to delete the duplicates that have a low rank and keep the case that has the highest rank.
> e.g.
>
>> df1
>    cno rank
> 1 1342 0.23
> 2 1342 0.14
> 3 1342 0.56
> 4 2568 0.15
> 5 2568 0.89
>
> So I want to keep the 3rd and 5th cases, which have the highest rank (0.56 & 0.89), and delete the rest of the duplicate cases.
> Could somebody help me?
>
> Regards
>
> Daniel
> Amsterdam
For the simple two-column case, see ?aggregate:
> aggregate(df1$rank, list(cno = df1$cno), max)
   cno    x
1 1342 0.56
2 2568 0.89
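To try this at the console, the data frame from the question can be rebuilt as in the sketch below. The values are copied from the post; building it with data.frame() and the name `best` are assumptions for illustration. The formula interface of aggregate() is used here as a variation, since it keeps the column name `rank` rather than the generic `x`.

```r
# Rebuild the example data from the question (values copied from the post).
df1 <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                  rank = c(0.23, 0.14, 0.56, 0.15, 0.89))

# Highest rank per cno. The formula interface keeps the column name 'rank'
# instead of the generic 'x' produced by the non-formula call.
best <- aggregate(rank ~ cno, data = df1, FUN = max)
print(best)
```

Note that aggregate() returns only the grouping column and the summarised column, so any other columns of a wider data frame would be dropped.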
A more generic approach might be:
> do.call(rbind, lapply(split(df1, df1$cno),
                        function(x) x[which.max(x$rank), ]))
      cno rank
1342 1342 0.56
2568 2568 0.89
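Another way to keep whole rows, offered here as an alternative sketch rather than as part of the approach above, is to sort so that the highest rank comes first within each cno and then drop the repeats with duplicated(). The names `sorted` and `top` are illustrative only.

```r
# Example data from the question (values copied from the post).
df1 <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                  rank = c(0.23, 0.14, 0.56, 0.15, 0.89))

# Sort by cno, then by descending rank, so the best row of each cno comes first.
sorted <- df1[order(df1$cno, -df1$rank), ]

# duplicated() flags every repeat of a cno after its first occurrence;
# negating it keeps only the first (highest-rank) row per cno.
top <- sorted[!duplicated(sorted$cno), ]
print(top)
```

This keeps the original row names (3 and 5 for the data in the question) and all other columns, which can be convenient when the data frame has more than two columns.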
For example, using the iris dataset, get the rows, by Species, with the
highest Sepal.Length:
> do.call(rbind, lapply(split(iris, iris$Species),
                        function(x) x[which.max(x$Sepal.Length), ]))
           Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
setosa              5.8         4.0          1.2         0.2     setosa
versicolor          7.0         3.2          4.7         1.4 versicolor
virginica           7.9         3.8          6.4         2.0  virginica
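One caveat with which.max() is that it returns only the first maximum, so if two rows in a group tie for the highest value, only one is kept. The sketch below uses hypothetical data (not from the question) with a tie to show the difference, and compares against max() to keep every tied row; the names `tied`, `first_only` and `all_ties` are made up for illustration.

```r
# Hypothetical data: two rows tie for the highest rank within cno 1342.
tied <- data.frame(cno  = c(1342, 1342, 2568),
                   rank = c(0.56, 0.56, 0.89))

# which.max() returns the index of the first maximum only: one row per cno.
first_only <- do.call(rbind, lapply(split(tied, tied$cno),
                                    function(x) x[which.max(x$rank), ]))

# Comparing against max() keeps every row that ties for the highest rank.
all_ties <- do.call(rbind, lapply(split(tied, tied$cno),
                                  function(x) x[x$rank == max(x$rank), ]))

print(first_only)
print(all_ties)
```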
HTH,
Marc Schwartz