[R] How to delete a duplicate observation
jim holtman
jholtman at gmail.com
Thu Sep 13 20:41:01 CEST 2007
Here is a way of doing it:
> x <- cbind(V1=sample(1:3,20,TRUE), V2=sample(1:3,20,TRUE), V3=sample(20))
> x
      V1 V2 V3
 [1,]  2  2  1
 [2,]  1  2  6
 [3,]  3  2 10
 [4,]  3  1 11
 [5,]  3  2  5
 [6,]  3  2  7
 [7,]  2  1 19
 [8,]  3  3 13
 [9,]  1  3  2
[10,]  3  3 20
[11,]  3  3 18
[12,]  2  1  4
[13,]  3  2  3
[14,]  3  2 12
[15,]  3  1 17
[16,]  2  3  9
[17,]  2  3  8
[18,]  1  1 16
[19,]  3  2 15
[20,]  3  3 14
> x.max <- do.call('rbind', by(x, list(x[,1], x[,2]), function(.sub){
+ .sub[which.max(.sub[,3]),]
+ }))
> x.max
   V1 V2 V3
18  1  1 16
7   2  1 19
15  3  1 17
2   1  2  6
5   2  2  1
19  3  2 15
9   1  3  2
16  2  3  9
10  3  3 20
>
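The same "keep the row with the largest V3 per (V1, V2) pair" result can probably also be reached without by(), by sorting and then dropping repeated keys. The following is only a sketch under stated assumptions: the data frame d and its values are made up for illustration and are not the poster's data.

```r
# Hypothetical data frame using the column names from the question
d <- data.frame(V1 = c(3, 3, 1, 2, 2),
                V2 = c(3, 3, 2, 1, 1),
                V3 = c(1, 4, 7, 9, 5))

# Sort so that, within each (V1, V2) pair, the largest V3 comes first
d.sorted <- d[order(d$V1, d$V2, -d$V3), ]

# duplicated() flags every repeat of a (V1, V2) pair after its first row,
# so dropping the flagged rows keeps the row with the largest V3 per pair
d.max <- d.sorted[!duplicated(d.sorted[, c("V1", "V2")]), ]
d.max
```

Unlike the by() version, this keeps the original row names, and when V3 is tied within a pair it keeps whichever tied row sorts first.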
On 9/13/07, Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
> nuyaying wrote:
> > I have a data set with 3 variables V1, V2, V3. If 2 data points
> > have the same values on both V1 and V2, I want to delete the one that
> > has the smaller V3 value, i.e., in the data below, I want to delete
> > the first observation. How can I do that? Thanks in advance!
> >
> > V1 V2 V3
> > 3 3 1
> > 3 3 4
> >
> >
> Tricky one... I think something like this should work:
>
> l <- split(d$V3, list(d$V1,d$V2))
> ixl <- lapply(l, function(x) {
>     if ((n <- length(x)) == 2)   # x is a vector of V3 values, so use length(), not nrow()
>         seq_len(n) != which.min(x)
>     else
>         rep(TRUE, n)
> })
> ix <- unsplit(ixl, list(d$V1,d$V2))
> d[ix,]
>
> --
> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
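The idea in the quoted reply, building a logical index group by group and then subsetting with it, can also be sketched with ave(). This is an illustrative variant, not the code from either reply; the data frame d, its values, and the names key and keep are assumptions made for the example.

```r
# Hypothetical data frame using the column names from the question
d <- data.frame(V1 = c(3, 3, 1, 2, 2),
                V2 = c(3, 3, 2, 1, 1),
                V3 = c(1, 4, 7, 9, 5))

# One grouping factor per (V1, V2) pair; drop = TRUE discards
# combinations that never occur, so max() is never called on an empty group
key <- interaction(d$V1, d$V2, drop = TRUE)

# ave() returns, for every row, the maximum V3 within that row's group;
# a row is kept when its own V3 equals that group maximum
keep <- d$V3 == ave(d$V3, key, FUN = max)
d[keep, ]
```

This leaves the rows in their original order. Note that if two rows in a group share the maximum V3, both are kept.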