[R] difference between unique() and !duplicated()

Duncan Murdoch murdoch at stats.uwo.ca
Thu Sep 13 13:11:05 CEST 2007


On 13/09/2007 5:47 AM, T.Lok wrote:
> Yesterday I spent the whole day struggling with how to get
> the maximum value of "y" for every unique value of "x" 
> from the dataframe "test". In the R Book (Crawley, 2007) 
> an example of this can be found on page 121. I tried to do 
> it this way, but I failed.
> 
> In the end, I figured out how to get it working (first 
> order, and afterwards use !duplicated()). My question is: 
> why does it not work with the unique() function on p. 121
> (i.e. test[rev(order(x)),][unique(y),])?
> 
> As a simple example, I used the following syntax:
> 
>> x <- c("A","A","B","B","C","C","D")
>> y <- c(1,2,1,1,2,3,1)
>> z <- c("yes","yes","no","yes","no","no","no")
>> test <- data.frame(x,y,z)
>> test
> 
>    x y   z
> 1 A 1 yes
> 2 A 2 yes
> 3 B 1  no
> 4 B 1 yes
> 5 C 2  no
> 6 C 3  no
> 7 D 1  no
> 
>> test[rev(order(test$y, test$z)),][unique(test$x),]
> 
>    x y   z
> 6 C 3  no
> 2 A 2 yes
> 5 C 2  no
> 4 B 1 yes
> 
> # this clearly does not give a unique value for x, since
> there are 2 C's and no D!

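You are trying to index by the unique values of x.  But x is a factor,
and a factor used as an index is treated as its integer codes (here 1
to 4), not as the values of x.  So you simply get the first four rows of
the reordered data frame, which isn't even close to what you wanted.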
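
The effect can be checked directly. The sketch below assumes x is stored as a factor, which was the data.frame() default when this was written; on R 4.0.0 or later, stringsAsFactors = TRUE is needed to reproduce it. The names sorted, codes, by_factor and by_codes are illustrative only.

```r
# Rebuild the example from the original post.
# stringsAsFactors = TRUE is an assumption needed on R >= 4.0.0.
test <- data.frame(x = c("A", "A", "B", "B", "C", "C", "D"),
                   y = c(1, 2, 1, 1, 2, 3, 1),
                   z = c("yes", "yes", "no", "yes", "no", "no", "no"),
                   stringsAsFactors = TRUE)

sorted <- test[rev(order(test$y, test$z)), ]

# unique() on a factor returns a factor; its integer codes are what
# the row index actually uses.
codes <- as.integer(unique(test$x))   # 1 2 3 4

by_factor <- sorted[unique(test$x), ]  # what the original call did
by_codes  <- sorted[codes, ]           # the same rows, selected explicitly

rownames(by_factor)
rownames(by_codes)
```

Both selections pick rows by position in sorted, so neither looks at the values of x at all.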

> 
>> test[rev(order(test$y, test$z)),][!duplicated(test$x),]
> 
>    x y   z
> 6 C 3  no
> 5 C 2  no
> 1 A 1 yes
> 3 B 1  no
> 

You rearranged the rows of test but not of test$x, so !duplicated(test$x)
is still computed in the original row order.  This would work:

test <- test[rev(order(test$y, test$z)),]
test[!duplicated(test$x),]
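
A variant that leaves test untouched is sketched below. It rebuilds the data frame from the original post; the names sorted and top are illustrative. The point is only that duplicated() has to see x in the same order as the rows being subset.

```r
test <- data.frame(x = c("A", "A", "B", "B", "C", "C", "D"),
                   y = c(1, 2, 1, 1, 2, 3, 1),
                   z = c("yes", "yes", "no", "yes", "no", "no", "no"))

# Reorder once and keep the result under a new name.
sorted <- test[rev(order(test$y, test$z)), ]

# The two logical masks differ because they follow different row orders.
mask_original <- !duplicated(test$x)     # original order: wrong mask for sorted
mask_sorted   <- !duplicated(sorted$x)   # sorted order: matches the rows of sorted

# Keep the first occurrence of each x in the sorted frame.
top <- sorted[mask_sorted, ]
top
```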

> # this also doesn't work
> # then I thought, maybe first use the order() function, 
> then unique()
> 
>> test[rev(order(test$y, test$z)),]
> 
>    x y   z
> 6 C 3  no
> 2 A 2 yes
> 5 C 2  no
> 4 B 1 yes
> 1 A 1 yes
> 7 D 1  no
> 3 B 1  no
> 
>> test1 <- test[rev(order(test$y, test$z)),]
>> test1[unique(test1$x),]
> 
>    x y   z
> 5 C 2  no
> 6 C 3  no
> 2 A 2 yes
> 4 B 1 yes
> 
> # still no unique values for x
> 
>> test1[!duplicated(test1$x),]
> 
>    x y   z
> 6 C 3  no
> 2 A 2 yes
> 4 B 1 yes
> 7 D 1  no
> 
> # finally I get unique values for x, for the maximum value 
> of y (and z). But why does this not work when giving the 
> order() and !duplicated() command simultaneously?
> And why does only !duplicated() work, and not unique()?

I think both questions are answered above.
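
If only the maximum of y for each x is needed, and not the whole row, a shorter route may be aggregate(). This is an alternative that was not discussed above, sketched on the same example data; the name max_y is illustrative.

```r
test <- data.frame(x = c("A", "A", "B", "B", "C", "C", "D"),
                   y = c(1, 2, 1, 1, 2, 3, 1),
                   z = c("yes", "yes", "no", "yes", "no", "no", "no"))

# One row per value of x, holding the largest y in that group.
# Note that z is dropped: this returns the group maximum, not the
# original row that contained it.
max_y <- aggregate(y ~ x, data = test, FUN = max)
max_y
```

When the other columns of the winning row matter, the order-then-!duplicated() approach is still the one to use.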

Duncan Murdoch


