[R] difference between unique() and !duplicated()
Duncan Murdoch
murdoch at stats.uwo.ca
Thu Sep 13 13:11:05 CEST 2007
On 13/09/2007 5:47 AM, T.Lok wrote:
> Yesterday I spent the whole day trying to work out how to get
> the maximum value of "y" for every unique value of "x"
> from the data frame "test". In The R Book (Crawley, 2007)
> an example of this can be found on page 121. I tried to do
> it this way, but I failed.
>
> In the end, I figured out how to get it working (first
> order(), then !duplicated()). My question is:
> why does it not work with the unique() function, as on p. 121
> (i.e. test[rev(order(x)),][unique(y),])?
>
> As a simple example, I used the following code:
>
>> x <- c("A","A","B","B","C","C","D")
>> y <- c(1,2,1,1,2,3,1)
>> z <- c("yes","yes","no","yes","no","no","no")
>> test <- data.frame(x,y,z)
>> test
>
> x y z
> 1 A 1 yes
> 2 A 2 yes
> 3 B 1 no
> 4 B 1 yes
> 5 C 2 no
> 6 C 3 no
> 7 D 1 no
>
>> test[rev(order(test$y, test$z)),][unique(test$x),]
>
> x y z
> 6 C 3 no
> 2 A 2 yes
> 5 C 2 no
> 4 B 1 yes
>
> # this clearly does not give unique values of x, since
> # there are 2 C's and no D!
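You are trying to index by the unique values of x. But x is a factor,
and indexing by a factor uses its integer codes, not its labels:
unique(test$x) is A, B, C, D, i.e. codes 1 to 4, so you simply get the
first four rows of the reordered data frame. That is nothing like what
you wanted.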
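A minimal sketch reproducing this, assuming the data are rebuilt with stringsAsFactors = TRUE (the data.frame() default in 2007; since R 4.0.0 strings stay character, and a character index would instead be matched against the row names, giving rows of NAs). The names sorted and idx are only for illustration.

```r
# Rebuild the example; stringsAsFactors = TRUE makes x a factor, as in 2007.
x <- c("A", "A", "B", "B", "C", "C", "D")
y <- c(1, 2, 1, 1, 2, 3, 1)
z <- c("yes", "yes", "no", "yes", "no", "no", "no")
test <- data.frame(x, y, z, stringsAsFactors = TRUE)

sorted <- test[rev(order(test$y, test$z)), ]  # row names 6 2 5 4 1 7 3
idx <- unique(test$x)                         # factor with values A B C D
as.integer(idx)                               # its integer codes: 1 2 3 4

# Indexing by a factor is indexing by its integer codes, so both of
# these just take the first four rows of 'sorted', whatever x says.
sorted[idx, ]
sorted[as.integer(idx), ]
```

Both calls print rows 6, 2, 5 and 4, the output shown in the question.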
>
>> test[rev(order(test$y, test$z)),][!duplicated(test$x),]
>
> x y z
> 6 C 3 no
> 5 C 2 no
> 1 A 1 yes
> 3 B 1 no
>
You rearranged the rows of test but not test$x, so !duplicated(test$x)
still describes the original row order. This would work:
test <- test[rev(order(test$y, test$z)),]
test[!duplicated(test$x),]
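A self-contained sketch of that fix, again rebuilding the data with stringsAsFactors = TRUE (with character columns the result is the same here, since "no" sorts before "yes"). The point is that duplicated() is computed on the x column of the already sorted data frame, so its logical vector lines up row by row with the rows it selects. The name best is hypothetical.

```r
x <- c("A", "A", "B", "B", "C", "C", "D")
y <- c(1, 2, 1, 1, 2, 3, 1)
z <- c("yes", "yes", "no", "yes", "no", "no", "no")
test <- data.frame(x, y, z, stringsAsFactors = TRUE)

# Sort the whole data frame, largest y first (ties broken by z).
test <- test[rev(order(test$y, test$z)), ]

# duplicated() now sees x in the sorted order, so the first row kept
# for each value of x is the one with the largest y.
best <- test[!duplicated(test$x), ]
best
```

This keeps rows 6, 2, 4 and 7: one row per value of x, each with its maximum y. Computing !duplicated(test$x) before sorting selects positions 1, 3, 5 and 7 of the sorted frame instead, which is why that attempt returned two C rows and no D.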

> # this also doesn't work
> # then I thought, maybe first use the order() function,
> # then unique()
>
>> test[rev(order(test$y, test$z)),]
>
> x y z
> 6 C 3 no
> 2 A 2 yes
> 5 C 2 no
> 4 B 1 yes
> 1 A 1 yes
> 7 D 1 no
> 3 B 1 no
>
>> test1 <- test[rev(order(test$y, test$z)),]
>> test1[unique(test1$x),]
>
> x y z
> 5 C 2 no
> 6 C 3 no
> 2 A 2 yes
> 4 B 1 yes
>
> # still no unique values for x
>
>> test1[!duplicated(test1$x),]
>
> x y z
> 6 C 3 no
> 2 A 2 yes
> 4 B 1 yes
> 7 D 1 no
>
> # finally I get unique values for x, with the maximum value
> # of y (and z).
>
> But why does this not work when I apply order() and
> !duplicated() in a single command? And why does only
> !duplicated() work, and not unique()?
I think both questions are answered above.
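To make the distinction in the subject line concrete, here is a small sketch using x in the sorted row order printed above. unique() returns the distinct values themselves, while !duplicated() returns one TRUE/FALSE per element, TRUE at each first occurrence. Only the latter can serve directly as a row selector.

```r
# x in the sorted row order printed above (rows 6 2 5 4 1 7 3);
# factor levels default to the sorted labels A B C D
x <- factor(c("C", "A", "C", "B", "A", "D", "B"))

unique(x)       # distinct values, first-appearance order: C A B D
!duplicated(x)  # TRUE TRUE FALSE TRUE FALSE TRUE FALSE

# The logical vector has one entry per element, so it can select rows;
# selecting with it gives back the unique values.
x[!duplicated(x)]

# Used as an index, unique(x) is taken as positions (the factor's
# codes), which says nothing about where each value first occurs.
as.integer(unique(x))  # 3 1 2 4
```

Those codes 3, 1, 2, 4 are exactly the positions that produced rows 5, 6, 2 and 4 in the test1[unique(test1$x),] attempt.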
Duncan Murdoch