[R] [External] Weird behaviour of order() when having multiple ties
Avi Gross
avigross using verizon.net
Mon Jan 31 23:38:21 CET 2022
Tim,
My comments were not directed at you but at any conception that R should honor any conditions it does not pledge to honor. You pointed out a clear example that it does not get identical results when you use two things that both look like sorting algorithms. Yes, Stefan asked the original question. You provided some additional ideas and observations.
As I pointed out, there are many sorting algorithms commonly in use and they generally do not care about order in the sense expected by some when multiple items match. Neither do most algorithms of the sort. We discussed earlier the possibility of identifying if a data item was either all-numeric or all alphabetic or OTHER as in having both or neither or perhaps other characters not wanted like a comma. If the algorithm were to return a list of three lists, would you necessarily expect them in the right order of the way they were encountered, or sorted forward (or in reverse) alphabetically or using the current locale or numerically, or even randomly?
I can well imagine a parallel algorithm that hands off one or a subset of data to a thread and also other subsets to other threads and then monitors some kind of communication where the threads send reply messages about one item at a time. The results may be interleaved many ways. Whatever the result is should be treated as unordered, and if you want it ordered, do it yourself after you get it.
I took a look at the R code for order() and the manual page. I note you can call it with "method=" followed by one of "radix", "shell" and "quick" but it is a bit complex to read through and the major work is done in an internal routine probably written in C or something. But this is written about it:
"The sort used is stable (except for method = "quick"), so any unresolved ties will be left in their original ordering."
That suggests that the default and some other cases will return it in the order specified when there is a tie but the method called "quick" may have a different order sometimes. However, my version did not allow me to use quick! If you look at the manual page for sort() it too allows you to specify a method but note amusingly that for the default method it calls order() for part of the job for some kinds of objects.
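A minimal sketch of the stability point, assuming a recent R in which order() accepts method = "radix" and method = "shell". The names by_radix and by_shell are only for illustration. Both methods are described as stable, so the tied 0.1 and 0.2 values should keep their original relative order either way:

```r
# Sample data with ties at 0.1 (positions 5, 6) and 0.2 (positions 4, 7).
Dat1 <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2)

# Ask for each stable method explicitly.
by_radix <- order(Dat1, method = "radix")
by_shell <- order(Dat1, method = "shell")

print(by_radix)
print(by_shell)

# With a stable sort the two permutations should agree, ties included.
print(identical(by_radix, by_shell))
```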
I made a suggestion to anyone wanting a result and simplify it here. Instead of calling order(Dat1) alone, add a second argument like order(Dat1, second) that can be 1:N where N is the length of Dat1, or the reverse, or anything you want to use that will break a tie. Here is some code showing that:
# Initialize sample data where .1 is duplicated.
> Dat1 <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2)
# Default output from order:
> order(Dat1)
[1] 5 6 4 7 3 2 1
# Call order with a second vector of c(1,2,3,4,5,6,7)
> order(Dat1, seq_along(Dat1))
[1] 5 6 4 7 3 2 1
# Call order with a second vector of c(7,6,5,4,3,2,1) meaning reversed
> order(Dat1, length(Dat1):1)
[1] 6 5 7 4 3 2 1
I suspect one of these gives the requester the control to get what they want.
Again, I end by saying no insult intended. Some things in computer science can provide guarantees of working exactly a certain way and others do not. But often you can find ways, including some more complex and annoying ones. I am used to doing many things in the tidyverse where I would use the arrange() verb on a data.frame naturally to do a sort on multiple columns BUT this forum has people who discourage the tidyverse.
library(tidyverse)
> arrange(data.frame("először"=Dat1, "másodikat"=7:1))
  először másodikat
1     0.6         7
2     0.5         6
3     0.3         5
4     0.2         4
5     0.1         3
6     0.1         2
7     0.2         1
> arrange(data.frame("először"=Dat1, "másodikat"=7:1))$`először`
[1] 0.6 0.5 0.3 0.2 0.1 0.1 0.2
> arrange(data.frame("először"=Dat1, "másodikat"=7:1))[1]
  először
1     0.6
2     0.5
3     0.3
4     0.2
5     0.1
6     0.1
7     0.2
> arrange(data.frame("először"=Dat1, "másodikat"=7:1))[[1]]
[1] 0.6 0.5 0.3 0.2 0.1 0.1 0.2
Of course for this simple a need, definitely overkill and I would stick with base R. LOL!
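One caveat on the calls above: arrange() given no sorting columns leaves the rows in their original order, which is why the output is unsorted. A hedged sketch with explicit sort keys follows. The column names value and tiebreak are hypothetical, and the code falls back to base R if dplyr is not installed:

```r
Dat1 <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2)

# 'value' and 'tiebreak' are illustrative names, not from the original post.
df <- data.frame(value = Dat1, tiebreak = 7:1)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Sort by value first, then by tiebreak within equal values.
  sorted_df <- dplyr::arrange(df, value, tiebreak)
} else {
  # Base R equivalent when dplyr is unavailable.
  sorted_df <- df[order(df$value, df$tiebreak), ]
}

print(sorted_df)
```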
-----Original Message-----
From: Ebert,Timothy Aaron <tebert using ufl.edu>
To: Avi Gross <avigross using verizon.net>; maechler using stat.math.ethz.ch <maechler using stat.math.ethz.ch>; stefan.b.fleck using gmail.com <stefan.b.fleck using gmail.com>
Cc: r-help using r-project.org <r-help using r-project.org>
Sent: Mon, Jan 31, 2022 3:36 pm
Subject: RE: [R] [External] Weird behaviour of order() when having multiple ties
Dear Avi,
I made no comment or question or statement about EQUAL items. That was elsewhere in posts by you and others in this thread.
My intent was only to show a simple example of Stefan’s post wherein the outcomes of sort() and order() are different. Not why, or how, just that they are different. If you type order() when you mean sort() things will not work as expected, as shown below.
Yes, both right due to the commutative law of multiplication. I don’t see the point.
I am suggesting nothing! I am simply observing the behavior of a function. If it satisfies my need then great. If not, I need to write my own or find a different function. It does help if I clearly understand the output of the function, and sometimes the documentation is not as helpful as hoped for given the range of readers from novice to expert. Here is the data in a different format where order=1 means it is the first observation in the data.
Dat1
Order  1    2    3    4    5    6    7
Data   0.6  0.5  0.3  0.2  0.1  0.1  0.2
print(order(Dat1)) returns [1] 5 6 4 7 3 2 1
So I sort the raw data by “Data” so that the values of order remain with each observed data point.
Original
Order  5    6    4    7    3    2    1
Data   0.1  0.1  0.2  0.2  0.3  0.5  0.6
Now reading off the values in row named “Order” I get the result of print(order(Dat1)).
Order does not return the sorted data; it returns the location of each sorted value in the original dataset. At least that is what it looks like. I assume that this is what the documentation means by “ ’order’ returns a permutation which rearranges its first argument into ascending or descending order” but I am afraid that I still do not get that connection from the text provided: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/order.
As far as I can tell there is no error or inconsistency.
I am not quite skilled enough in R to take DatSort <- order(Dat1) and then return the sorted data, but I need to move to other tasks.
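For reference, a minimal sketch of that last step: indexing the original vector with the permutation from order() should give the sorted data. The name sorted_values is only illustrative.

```r
Dat1 <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2)

# order() returns positions in the original vector, not the values themselves.
DatSort <- order(Dat1)

# Using those positions as an index rearranges Dat1 into ascending order.
sorted_values <- Dat1[DatSort]
print(sorted_values)

# This should match what sort() returns directly.
print(identical(sorted_values, sort(Dat1)))
```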
I am really sorry that my post makes you mad.
Regards,
Tim
From: Avi Gross <avigross using verizon.net>
Sent: Monday, January 31, 2022 12:33 PM
To: Ebert,Timothy Aaron <tebert using ufl.edu>; maechler using stat.math.ethz.ch; stefan.b.fleck using gmail.com
Cc: r-help using r-project.org
Subject: Re: [R] [External] Weird behaviour of order() when having multiple ties
[External Email]
Tim,
I thought I saw someone tell you that the order in which EQUAL items are presented is not deterministic in that any order, whether given by order() or sort() or anything else, is VALID.
That changes only if you supply additional constraints such as a second key to sort by; then the order becomes deterministic up to the point where both keys are the same.
Here is a dumb suggestion. Place your Dat1 vector in a data.frame alongside another vector of 1:length(Dat1) and use some method that orders by Dat1 and then by the second vector, ascending. You have now forced it to take the first of a matching set before any others.
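A rough sketch of that suggestion in base R. The column names value and position are hypothetical, chosen only for this illustration:

```r
Dat1 <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2)

# Pair each value with its original position so ties can be broken explicitly.
df <- data.frame(value = Dat1, position = seq_along(Dat1))

# Order by value, then by position (ascending) within tied values.
sorted_df <- df[order(df$value, df$position), ]
print(sorted_df)
```

Reversing the second key, for example with -df$position, would instead put the last of each matching set first.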
Let me try another. Say I give you a problem that might have multiple answers, such as a quadratic equation with solutions of 2 and 10. You ask me for AN answer and I say 10. Am I wrong? You ask me for all answers and I say [10,2] and someone else says [2,10] and you wonder which of us is right. Well, we are both right. The proper way to test is not to ask if the lists or tuples or anything ordered is equivalent but to use something like a set and show that they are equivalent, or something like one is a subset of the other both ways.
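In R, that kind of order-insensitive comparison might look like the following sketch. The variable names are only illustrative:

```r
# Two correct answers to "give me all solutions", in different orders.
answer_a <- c(10, 2)
answer_b <- c(2, 10)

# An order-sensitive comparison treats them as different.
same_as_vectors <- identical(answer_a, answer_b)

# Comparing them as sets ignores the order.
same_as_sets <- setequal(answer_a, answer_b)

# The "subset both ways" check expresses the same idea.
subset_both_ways <- all(answer_a %in% answer_b) && all(answer_b %in% answer_a)

print(c(same_as_vectors, same_as_sets, subset_both_ways))
```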
Back to your topic, you are suggesting two independent developers should come up with algorithms to solve similar but different tasks the same way. Do you have any idea how many methods there are for sorting things? This site lists eleven and I am sure there are many more.
https://www.javatpoint.com/sorting-algorithms
What is considered more important is choosing an algorithm that works well on the kinds of data and some of those methods do not keep the data in the same order and produce results in the same order.
What you are pointing out is not an error but an inconsistency. The message to you is to not depend on a UNIQUE solution.
-----Original Message-----
From: Ebert,Timothy Aaron <tebert using ufl.edu>
To: Martin Maechler <maechler using stat.math.ethz.ch>; Stefan Fleck <stefan.b.fleck using gmail.com>
Cc: r-help using r-project.org <r-help using r-project.org>
Sent: Mon, Jan 31, 2022 10:07 am
Subject: Re: [R] [External] Weird behaviour of order() when having multiple ties
Dat1 <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2)
print(order(Dat1))
print(sort(Dat1))
Compare output
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Martin Maechler
Sent: Monday, January 31, 2022 9:04 AM
To: Stefan Fleck <stefan.b.fleck using gmail.com>
Cc: r-help using r-project.org
Subject: Re: [R] [External] Weird behaviour of order() when having multiple ties
[External Email]
>>>>> Stefan Fleck
>>>>> on Sun, 30 Jan 2022 21:07:19 +0100 writes:
> it's not about the sort order of the ties, shouldn't all the 1s in
> order(c(2,3,4,1,1,1,1,1)) come before 2,3,4? because that's not what
> happening
aaah.. now we are getting somewhere:
It looks like you have always confused order() with sort() ...
have you ?
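A small sketch of the distinction, using the vector from the question. The names x and positions are only for illustration. order() reports where the sorted values sit in the input, so the positions of the five 1s come first, not the 1s themselves:

```r
x <- c(2, 3, 4, 1, 1, 1, 1, 1)

# Positions of the elements in ascending order: the 1s live at 4..8.
positions <- order(x)
print(positions)

# Indexing with those positions yields the sorted values,
# which is what sort() returns directly.
print(x[positions])
print(sort(x))
```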
> On Sun, Jan 30, 2022 at 9:00 PM Richard M. Heiberger <rmh using temple.edu> wrote:
>> when there are ties it doesn't matter which is first.
>> in a situation where it does matter, you will need a tiebreaker column.
>> ------------------------------
>> *From:* R-help <r-help-bounces using r-project.org> on behalf of Stefan Fleck <
>> stefan.b.fleck using gmail.com>
>> *Sent:* Sunday, January 30, 2022 4:16:44 AM
>> *To:* r-help using r-project.org <r-help using r-project.org>
>> *Subject:* [External] [R] Weird behaviour of order() when having multiple
>> ties
>>
>> I am experiencing a weird behavior of `order()` for numeric vectors. I
>> tested on 3.6.2 and 4.1.2 for windows and R 4.0.2 on ubuntu. Can anyone
>> confirm?
>>
>> order(
>> c(
>> 0.6,
>> 0.5,
>> 0.3,
>> 0.2,
>> 0.1,
>> 0.1
>> )
>> )
>> ## Result [should be in order]
>> [1] 5 6 4 3 2 1
>>
>> The sort order is obviously wrong. This only occurs if i have multiple
>> ties. The problem does _not_ occur for decreasing = TRUE.
>