Discussion:
[Haskell-cafe] natural mergesort in Data.List
Christian Sternagel
6 years ago
Permalink
Dear Cafe,

some years ago I formalized the mergesort implementation [1] from


http://hackage.haskell.org/package/base-4.11.1.0/docs/src/Data.OldList.html#sort

(as far as I can tell it didn't change since 2012) in the proof
assistant Isabelle/HOL [2].

More specifically, I proved its functional correctness (the result is
sorted and contains all elements of the input with exactly the same
multiplicities) and that it is a stable sorting algorithm.

Very recently I also formalized a complexity result in Isabelle/HOL,
namely that the number of comparisons is bounded from above by

n + n * ⌈log 2 n⌉

for lists of length n.

For this proof I had to change the definition of "sequences",
"ascending", and "descending" slightly.

Now here is my question: Does anyone now of reasons why the current
implementation of "sequences" is allowed to produce spurious empty lists
in its result? The version I used in my formalization only differs in
the following three spots:

sequences [x] = [[x]] -- this is the only important change, since
sequences [] = [] -- then the result does not contain empty lists

instead of

sequences xs = [xs]

and

ascending a as [] = let !x = as [a] in [x]

instead of

ascending a as bs = let !x = as [a] in x : sequences bs

and

descending a as [] = [a:as]

instead of

descending a as bs = (a:as) : sequences bs

[1] https://www.isa-afp.org/entries/Efficient-Mergesort.html
[2] http://isabelle.in.tum.de/
Joachim Breitner
6 years ago
Permalink
Hi,
Post by Christian Sternagel
some years ago I formalized the mergesort implementation [1] from
http://hackage.haskell.org/package/base-4.11.1.0/docs/src/Data.OldList.html#sort
(as far as I can tell it didn't change since 2012) in the proof
assistant Isabelle/HOL [2].
More specifically, I proved its functional correctness (the result is
sorted and contains all elements of the input with exactly the same
multiplicities) and that it is a stable sorting algorithm.
just a shameless plug: As part of our hs-to-coq project, we
mechanically translated Data.List.sort to Coq, proved its termination
(for finite inputs, of course) and functional correctness;
https://github.com/antalsz/hs-to-coq/blob/master/examples/containers/theories/SortSorted.v
Post by Christian Sternagel
Very recently I also formalized a complexity result in Isabelle/HOL,
namely that the number of comparisons is bounded from above by
n + n * ⌈log 2 n⌉
for lists of length n.
Cool! That’s of course not easily possible with out shallow embedding
into Coq.
Post by Christian Sternagel
For this proof I had to change the definition of "sequences",
"ascending", and "descending" slightly.
Now here is my question: Does anyone now of reasons why the current
implementation of "sequences" is allowed to produce spurious empty lists
in its result? The version I used in my formalization only differs in


It _could_ have been benchmarked and measured and determined that it is
actually faster to do less case analysis here. But I find that unlikely
(both the investigation and this output), and it is probably simply an
oversight, and I would expect that a patch to that effect would be
appreciated !

Cheers,
Joachim
--
Joachim Breitner
***@joachim-breitner.de
http://www.joachim-breitner.de/
David Feuer
6 years ago
Permalink
If you're guaranteeing that the result won't contain empty lists, it would
be worth benchmarking the effect of using NonEmpty x instead of [x] in that
spot.
...
Christian Sternagel
6 years ago
Permalink
Dear David,

I am guaranteeing (since I proved it in Isabelle/HOL) that the following
version of "sequences" does not contain empty lists in its result (I am
copying from my Isabelle formalization, in order to avoid typos) ;)

fun sequences :: "'a list ⇒ 'a list list"
and asc :: "'a ⇒ ('a list ⇒ 'a list) ⇒ 'a list ⇒ 'a list list"
and desc :: "'a ⇒ 'a list ⇒ 'a list ⇒ 'a list list"
where
"sequences (a # b # xs) =
(if key a > key b then desc b [a] xs else asc b ((#) a) xs)"
| "sequences [x] = [[x]]"
| "sequences [] = []"
| "asc a as (b # bs) =
(if ¬ key a > key b then asc b (λys. as (a # ys)) bs
else as [a] # sequences (b # bs))"
| "asc a as [] = [as [a]]"
| "desc a as (b # bs) =
(if key a > key b then desc b (a # as) bs
else (a # as) # sequences (b # bs))"
| "desc a as [] = [a # as]"

The "key" function is an implicit first parameter for each of
"sequences", "asc", and "desc" above. The fact that I am using a "key"
function instead of a comparator is due to Isabelle/HOL's standard
library. Also, there are no pattern guards in Isabelle/HOL. Anyway, it
should be relatively straight-forward to translate these functions into
Haskell.

Another thing: I just realized that "merge_pairs" in my formalization
also differs from "mergePairs", since with the changed "sequences" it
might actually get an empty list as input, in which case the current
"mergePairs" wouldn't terminate at all.

So for those who are interested, the full definition of mergesort can be
found here


https://devel.isa-afp.org/browser_info/current/AFP/Efficient-Mergesort/Efficient_Mergesort.html

where mergesort is called "msort_key".

cheers

chris

Btw: What is "NonEmpty x"?
...
David Feuer
6 years ago
Permalink
data NonEmpty a = a :| [a]

It's a nonempty list, defined in Data.List.NonEmpty.
...
Christian Sternagel
6 years ago
Permalink
Dear David,

thanks for the pointer.

Btw: I was able to modify my complexity proof so that "sequences" is no
longer required to only contain non-empty lists. Sorry for the noise.

But maybe such a "sequences" is not entirely useless: Did I understand
correctly that what you are saying is that knowing that all elements in
"sequences" are non-empty lists might have some impact on performance
due to being able to use "NonEmpty a"?

cheers

chris
...
David Feuer
6 years ago
Permalink
It may. Or it may not. The performance impact of such small changes can be
hard to predict.
...
Loading...