[Haskell-cafe] natural mergesort in Data.List

Discussion:

Christian Sternagel

6 years ago

Dear Cafe,

some years ago I formalized the mergesort implementation [1] from

http://hackage.haskell.org/package/base-4.11.1.0/docs/src/Data.OldList.html#sort

(as far as I can tell it didn't change since 2012) in the proof
assistant Isabelle/HOL [2].

More specifically, I proved its functional correctness (the result is
sorted and contains all elements of the input with exactly the same
multiplicities) and that it is a stable sorting algorithm.

Very recently I also formalized a complexity result in Isabelle/HOL,
namely that the number of comparisons is bounded from above by

n + n * ⌈log 2 n⌉

for lists of length n.

For this proof I had to change the definition of "sequences",
"ascending", and "descending" slightly.

Now here is my question: Does anyone now of reasons why the current
implementation of "sequences" is allowed to produce spurious empty lists
in its result? The version I used in my formalization only differs in
the following three spots:

sequences [x] = [[x]] -- this is the only important change, since
sequences [] = [] -- then the result does not contain empty lists

instead of

sequences xs = [xs]

and

ascending a as [] = let !x = as [a] in [x]

instead of

ascending a as bs = let !x = as [a] in x : sequences bs

and

descending a as [] = [a:as]

instead of

descending a as bs = (a:as) : sequences bs

[1] https://www.isa-afp.org/entries/Efficient-Mergesort.html
[2] http://isabelle.in.tum.de/

Joachim Breitner

6 years ago

Permalink

Hi,

Post by Christian Sternagel
some years ago I formalized the mergesort implementation [1] from
http://hackage.haskell.org/package/base-4.11.1.0/docs/src/Data.OldList.html#sort
(as far as I can tell it didn't change since 2012) in the proof
assistant Isabelle/HOL [2].
More specifically, I proved its functional correctness (the result is
sorted and contains all elements of the input with exactly the same
multiplicities) and that it is a stable sorting algorithm.

just a shameless plug: As part of our hs-to-coq project, we
mechanically translated Data.List.sort to Coq, proved its termination
(for finite inputs, of course) and functional correctness;
https://github.com/antalsz/hs-to-coq/blob/master/examples/containers/theories/SortSorted.v

Post by Christian Sternagel
Very recently I also formalized a complexity result in Isabelle/HOL,
namely that the number of comparisons is bounded from above by
n + n * âlog 2 nâ
for lists of length n.

Cool! Thatâs of course not easily possible with out shallow embedding
into Coq.

Post by Christian Sternagel
For this proof I had to change the definition of "sequences",
"ascending", and "descending" slightly.
Now here is my question: Does anyone now of reasons why the current
implementation of "sequences" is allowed to produce spurious empty lists
in its result? The version I used in my formalization only differs in
âŠ

It _could_ have been benchmarked and measured and determined that it is
actually faster to do less case analysis here. But I find that unlikely
(both the investigation and this output), and it is probably simply an
oversight, and I would expect that a patch to that effect would be
appreciated !

Cheers,
Joachim

--
Joachim Breitner
***@joachim-breitner.de
http://www.joachim-breitner.de/

David Feuer

6 years ago

Permalink

If you're guaranteeing that the result won't contain empty lists, it would
be worth benchmarking the effect of using NonEmpty x instead of [x] in that
spot.

...

Christian Sternagel

6 years ago

Permalink

Dear David,

I am guaranteeing (since I proved it in Isabelle/HOL) that the following
version of "sequences" does not contain empty lists in its result (I am
copying from my Isabelle formalization, in order to avoid typos) ;)

fun sequences :: "'a list ⇒ 'a list list"
and asc :: "'a ⇒ ('a list ⇒ 'a list) ⇒ 'a list ⇒ 'a list list"
and desc :: "'a ⇒ 'a list ⇒ 'a list ⇒ 'a list list"
where
"sequences (a # b # xs) =
(if key a > key b then desc b [a] xs else asc b ((#) a) xs)"
| "sequences [x] = [[x]]"
| "sequences [] = []"
| "asc a as (b # bs) =
(if ¬ key a > key b then asc b (λys. as (a # ys)) bs
else as [a] # sequences (b # bs))"
| "asc a as [] = [as [a]]"
| "desc a as (b # bs) =
(if key a > key b then desc b (a # as) bs
else (a # as) # sequences (b # bs))"
| "desc a as [] = [a # as]"

The "key" function is an implicit first parameter for each of
"sequences", "asc", and "desc" above. The fact that I am using a "key"
function instead of a comparator is due to Isabelle/HOL's standard
library. Also, there are no pattern guards in Isabelle/HOL. Anyway, it
should be relatively straight-forward to translate these functions into
Haskell.

Another thing: I just realized that "merge_pairs" in my formalization
also differs from "mergePairs", since with the changed "sequences" it
might actually get an empty list as input, in which case the current
"mergePairs" wouldn't terminate at all.

So for those who are interested, the full definition of mergesort can be
found here

https://devel.isa-afp.org/browser_info/current/AFP/Efficient-Mergesort/Efficient_Mergesort.html

where mergesort is called "msort_key".

cheers

chris

Btw: What is "NonEmpty x"?

...

David Feuer

6 years ago

Permalink

data NonEmpty a = a :| [a]

It's a nonempty list, defined in Data.List.NonEmpty.

...

Christian Sternagel

6 years ago

Permalink

Dear David,

thanks for the pointer.

Btw: I was able to modify my complexity proof so that "sequences" is no
longer required to only contain non-empty lists. Sorry for the noise.

But maybe such a "sequences" is not entirely useless: Did I understand
correctly that what you are saying is that knowing that all elements in
"sequences" are non-empty lists might have some impact on performance
due to being able to use "NonEmpty a"?

cheers

chris

...

David Feuer

6 years ago

Permalink

It may. Or it may not. The performance impact of such small changes can be
hard to predict.

Post by Christian Sternagel
Dear David,
thanks for the pointer.
Btw: I was able to modify my complexity proof so that "sequences" is no
longer required to only contain non-empty lists. Sorry for the noise.
But maybe such a "sequences" is not entirely useless: Did I understand
correctly that what you are saying is that knowing that all elements in
"sequences" are non-empty lists might have some impact on performance
due to being able to use "NonEmpty a"?
cheers
chris

Post by David Feuer
data NonEmpty a = a :| [a]
It's a nonempty list, defined in Data.List.NonEmpty.
Dear David,
I am guaranteeing (since I proved it in Isabelle/HOL) that the

following

Post by David Feuer
version of "sequences" does not contain empty lists in its result (I

Post by David Feuer
copying from my Isabelle formalization, in order to avoid typos) ;)
fun sequences :: "'a list â 'a list list"
and asc :: "'a â ('a list â 'a list) â 'a list â 'a list list"
and desc :: "'a â 'a list â 'a list â 'a list list"
where
"sequences (a # b # xs) =
(if key a > key b then desc b [a] xs else asc b ((#) a) xs)"
| "sequences [x] = [[x]]"
| "sequences [] = []"
| "asc a as (b # bs) =
(if Â¬ key a > key b then asc b (Î»ys. as (a # ys)) bs
else as [a] # sequences (b # bs))"
| "asc a as [] = [as [a]]"
| "desc a as (b # bs) =
(if key a > key b then desc b (a # as) bs
else (a # as) # sequences (b # bs))"
| "desc a as [] = [a # as]"
The "key" function is an implicit first parameter for each of
"sequences", "asc", and "desc" above. The fact that I am using a

"key"

Post by David Feuer
function instead of a comparator is due to Isabelle/HOL's standard
library. Also, there are no pattern guards in Isabelle/HOL. Anyway,

Post by David Feuer
should be relatively straight-forward to translate these functions

into

Post by David Feuer
Haskell.
Another thing: I just realized that "merge_pairs" in my formalization
also differs from "mergePairs", since with the changed "sequences" it
might actually get an empty list as input, in which case the current
"mergePairs" wouldn't terminate at all.
So for those who are interested, the full definition of mergesort

can be

Post by David Feuer
found here

https://devel.isa-afp.org/browser_info/current/AFP/Efficient-Mergesort/Efficient_Mergesort.html

Post by David Feuer
where mergesort is called "msort_key".
cheers
chris
Btw: What is "NonEmpty x"?

Post by David Feuer
If you're guaranteeing that the result won't contain empty lists,

Post by David Feuer

Post by David Feuer
would be worth benchmarking the effect of using NonEmpty x instead

Post by David Feuer

Post by David Feuer
[x] in that spot.
On Tue, Sep 18, 2018, 4:21 AM Christian Sternagel
Dear Cafe,
some years ago I formalized the mergesort implementation [1]

from
http://hackage.haskell.org/package/base-4.11.1.0/docs/src/Data.OldList.html#sort

Post by David Feuer

Post by David Feuer
(as far as I can tell it didn't change since 2012) in the proof
assistant Isabelle/HOL [2].
More specifically, I proved its functional correctness (the

result is

Post by David Feuer
sorted and contains all elements of the input with exactly the

same

Post by David Feuer
multiplicities) and that it is a stable sorting algorithm.
Very recently I also formalized a complexity result in

Isabelle/HOL,

Post by David Feuer
namely that the number of comparisons is bounded from above by
n + n * âlog 2 nâ
for lists of length n.
For this proof I had to change the definition of "sequences",
"ascending", and "descending" slightly.
Now here is my question: Does anyone now of reasons why the

current

Post by David Feuer
implementation of "sequences" is allowed to produce spurious

empty lists

Post by David Feuer
in its result? The version I used in my formalization only

differs in

Post by David Feuer
sequences [x] = [[x]] -- this is the only important change,

since

Post by David Feuer
sequences [] = [] -- then the result does not contain

empty lists

Post by David Feuer
instead of
sequences xs = [xs]
and
ascending a as [] = let !x = as [a] in [x]
instead of
ascending a as bs = let !x = as [a] in x : sequences bs
and
descending a as [] = [a:as]
instead of
descending a as bs = (a:as) : sequences bs
[1] https://www.isa-afp.org/entries/Efficient-Mergesort.html
[2] http://isabelle.in.tum.de/
_______________________________________________
Haskell-Cafe mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to

post.

...