Discussion:
[Haskell-cafe] Why doesn't this consume all the computer's memory?
Tyson Whitehead
2018-11-05 19:00:39 UTC
Permalink
I would expect the following to consume all the computer's memory and
die due to a buildup of lazy pattern matches for the `y` value.

```
import Data.Either

main = print x >> print y
where
(length -> x, length -> y) = paritionEithers $ repeat (Left ())
```

That is, `partitionEithers` is

```
partitionEithers :: [Either a b] -> ([a],[b])
partitionEithers = foldr go ([],[])
where
go (Left x) ~(xs,ys) = (x:xs,ys)
go (Right y) ~(xs,ys) = (xs,y:ys)
```

and, in the -ddump-simpl we see the `go Left` branch returns a thunk
on both the right and left sides that hold onto the evaluation of
(x:xs,ys) as we would expect

```
Left x_aqy ->
(GHC.Types.:
@ a_a1q8 x_aqy
(case ds1_d1rO of { (xs_aqz, ys_aqA) -> xs_aqz }),
case ds1_d1rO of { (xs_aqz, ys_aqA) -> ys_aqA });
```

Our code keeps generating more and more of these thunks as the
left-hand side chases down the infinite list of `Left ()` values, and
the machine cannot let go of them because, as far as it knows, we are
going to reach the end sometime and then need the right-hand side.

Thus I expect it would consume all the memory and crash. But it
doesn't. It just sits there forever consuming 100% CPU at a constant
memory limit. This means my mental model is defective and I'm unable
to properly reason about the space usage of my programs.

Could someone please enlighten me as to were I'm missing? Is there
some sort of optimization going on here? When can it be depend on?

Thanks very much! -Tyson

I would expect
Tyson Whitehead
2018-11-05 19:25:50 UTC
Permalink
I believe I actually figured it out. There is not buildup because y
is just forever bound to

y = length . snd $ paritionEithers $ repeat (Left ())

I guess the thing to realize is that this function will traverse the
list twice. That is, what I wrote is essentially

x = length . fst $ paritionEithers $ repeat (Left ())
y = length . snd $ paritionEithers $ repeat (Left ())

where both x and y independently traverse the entire list repeating
any work that needs to be done to generate the elements.

Thanks! -Tyson
Post by Tyson Whitehead
I would expect the following to consume all the computer's memory and
die due to a buildup of lazy pattern matches for the `y` value.
```
import Data.Either
main = print x >> print y
where
(length -> x, length -> y) = paritionEithers $ repeat (Left ())
```
That is, `partitionEithers` is
```
partitionEithers :: [Either a b] -> ([a],[b])
partitionEithers = foldr go ([],[])
where
go (Left x) ~(xs,ys) = (x:xs,ys)
go (Right y) ~(xs,ys) = (xs,y:ys)
```
and, in the -ddump-simpl we see the `go Left` branch returns a thunk
on both the right and left sides that hold onto the evaluation of
(x:xs,ys) as we would expect
```
Left x_aqy ->
@ a_a1q8 x_aqy
(case ds1_d1rO of { (xs_aqz, ys_aqA) -> xs_aqz }),
case ds1_d1rO of { (xs_aqz, ys_aqA) -> ys_aqA });
```
Our code keeps generating more and more of these thunks as the
left-hand side chases down the infinite list of `Left ()` values, and
the machine cannot let go of them because, as far as it knows, we are
going to reach the end sometime and then need the right-hand side.
Thus I expect it would consume all the memory and crash. But it
doesn't. It just sits there forever consuming 100% CPU at a constant
memory limit. This means my mental model is defective and I'm unable
to properly reason about the space usage of my programs.
Could someone please enlighten me as to were I'm missing? Is there
some sort of optimization going on here? When can it be depend on?
Thanks very much! -Tyson
I would expect
Tyson Whitehead
2018-11-08 04:42:11 UTC
Permalink
I take back my follow up comment. I still don't understand why there
isn't a buildup of thunks.

I've written an updated/simplified variant and posted it on
r/haskellquestions. Hopefully someone will enlighten me.

https://www.reddit.com/r/haskellquestions/comments/9v6z49/why_does_this_code_not_eat_all_the_memory_and_die/

Thanks! -Tyson
Post by Tyson Whitehead
I believe I actually figured it out. There is not buildup because y
is just forever bound to
y = length . snd $ paritionEithers $ repeat (Left ())
I guess the thing to realize is that this function will traverse the
list twice. That is, what I wrote is essentially
x = length . fst $ paritionEithers $ repeat (Left ())
y = length . snd $ paritionEithers $ repeat (Left ())
where both x and y independently traverse the entire list repeating
any work that needs to be done to generate the elements.
Thanks! -Tyson
Post by Tyson Whitehead
I would expect the following to consume all the computer's memory and
die due to a buildup of lazy pattern matches for the `y` value.
```
import Data.Either
main = print x >> print y
where
(length -> x, length -> y) = paritionEithers $ repeat (Left ())
```
That is, `partitionEithers` is
```
partitionEithers :: [Either a b] -> ([a],[b])
partitionEithers = foldr go ([],[])
where
go (Left x) ~(xs,ys) = (x:xs,ys)
go (Right y) ~(xs,ys) = (xs,y:ys)
```
and, in the -ddump-simpl we see the `go Left` branch returns a thunk
on both the right and left sides that hold onto the evaluation of
(x:xs,ys) as we would expect
```
Left x_aqy ->
@ a_a1q8 x_aqy
(case ds1_d1rO of { (xs_aqz, ys_aqA) -> xs_aqz }),
case ds1_d1rO of { (xs_aqz, ys_aqA) -> ys_aqA });
```
Our code keeps generating more and more of these thunks as the
left-hand side chases down the infinite list of `Left ()` values, and
the machine cannot let go of them because, as far as it knows, we are
going to reach the end sometime and then need the right-hand side.
Thus I expect it would consume all the memory and crash. But it
doesn't. It just sits there forever consuming 100% CPU at a constant
memory limit. This means my mental model is defective and I'm unable
to properly reason about the space usage of my programs.
Could someone please enlighten me as to were I'm missing? Is there
some sort of optimization going on here? When can it be depend on?
Thanks very much! -Tyson
I would expect
Tyson Whitehead
2018-11-08 06:09:08 UTC
Permalink
Sorry for all the noise. I believe I finally tracked down the eat all
the memory/don't eat all the memory trigger. It is the view pattern.

When I calculate the length in a view pattern, memory stays constant,
when I calculate it outside, it explodes.

I'll have to examine the simpl dump some more to see if I can figure
out the difference between these two. Sorry for the noise on the
list.

Cheers! -Tyson
Post by Tyson Whitehead
I take back my follow up comment. I still don't understand why there
isn't a buildup of thunks.
I've written an updated/simplified variant and posted it on
r/haskellquestions. Hopefully someone will enlighten me.
https://www.reddit.com/r/haskellquestions/comments/9v6z49/why_does_this_code_not_eat_all_the_memory_and_die/
Thanks! -Tyson
Post by Tyson Whitehead
I believe I actually figured it out. There is not buildup because y
is just forever bound to
y = length . snd $ paritionEithers $ repeat (Left ())
I guess the thing to realize is that this function will traverse the
list twice. That is, what I wrote is essentially
x = length . fst $ paritionEithers $ repeat (Left ())
y = length . snd $ paritionEithers $ repeat (Left ())
where both x and y independently traverse the entire list repeating
any work that needs to be done to generate the elements.
Thanks! -Tyson
Post by Tyson Whitehead
I would expect the following to consume all the computer's memory and
die due to a buildup of lazy pattern matches for the `y` value.
```
import Data.Either
main = print x >> print y
where
(length -> x, length -> y) = paritionEithers $ repeat (Left ())
```
That is, `partitionEithers` is
```
partitionEithers :: [Either a b] -> ([a],[b])
partitionEithers = foldr go ([],[])
where
go (Left x) ~(xs,ys) = (x:xs,ys)
go (Right y) ~(xs,ys) = (xs,y:ys)
```
and, in the -ddump-simpl we see the `go Left` branch returns a thunk
on both the right and left sides that hold onto the evaluation of
(x:xs,ys) as we would expect
```
Left x_aqy ->
@ a_a1q8 x_aqy
(case ds1_d1rO of { (xs_aqz, ys_aqA) -> xs_aqz }),
case ds1_d1rO of { (xs_aqz, ys_aqA) -> ys_aqA });
```
Our code keeps generating more and more of these thunks as the
left-hand side chases down the infinite list of `Left ()` values, and
the machine cannot let go of them because, as far as it knows, we are
going to reach the end sometime and then need the right-hand side.
Thus I expect it would consume all the memory and crash. But it
doesn't. It just sits there forever consuming 100% CPU at a constant
memory limit. This means my mental model is defective and I'm unable
to properly reason about the space usage of my programs.
Could someone please enlighten me as to were I'm missing? Is there
some sort of optimization going on here? When can it be depend on?
Thanks very much! -Tyson
I would expect
Tyson Whitehead
2018-11-08 16:04:34 UTC
Permalink
Sorry for all the noise. I believe I finally tracked down the eat all the memory/don't eat all the memory trigger. It is the view pattern.
There was a request to post both codes as it seems a bit unexpected
that a view pattern would make that difference.

Here they are. I compiled both with `ghc file.hs` using the standard
GHC 8.4.3 from NixOS 18.09.

Constant memory code (RES 6MB):

{-# LANGUAGE ViewPatterns #-}

module Main (main) where

import Data.Either

(length -> lx,length -> ly) = partitionEithers (repeat $ Left ())

main = do
print lx
print ly

Unbounded memory:

module Main (main) where

import Data.Either

(xs, ys) = partitionEithers (repeat $ Left ())

main = do
print $ length xs
print $ length ys

Cheers! -Tyson

PS: The constant-memory view-pattern version seems to compile down to

lxly = case partitionEithers (repeat $ Left ()) of
(xs,ys) -> (length xs,length ys)

main = do
print (case lxly of (lx,_) -> lx)
print (case lxly of (_,ly) -> ly)

while the unbounded-memory non-view-pattern one compiles down to

xsys = partitionEithers (repeat $ Left ())
xs = case xsys of (xs,_) -> xs
ys = case xsys of (_,ys) -> ys

main = do
print (length xs)
print (length ys)
Viktor Dukhovni
2018-11-08 19:15:36 UTC
Permalink
It seems that it only runs in constant space when the two lengths
compile to a pre-evaluated CAF.

In the below version, at low optimization levels the evaluation of lx/ly
is deferred to the "forkIO" thread, and memory use grows linearly with
the timeout.

At high optimization levels, memory use is constant, but the timeout
never happens, and it seems plausible that the CAF is lifted out to
the top level, and is evaluated in constant space (but infinite time).

So it seems, that as a CAF, the generated code does not attempt to
memoize the input infinite list. It may be worth noting that if
"repeat" is replaced with "replicate 10000", "replicate 1000000",
... memory usage grows with the size of the generated list. Only
the infinite list when pre-computed as a CAF seems to "run" in
constant space. (Scare quotes around "run" since in this it
never completes the computation. You either never finish,
or use unbounded space, pick your poison).

------ snip ------
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE ViewPatterns #-}

module Main (main) where
import System.Environment
import System.Timeout
import Control.Concurrent
import Control.Concurrent.MVar
import Data.List

partitionEithers :: [Either a b] -> ([a],[b])
partitionEithers = foldr go ([],[])
where
go (Left x) ~(xs,ys) = (x:xs,ys)
go (Right y) ~(xs,ys) = (xs,y:ys)

main = do
n <- getArgs >>= \case
[] -> return 100
a:_ -> return $ read a
m <- newEmptyMVar
forkIO $ do
let (length -> lx, length -> ly) = partitionEithers $ repeat $ Left ()
print lx
print ly
putMVar m ()
timeout n $ takeMVar m
------ snip ------
I must admit I'm stumped! I don't see any significant difference between
those two programs.
Post by Tyson Whitehead
{-# LANGUAGE ViewPatterns #-}
module Main (main) where
import Data.Either
(length -> lx,length -> ly) = partitionEithers (repeat $ Left ())
main = do
print lx
print ly
module Main (main) where
import Data.Either
(xs, ys) = partitionEithers (repeat $ Left ())
main = do
print $ length xs
print $ length ys
--
Viktor.
Continue reading on narkive:
Loading...