[Haskell-cafe] reading file content with conduit
Fabien R
2018-11-02 12:08:40 UTC
I have a strange behaviour with my code when reading data from a file saved by sinkFile.
I only see the first record of the file, although the file seems to contain several records.

Any hints?

-- built in a sandbox with GHC 8.2.2, base, binary, bytestring, conduit 1.3.1
{-# LANGUAGE DeriveGeneric #-}
import qualified Data.ByteString as DB
import qualified Data.ByteString.Lazy as DBL
import qualified Data.Binary as DBI
import GHC.Generics (Generic)
import Conduit
import Data.Int (Int64)

data MySubRec = MySubRec { sr1 :: Float,
                           sr2 :: Float }
  deriving (Generic, Show)

data MyRec = MyRec { r1 :: Int64,
                     r2 :: String,
                     r3 :: [MySubRec],
                     r4 :: [MySubRec] }
  deriving (Generic, Show)

-- Binary instances derived through Generic
instance DBI.Binary MySubRec
instance DBI.Binary MyRec

es = MySubRec { sr1 = 1.0, sr2 = 1000.5 }
myList = repeat es
e1 = MyRec { r1 = 1, r2 = "e1", r3 = take 2 myList, r4 = take 1 myList }
e2 = MyRec { r1 = 2, r2 = "e2", r3 = take 2 myList, r4 = take 1 myList }
myData = concat $ repeat [e1, e2]

-- Encode each incoming record into one strict ByteString.
dataToBs :: Monad m =>
            ConduitT MyRec DB.ByteString m ()
dataToBs = do
  d <- await
  case d of
    Just bs -> do
      yield $ DBL.toStrict $ DBI.encode bs
      dataToBs
    _ -> return ()

-- Decode one record from each incoming ByteString chunk.
bsToData :: Monad m =>
            ConduitT DB.ByteString MyRec m ()
bsToData = do
  d <- await
  case d of
    Just bs -> do
      yield $ DBI.decode $ DBL.fromStrict bs
      bsToData
    _ -> return ()

main = do
  runConduitRes $ yieldMany (take 10 myData) .| dataToBs .| sinkFile "/tmp/res.bin"
  runConduitRes $ sourceFile "/tmp/res.bin" .| bsToData .| mapM_C (liftIO . putStrLn . show)
Alexander V Vershilov
2018-11-02 12:15:36 UTC
Hello Fabien,

In your example, the problem is that sourceFile gives you a ByteString
chunk that contains more than one record, while the decoding function
does not return a list of values: it returns only one. So you need to
pass the leftover data to the next decoding round.

You may consider using the binary-conduit package
(https://hackage.haskell.org/package/binary-conduit), which does that
(see https://hackage.haskell.org/package/binary-conduit-1.3.1/docs/src/Data.Conduit.Serialization.Binary.html#conduitGet).
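Passing the leftover data to the next decoding round can be sketched by hand with the incremental interface of Data.Binary.Get (runGetIncremental, pushChunk and the Decoder constructors). This is a minimal illustration under stated assumptions, not binary-conduit's code: decodeMany, chunksOf, records and decodeFrom are hypothetical names, the records are simple (Int64, String) pairs instead of MyRec, and failures are reported with error where a library would rather throw through MonadThrow.

```haskell
import Conduit
import qualified Data.Binary as DBI
import Data.Binary.Get (Decoder (..), pushChunk, runGetIncremental)
import qualified Data.ByteString as DB
import qualified Data.ByteString.Lazy as DBL
import Data.Int (Int64)

-- Hypothetical helper: decode as many values as the byte stream contains,
-- carrying the unconsumed bytes of each round over to the next one.
decodeMany :: (DBI.Binary a, Monad m) => ConduitT DB.ByteString a m ()
decodeMany = awaitRound
  where
    -- Between records: stop cleanly when the stream is exhausted.
    awaitRound = await >>= maybe (return ()) startRound
    -- Start a fresh decoder on the bytes left over from the previous round.
    startRound bytes
      | DB.null bytes = awaitRound
      | otherwise     = step (pushChunk (runGetIncremental DBI.get) bytes)
    step dec = case dec of
      -- One value decoded: emit it and reuse the remaining bytes.
      Done rest _ value -> yield value >> startRound rest
      -- The current record continues in the next chunk.
      Partial _ -> await >>= maybe (error "decodeMany: input ends mid-record")
                                   (step . pushChunk dec)
      Fail _ _ msg -> error ("decodeMany: " ++ msg)

-- Split a ByteString into n-byte pieces to simulate arbitrary chunk boundaries.
chunksOf :: Int -> DB.ByteString -> [DB.ByteString]
chunksOf n bs
  | DB.null bs = []
  | otherwise  = let (h, t) = DB.splitAt n bs in h : chunksOf n t

records :: [(Int64, String)]
records = [(1, "e1"), (2, "e2"), (3, "e3")]

-- All records concatenated, as they would sit in the file.
encoded :: DB.ByteString
encoded = DBL.toStrict (mconcat (map DBI.encode records))

decodeFrom :: [DB.ByteString] -> [(Int64, String)]
decodeFrom chunks = runConduitPure (yieldMany chunks .| decodeMany .| sinkList)

main :: IO ()
main = do
  print (decodeFrom [encoded])             -- one chunk holding every record
  print (decodeFrom (chunksOf 5 encoded))  -- records straddling chunk boundaries
```

Both calls should print the same three records: the decoder no longer assumes one record per chunk, so it copes with the whole file arriving as a single chunk as well as with records split across chunks.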

Best regards, Alexander.
Haskell-Cafe mailing list
Only members subscribed via the mailman list are allowed to post.
Fabien R
2018-11-04 08:48:03 UTC
Thanks Alexander,
The package fixed the problem.
I thought that, since a conduit is driven by downstream demand,
if bsToData requested a record, sourceFile would send only the corresponding ByteStrings.

Alexander V Vershilov
2018-11-05 10:17:46 UTC
Hello Fabien,

Your expectations are correct, but for this to really happen, your
consumer function must be aware of the conduit pipeline so that it
consumes only the required amount of data. That happens automatically
in two cases:
1. your function consumes an entire chunk;
2. your function works incrementally and can return the unprocessed
data, or a continuation that consumes more data (the case of the
incremental API in binary).
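Both cases can be observed with plain conduit combinators. In this small sketch the upstream yields one 10-byte chunk, much as sourceFile hands over a small file in one piece. A consumer that simply awaits receives the whole chunk however little it needs, whereas takeCE from the Conduit module forwards only the requested bytes and hands the remainder back with leftover, so the next consumer in the same chain still sees it. fileChunk, naiveDemo and splitDemo are illustrative names.

```haskell
import Conduit
import qualified Data.ByteString.Char8 as BC

-- One upstream chunk, like sourceFile handing over a small file at once.
fileChunk :: BC.ByteString
fileChunk = BC.pack "0123456789"

-- A single await receives the whole chunk, however few bytes are needed.
naiveDemo :: Maybe BC.ByteString
naiveDemo = runConduitPure $ yield fileChunk .| await

-- takeCE 3 forwards exactly three bytes and returns the rest with
-- 'leftover', so the following consumer in the same chain receives it.
splitDemo :: (BC.ByteString, BC.ByteString)
splitDemo = runConduitPure $ yield fileChunk .| do
  firstThree <- takeCE 3 .| foldC
  remainder  <- foldC
  return (firstThree, remainder)

main :: IO ()
main = do
  print naiveDemo  -- Just "0123456789"
  print splitDemo  -- ("012","3456789")
```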

A nice example of a function that is related to your use case and
is aware of the conduit pipeline:


If you pass a parser to sinkParser, it will consume only the required
amount of data.
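sinkParser itself belongs to conduit-extra (Data.Conduit.Attoparsec) and runs attoparsec parsers. A rough analogue for binary's Get can be sketched with the incremental API: feed chunks until the parser finishes, then push the unconsumed bytes back with leftover so that the next step of the pipeline starts exactly where the parser stopped. sinkGetOne and headerDemo are hypothetical names, not part of binary or conduit.

```haskell
import Conduit
import Data.Binary.Get (Decoder (..), Get, getWord16be, pushChunk, pushEndOfInput, runGetIncremental)
import qualified Data.ByteString as DB
import Data.Word (Word16, Word8)

-- Hypothetical helper: run one Get parser against the stream, consuming
-- only the bytes it needs; unconsumed input is returned with 'leftover'.
sinkGetOne :: Monad m => Get a -> ConduitT DB.ByteString o m (Either String a)
sinkGetOne g = go (runGetIncremental g)
  where
    go dec = case dec of
      Partial _ -> await >>= maybe (finish (pushEndOfInput dec)) (go . pushChunk dec)
      Done rest _ x -> giveBack rest >> return (Right x)
      Fail rest _ msg -> giveBack rest >> return (Left msg)
    -- After end of input the decoder should not ask for more data.
    finish dec = case dec of
      Partial _ -> return (Left "sinkGetOne: unexpected end of input")
      _         -> go dec
    -- Push unconsumed bytes back upstream for the next consumer.
    giveBack rest = if DB.null rest then return () else leftover rest

-- Parse a 2-byte big-endian header that spans two chunks,
-- then collect whatever follows it.
headerDemo :: (Either String Word16, [Word8])
headerDemo = runConduitPure $ yieldMany (map DB.pack [[0], [42, 1], [2, 3]]) .| do
  header <- sinkGetOne getWord16be
  body   <- foldC
  return (header, DB.unpack body)

main :: IO ()
main = print headerDemo  -- (Right 42,[1,2,3])
```

The byte 1 arrives in the same chunk as the end of the header; because sinkGetOne hands it back, foldC still sees it.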
