Discussion:
[Haskell-cafe] reading file content with conduit
Fabien R
2018-11-02 12:08:40 UTC
Permalink
Hello,
I have a strange behaviour with my code when reading data from a file saved by sinkFile.
I only see the first record of the file, although the file seems to contain several records.

Any hints ?

-- built in a sandbox with GHC 8.2.2, base 4.10.1.0, binary 0.8.5.1, bytestring 0.8.10.2, conduit 1.3.1
{-# LANGUAGE DeriveGeneric #-}
import qualified Data.ByteString as DB
import qualified Data.ByteString.Lazy as DBL
import qualified Data.Binary as DBI
import GHC.Generics (Generic)
import Conduit
import Data.Int (Int64)

data MySubRec = MySubRec { sr1 :: Float,
sr2 :: Float }
deriving (Generic, Show)

data MyRec = MyRec { r1 :: Int64,
r2 :: String,
r3 :: [MySubRec],
r4 :: [MySubRec]
}
deriving (Generic, Show)

instance DBI.Binary MySubRec
instance DBI.Binary MyRec

es = MySubRec { sr1 =1.0, sr2 =1000.5 }
myList = repeat es
e1 = MyRec{ r1=1, r2="e1", r3=take 2 myList, r4=take 1 myList}
e2 = MyRec{ r1=2, r2="e2", r3=take 2 myList, r4=take 1 myList}
myData = concat $ repeat [e1,e2]

dataToBs :: Monad m =>
ConduitT MyRec DB.ByteString m ()
dataToBs = do
d <- await
case d of
Just bs -> do
yield $ DBL.toStrict $ DBI.encode bs
dataToBs
_ -> return ()
bsToData :: Monad m =>
ConduitT DB.ByteString MyRec m ()
bsToData = do
d <- await
case d of
Just bs -> do
yield $ DBI.decode $ DBL.fromStrict bs
bsToData
_ -> return ()
main = do
runConduitRes $ yieldMany (take 10 myData) .| dataToBs .| sinkFile "/tmp/res.bin"
runConduitRes $ sourceFile "/tmp/res.bin" .| bsToData .| mapM_C (liftIO . putStrLn . show)
Alexander V Vershilov
2018-11-02 12:15:36 UTC
Permalink
Hello Fabien,

in your example, the problem is that sourceFile gives you a bytestring
chunk that contains
more than one record and decoding function does not return a list of
values, instead
return only one. So you need to pass the leftover data to the next
decoding round.

You may consider using
https://hackage.haskell.org/package/binary-conduit package that
does that (see https://hackage.haskell.org/package/binary-conduit-1.3.1/docs/src/Data.Conduit.Serialization.Binary.html#conduitGet)

--
Best regards, Alexander.
Post by Fabien R
Hello,
I have a strange behaviour with my code when reading data from a file saved by sinkFile.
I only see the first record of the file, although the file seems to contain several records.
Any hints ?
-- built in a sandbox with GHC 8.2.2, base 4.10.1.0, binary 0.8.5.1, bytestring 0.8.10.2, conduit 1.3.1
{-# LANGUAGE DeriveGeneric #-}
import qualified Data.ByteString as DB
import qualified Data.ByteString.Lazy as DBL
import qualified Data.Binary as DBI
import GHC.Generics (Generic)
import Conduit
import Data.Int (Int64)
data MySubRec = MySubRec { sr1 :: Float,
sr2 :: Float }
deriving (Generic, Show)
data MyRec = MyRec { r1 :: Int64,
r2 :: String,
r3 :: [MySubRec],
r4 :: [MySubRec]
}
deriving (Generic, Show)
instance DBI.Binary MySubRec
instance DBI.Binary MyRec
es = MySubRec { sr1 =1.0, sr2 =1000.5 }
myList = repeat es
e1 = MyRec{ r1=1, r2="e1", r3=take 2 myList, r4=take 1 myList}
e2 = MyRec{ r1=2, r2="e2", r3=take 2 myList, r4=take 1 myList}
myData = concat $ repeat [e1,e2]
dataToBs :: Monad m =>
ConduitT MyRec DB.ByteString m ()
dataToBs = do
d <- await
case d of
Just bs -> do
yield $ DBL.toStrict $ DBI.encode bs
dataToBs
_ -> return ()
bsToData :: Monad m =>
ConduitT DB.ByteString MyRec m ()
bsToData = do
d <- await
case d of
Just bs -> do
yield $ DBI.decode $ DBL.fromStrict bs
bsToData
_ -> return ()
main = do
runConduitRes $ yieldMany (take 10 myData) .| dataToBs .| sinkFile "/tmp/res.bin"
runConduitRes $ sourceFile "/tmp/res.bin" .| bsToData .| mapM_C (liftIO . putStrLn . show)
_______________________________________________
Haskell-Cafe mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
--
Alexander
Fabien R
2018-11-04 08:48:03 UTC
Permalink
Thanks Alexander,
The package fixed the problem.
I thought that, since a conduit is driven by downstream,
if bsToData requested a record, sourceFile would only send the corresponding Bytestrings.

--
Fabien
Alexander V Vershilov
2018-11-05 10:17:46 UTC
Permalink
Hello Fabien,

your expectations are correct, but in order to make this really
happen your consumer function should be aware of the
conduit pipeline in order to consume only the required amount of data.
That may happen automatically in two cases:
1. your function consumes an entire chunk
2. your function can work in an iterative way and can return unprocessed
data or continuation that may consume more data (the case of iterative API
in binary)

A nice example of the function that is related to your use case and
aware of the conduit pipeline:

https://hackage.haskell.org/package/conduit-extra-1.3.0/docs/Data-Conduit-Attoparsec.html

If you pass a parser in sinkParser, it will consume only the required
amount of data.

--
Alexander
Post by Fabien R
Thanks Alexander,
The package fixed the problem.
I thought that, since a conduit is driven by downstream,
if bsToData requested a record, sourceFile would only send the corresponding Bytestrings.
--
Fabien
_______________________________________________
Haskell-Cafe mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
--
Alexander
Continue reading on narkive:
Loading...