23
votes

I have a function that takes a lazy ByteString, and I want it to return a list of strict ByteStrings (the laziness should be transferred to the list type of the output).

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
csVals :: L.ByteString -> [B.ByteString]

I want to do this for various reasons: several lexing functions require strict ByteStrings, and I can guarantee that the strict ByteStrings in the output of csVals above are very small.

How do I go about "strictifying" ByteStrings without chunking them?

Update0

I want to take a Lazy ByteString, and make one strict ByteString containing all its data.

5
What is your problem with toChunks? At first glance it looks like it preserves laziness. - Mikhail Glushenkov
@Matt Joiner: Maybe you should write a lexer yourself, or force evaluation of the results using DeepSeq. - wuxb
@Matt Joiner: there is a Lazy version: 'Data.ByteString.Lex.Lazy.Double' in the same package. - wuxb
@Matt Joiner: so you want chunks of a specified size? Possibly repeated calls to splitAt? Note that toChunks generates strict ByteStrings that are of maximum size (except for possibly the last one). - ivanm
There's a misunderstanding here -- a lazy bytestring is essentially just a list of chunks (i.e. strict bytestrings). toChunks exposes that structure. To put the whole list into one strict bytestring, there's no other way than concat . toChunks (or the equivalent). In many typical cases, the list will have a single element -- in those cases concat . toChunks will be relatively efficient as well. - sclv

5 Answers

39
votes

The bytestring package now exports a toStrict function:

http://hackage.haskell.org/packages/archive/bytestring/0.10.2.0/doc/html/Data-ByteString-Lazy.html#v:toStrict

This might not be exactly what you want, but it certainly answers the question in the title of this post :)
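As a rough sketch of how toStrict could be applied to the csVals function from the question: split the lazy input first, then make each small piece strict. This assumes the fields are comma-separated (the question does not say so) and a bytestring version that exports toStrict (0.10 or later); the name example is only for illustration.

```haskell
import qualified Data.ByteString            as B
import qualified Data.ByteString.Char8      as BC
import qualified Data.ByteString.Lazy       as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- Assumption: fields are separated by commas.
-- The list is produced lazily; each element is copied into a strict
-- ByteString only when that element is demanded.
csVals :: BL.ByteString -> [B.ByteString]
csVals = map BL.toStrict . BLC.split ','

-- Hypothetical sample input, for illustration only.
example :: [B.ByteString]
example = csVals (BLC.pack "a,bc,def")

-- The value the sample is expected to produce.
expected :: [B.ByteString]
expected = map BC.pack ["a", "bc", "def"]
```

Because toStrict is applied per field rather than to the whole input, the full lazy ByteString never needs to be held in memory as one strict block.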

17
votes

Like @sclv said in the comments above, a lazy bytestring is just a list of strict bytestrings. There are two approaches to converting a lazy ByteString to a strict one (source: the Haskell mailing list discussion about adding a toStrict function). The relevant code from the email thread is below.

First, relevant libraries:

import qualified Data.ByteString               as B
import qualified Data.ByteString.Internal      as BI
import qualified Data.ByteString.Lazy          as BL
import qualified Data.ByteString.Lazy.Internal as BLI
import           Foreign.ForeignPtr
import           Foreign.Ptr

Approach 1 (same as @sclv):

toStrict1 :: BL.ByteString -> B.ByteString
toStrict1 = B.concat . BL.toChunks

Approach 2:

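toStrict2 :: BL.ByteString -> B.ByteString
toStrict2 BLI.Empty = B.empty
-- A single chunk can be returned as-is, without copying.
toStrict2 (BLI.Chunk c BLI.Empty) = c
-- Otherwise allocate one buffer of the total length and copy each chunk into it.
toStrict2 lb = BI.unsafeCreate len $ go lb
  where
    -- Total number of bytes across all chunks.
    len = BLI.foldlChunks (\l sb -> l + B.length sb) 0 lb

    go  BLI.Empty                   _   = return ()
    go (BLI.Chunk (BI.PS fp s l) r) ptr =
        withForeignPtr fp $ \p -> do
            -- Copy l bytes starting at offset s, then advance the destination pointer.
            BI.memcpy ptr (p `plusPtr` s) (fromIntegral l)
            go r (ptr `plusPtr` l)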

If performance is a concern, I recommend checking out the email thread above. It includes a criterion benchmark, in which toStrict2 is faster than toStrict1.

5
votes

If the lazy ByteString in question is <= the maximum size of a strict ByteString:

toStrict = fromMaybe SB.empty . listToMaybe . toChunks

toChunks makes each chunk be as large as possible (except for possibly the last one).

If the size of your lazy ByteString is larger than what a strict ByteString can be, then this isn't possible: that's exactly what lazy ByteStrings are for.
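One thing worth checking before relying on the one-liner above: it returns only the first chunk, and a small lazy ByteString can still consist of several chunks (for example when it was built with fromChunks or by appending). The sketch below illustrates this; firstChunk and twoChunks are names made up for the example, and BL.toStrict assumes bytestring 0.10 or later.

```haskell
import           Data.Maybe            (fromMaybe, listToMaybe)
import qualified Data.ByteString       as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy  as BL

-- The one-liner from this answer, with qualified names filled in.
firstChunk :: BL.ByteString -> B.ByteString
firstChunk = fromMaybe B.empty . listToMaybe . BL.toChunks

-- A tiny lazy ByteString that nevertheless has two chunks.
twoChunks :: BL.ByteString
twoChunks = BL.fromChunks [BC.pack "ab", BC.pack "cd"]

-- firstChunk keeps only the first chunk ...
onlyFirst :: B.ByteString
onlyFirst = firstChunk twoChunks   -- "ab"

-- ... whereas toStrict copies every chunk.
everything :: B.ByteString
everything = BL.toStrict twoChunks -- "abcd"
```

So the first-chunk approach is only safe when the input is known to be a single chunk.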

2
votes

Data.ByteString.Lazy.Char8 now has toStrict and fromStrict functions.
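A minimal round trip through both functions might look like this (the names strictHello and lazyHello are invented for the example; a bytestring version of 0.10 or later is assumed):

```haskell
import qualified Data.ByteString.Char8      as BC
import qualified Data.ByteString.Lazy.Char8 as BLC

-- Lazy to strict: copies all chunks into one strict ByteString.
strictHello :: BC.ByteString
strictHello = BLC.toStrict (BLC.pack "hello")

-- Strict to lazy: wraps the strict ByteString as a lazy one.
lazyHello :: BLC.ByteString
lazyHello = BLC.fromStrict strictHello
```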

1
votes

You can also use blaze-builder to build a strict ByteString from a lazy one:

toStrict :: BL.ByteString -> BS.ByteString
toStrict = toByteString . fromLazyByteString

This should be efficient.