2
votes

I'm currently using bytestring and attoparsec for the serialization and deserialization, respectively, in a game netcode. I was originally attracted to using these libraries over cereal because bytestring gives pretty fine-grained control over Builders, including helpful allocation strategies and low-level primitives. I thought it would be a good choice as it would ensure I would be better equipped to deal with any latency/GC issues I could run into later on in the project.

And while bytestring provides lots of combinators for common data types one would encounter in packet fields (mainly the types found in Data.Word and Data.Int, like Word16, Word32, and Int8), I was disappointed when I couldn't locate any complementary combinators in attoparsec. Am I missing something? Could I mock up something equivalent with the provided combinators?

If it's the case that the functionality is missing, what is the usual way of adding this functionality in? I'm certainly not the first one to need to decode signed shorts with the library. Is there a reason this functionality doesn't exist? Is there a common library that I should supplement attoparsec with that I don't know about? Or should I do something like this:

import           Data.Bits
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as B
import qualified Data.Attoparsec.ByteString as Decode
import           Data.Int


decodeInt16BE :: Decode.Parser Int16
decodeInt16BE = do
  bs <- Decode.take 2
  return $! (fromIntegral (bs `B.unsafeIndex` 0) `shiftL` 8) .|.
             fromIntegral (bs `B.unsafeIndex` 1)

Because this is what cereal and binary do internally and what I'm currently doing to obtain this functionality for the time being, but it would be nice to not have to use ad hoc unsafe functions in order to do what bytestring, cereal, and binary already provide in their APIs.
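For context, here is a sketch of how the same approach extends to a wider type. The names (decodeWord32BE, decodeInt32BE) are my own, not from any library:

```haskell
import           Data.Bits
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as B
import qualified Data.Attoparsec.ByteString as Decode
import           Data.Word
import           Data.Int

-- Big-endian Word32, built the same way: take 4 bytes, then shift
-- and OR. unsafeIndex is safe here because Decode.take guarantees
-- the ByteString is exactly 4 bytes long.
decodeWord32BE :: Decode.Parser Word32
decodeWord32BE = do
  bs <- Decode.take 4
  return $! (fromIntegral (bs `B.unsafeIndex` 0) `shiftL` 24) .|.
            (fromIntegral (bs `B.unsafeIndex` 1) `shiftL` 16) .|.
            (fromIntegral (bs `B.unsafeIndex` 2) `shiftL`  8) .|.
             fromIntegral (bs `B.unsafeIndex` 3)

-- The signed variant is just a reinterpretation of the same bits.
decodeInt32BE :: Decode.Parser Int32
decodeInt32BE = fromIntegral <$> decodeWord32BE
```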

What do most people do when they need to tackle Int64, Int32, Int16, Int8, Word64, Word32, and Word16 with attoparsec in a low-latency networking environment?

(NEWBIE NOTE) There's an assumption here that could be naive. I'm implicitly assuming cereal is not faster for handling network packets than implementations in bytestring and attoparsec. This assumption originated from watching some of the talks coming out on binary-serialise-cbor that point to rather large amounts of allocations taking place in cereal and binary due to their continuation approach to encoding and decoding binary data in buffers. I'm dealing with network packets that often can be encoded and decoded in a pretty straightforward and stateless way with the occasional field whose encoding/decoding subroutine is dependent on the value of a previously seen field. Maybe I need a reality check here and am using the wrong tools for the job? Maybe there isn't really much I can do at this high-level to improve my situation? Assume "don't prematurely optimize" isn't applicable in this case.

1
Well, B.unsafeIndex is perfectly safe if you know the string is long enough. - ErikR

1 Answer

1
votes

You should explain in more detail what you are doing with the packets. Most network packet processing does not require backtracking and so attoparsec is somewhat overkill. Also, attoparsec (and binary and cereal) requires you to visit every byte of the packet. However, the locations of fields within most network packets are at fixed offsets. Thus you can "randomly access" the fields once you've examined the header to determine what kind of packet you have.
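As a sketch of what "random access" at fixed offsets looks like, assuming a hypothetical packet layout where byte 0 is a type tag and bytes 2-3 are a big-endian length field (the layout and function names here are invented for illustration):

```haskell
import           Data.Bits
import qualified Data.ByteString as B
import           Data.Word

-- Read a big-endian Word16 at a fixed offset. B.index is
-- bounds-checked; switch to unsafeIndex once the total length
-- has been validated.
word16BEAt :: B.ByteString -> Int -> Word16
word16BEAt bs off =
  (fromIntegral (B.index bs off) `shiftL` 8) .|.
   fromIntegral (B.index bs (off + 1))

-- Validate the length once, then pull fields directly; no parser
-- state and no byte-by-byte traversal.
packetLength :: B.ByteString -> Maybe Word16
packetLength bs
  | B.length bs >= 4 = Just (word16BEAt bs 2)
  | otherwise        = Nothing
```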

I think you can achieve a (near) zero-allocation implementation: just write your algorithm like you would in C. Load your packet data into a mutable unboxed vector; keep an offset to the start of the current packet; if you don't have a complete packet in your buffer, move what you have to the front of the vector and fill in the rest with new packet data.
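A minimal sketch of that buffer-compaction step, using Data.Vector.Unboxed.Mutable from the vector package (the RecvBuffer type and compact function are my own invention, not a library API):

```haskell
import qualified Data.Vector.Unboxed.Mutable as M
import           Data.Word

data RecvBuffer = RecvBuffer
  { bufData :: M.IOVector Word8  -- fixed-size scratch buffer
  , bufUsed :: Int               -- bytes of valid data from index 0
  }

-- Slide a partial packet that starts at 'off' back to the front of
-- the buffer so the next recv can append after it. M.move permits
-- the source and destination slices to overlap, so no extra
-- allocation is needed.
compact :: RecvBuffer -> Int -> IO RecvBuffer
compact (RecvBuffer v used) off = do
  let remaining = used - off
  M.move (M.slice 0 remaining v) (M.slice off remaining v)
  return (RecvBuffer v remaining)
```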