4
votes

I'm currently using Erlang for a big project, but I have a question about the proper way to proceed.

I receive bytes over a TCP socket. The bytes follow a fixed protocol, and the sender is a Python client. The Python client uses class inheritance to create bytes from its objects.

Now I would like to take the bytes (in Erlang) and convert them to their equivalent messages. They all have a common message header.

How can I do this as generically as possible in Erlang?

Kind Regards,

Me

1 Answer

2
votes

Use pattern matching (binary header consumption) with Erlang's binary syntax. You will need to know either exactly which bytes or bits you are expecting to receive, or the field sizes in bytes or bits.

For example, let's say that you are expecting a string of bytes that will either begin with the equivalent of the ASCII strings "PUSH" or "PULL", followed by some other data you will place somewhere. You can create a function head that matches those, and captures the rest to pass on to a function that does "push()" or "pull()" based on the byte header:

operation_type(<<"PUSH", Rest/binary>>) -> push(Rest);
operation_type(<<"PULL", Rest/binary>>) -> pull(Rest).

The bytes after the first four will now be in Rest, leaving you free to interpret whatever subsequent headers or data remain in turn. You could also match on the whole binary:

operation_type(Bin = <<"PUSH", _/binary>>) -> push(Bin);
operation_type(Bin = <<"PULL", _/binary>>) -> pull(Bin).

In this case the "_" variable works like it always does: you are just checking the lead, essentially peeking at the buffer and passing the whole thing on based on its initial contents.

You could also skip around in it. Say you knew you were going to receive a binary with 4 bytes of fluff at the front, 6 bytes of type data, and then the rest you want to pass on:

filter_thingy(<<_:4/binary, Type:6/binary, Rest/binary>>) ->
    % Do stuff with Rest based on Type; as a placeholder, return both.
    {Type, Rest}.
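The same idea applies to a common message header like the one in the question. The following is only a sketch under assumed conditions: the layout (a 1-byte type tag, a 16-bit big-endian length, then the payload), the module name header_demo, and the type names login and data are all made up for illustration and are not taken from the actual protocol.

```erlang
-module(header_demo).
-export([decode/1]).

%% Hypothetical layout: 1-byte type tag, 16-bit big-endian payload length,
%% then the payload itself. Bytes after the payload come back as Rest.
decode(<<Type:8, Len:16/big-unsigned-integer, Payload:Len/binary, Rest/binary>>) ->
    {ok, {message_type(Type), Payload}, Rest};
decode(Bin) when is_binary(Bin) ->
    %% Not enough bytes for a complete message yet; keep the buffer
    %% and try again once more TCP data has arrived.
    {more, Bin}.

%% Hypothetical mapping from type tag to message name.
message_type(1) -> login;
message_type(2) -> data;
message_type(Other) -> {unknown, Other}.
```

Because Len is bound earlier in the same pattern, it can be used as the size of Payload. The second clause covers a partial TCP read, where the caller would append the next chunk and call decode/1 again.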

It becomes very natural to split binaries in function headers (whether the data equates to character strings or not), letting the "Rest" fall through to appropriate functions as you go along. If you are receiving Python pickle data or something similar, you would want to write the parsing routine in a recursive way, so that the conclusion of each data type returns you to the top to determine the next type, with an accumulated tree that represents the data read so far.
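A rough sketch of that recursive style might look like the following. The tag values and the encoding are invented for this example (this is not the pickle format), and the module name tlv_demo is hypothetical. Each value is parsed by value/1, which hands control back to the caller along with the remaining bytes, while the accumulator builds up the result.

```erlang
-module(tlv_demo).
-export([parse/1]).

%% Hypothetical encoding, for illustration only:
%%   1, Int:32          -> signed integer
%%   2, Len:8, Bytes    -> binary string of Len bytes
%%   3, Count:8, Items  -> list of Count nested values
parse(Bin) -> parse_all(Bin, []).

%% Top-level loop: read values until the buffer is empty.
parse_all(<<>>, Acc) -> lists:reverse(Acc);
parse_all(Bin, Acc) ->
    {Value, Rest} = value(Bin),
    parse_all(Rest, [Value | Acc]).

%% One clause per type tag; each returns {Value, RemainingBytes}.
value(<<1, Int:32/signed, Rest/binary>>) -> {Int, Rest};
value(<<2, Len:8, Str:Len/binary, Rest/binary>>) -> {Str, Rest};
value(<<3, Count:8, Rest/binary>>) -> items(Count, Rest, []).

%% Read Count nested values, dispatching each through value/1 again.
items(0, Rest, Acc) -> {lists:reverse(Acc), Rest};
items(N, Bin, Acc) ->
    {Value, Rest} = value(Bin),
    items(N - 1, Rest, [Value | Acc]).
```

An unknown tag or a truncated buffer makes value/1 fail with a function_clause error, which is usually acceptable in a sketch but would need explicit handling for data read straight off a socket.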

I only covered 8-bit bytes above, but there is also a pure bitstring syntax, which lets you go as far into the weeds with bits and bytes as you need with the same ease of syntax. Matching is a real lifesaver here.
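As a small illustration of matching below the byte level, here is a sketch that splits a single header byte into bit fields. The field layout (4-bit version, 1-bit ack flag, 3-bit priority) and the module name bits_demo are assumptions made up for the example.

```erlang
-module(bits_demo).
-export([flags/1]).

%% Hypothetical header byte: 4-bit version, 1-bit ack flag, 3-bit priority.
%% Sizes without a type specifier are counted in bits for integers.
flags(<<Version:4, Ack:1, Priority:3, Rest/binary>>) ->
    {Version, Ack =:= 1, Priority, Rest}.
```

The three fields add up to 8 bits, so Rest is still byte-aligned and can be matched with /binary. For a tail that is not byte-aligned, /bitstring would be used instead.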

Hopefully this informed more than confused. Binary syntax in Erlang makes this the most pleasant binary parsing environment in a general programming language I've yet encountered.

http://www.erlang.org/doc/programming_examples/bit_syntax.html