We just did an internal study on serializers, here are some results (for my future reference too!)
Thrift = serialization + RPC stack
The biggest difference is that Thrift is not just a serialization protocol, it's a full blown RPC stack that's like a modern day SOAP stack. So after the serialization, the objects could (but not mandated) be sent between machines over TCP/IP. In SOAP, you started with a WSDL document that fully describes the available services (remote methods) and the expected arguments/objects. Those objects were sent via XML. In Thrift, the .thrift file fully describes the available methods, expected parameter objects and the objects are serialized via one of the available serializers (with Compact Protocol
, an efficient binary protocol, being most popular in production).
ASN.1 = Grand daddy
ASN.1 was designed by telecom folks in the 80s and is awkward to use due to limited library support as compared to recent serializers which emerged from CompSci folks. There are two variants, DER (binary) encoding and PEM (ascii) encoding. Both are fast, but DER is faster and more size efficient of the two. In fact ASN.1 DER can easily keep up (and sometimes beat) serializers that were designed 30 years after itself, a testament to it's well engineered design. It's very compact, smaller than Protocol Buffers and Thrift, only beaten by Avro. The issue is having great libraries to support and right now Bouncy Castle seems to be the best one for C#/Java. ASN.1 is king in security and crypto systems and isn't going to go away, so don't be worried about 'future proofing'. Just get a good library...
MessagePack = middle of the pack
It's not bad but it's neither the fastest, nor the smallest nor the best supported. No production reason to choose it.
Common
Beyond that, they are fairly similar. Most are variants of the basic TLV: Type-Length-Value
principle.
Protocol Buffers (Google originated), Avro (Apache based, used in Hadoop), Thrift (Facebook originated, now Apache project) and ASN.1 (Telecom originated) all involve some level of code generation where you first express your data in a serializer-specific format, then the serializer "compiler" will generate source code for your language via the code-gen
phase. Your app source then uses these code-gen
classes for IO. Note that certain implementations (eg: Microsoft's Avro library or Marc Gavel's ProtoBuf.NET) let you directly decorate your app level POCO/POJO objects and then the library directly uses those decorated classes instead of any code-gen's classes. We've seen this offer a boost performance since it eliminates a object copy stage (from application level POCO/POJO fields to code-gen fields).
Some results and a live project to play with
This project (https://github.com/sidshetye/SerializersCompare) compares important serializers in the C# world. The Java folks already have something similar.
1000 iterations per serializer, average times listed
Sorting result by size
Name Bytes Time (ms)
------------------------------------
Avro (cheating) 133 0.0142
Avro 133 0.0568
Avro MSFT 141 0.0051
Thrift (cheating) 148 0.0069
Thrift 148 0.1470
ProtoBuf 155 0.0077
MessagePack 230 0.0296
ServiceStackJSV 258 0.0159
Json.NET BSON 286 0.0381
ServiceStackJson 290 0.0164
Json.NET 290 0.0333
XmlSerializer 571 0.1025
Binary Formatter 748 0.0344
Options: (T)est, (R)esults, s(O)rt order, (S)erializer output, (D)eserializer output (in JSON form), (E)xit
Serialized via ASN.1 DER encoding to 148 bytes in 0.0674ms (hacked experiment!)