
I updated an older version of protobuf-net to the current one in a huge project (the version we were using is around 1-2 years old; I don't know the exact revision). Sadly, the newer version throws an exception:

CreateWireTypeException in ProtoReader.cs line 292

in the following test case:

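    using System.IO;
    using ProtoBuf;

    enum Test
    {
        test1 = 0,
        test2
    }

    public static void Test1()
    {
        Test original = Test.test2;
        using (MemoryStream ms = new MemoryStream())
        {
            // Serialize the bare enum (no wrapper class) with a Fixed32 prefix, field number 1.
            Serializer.SerializeWithLengthPrefix<Test>(ms, original, PrefixStyle.Fixed32, 1);
            ms.Position = 0;
            // This call throws CreateWireTypeException with the newer version.
            Test obj = Serializer.DeserializeWithLengthPrefix<Test>(ms, PrefixStyle.Fixed32);
        }
    }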

I found out that enums are not supposed to be serialized directly outside of a class, but our system is too huge to simply wrap all the enums in classes. Are there any other solutions to this problem? It works fine with Serialize and Deserialize; only DeserializeWithLengthPrefix throws.

The testcase works fine in older revisions e.g. r262 of protobuf-net.


1 Answer


Simply put, it's a bug; it is fixed in r640 (now deployed to both NuGet and google-code), along with an additional test based on your code above so that it can't creep back in.


Re performance (from the comments): the first thing I would look at is "prefer groups". Basically, the protobuf specification includes two different ways of encoding sub-objects: "groups" and "length-prefix". Groups were the original implementation, but Google has since moved towards "length-prefix" and advises people not to use "groups". However! Because of how protobuf-net works, "groups" are actually noticeably cheaper to write: unlike the Google implementation, protobuf-net does not know the length of things in advance. This means that to write a length-prefix, it needs to do one of:

  • calculate the length as needed (almost as much work as actually serializing the data, but it requires an entire duplicate of the serialization code), write the length, then actually serialize the data
  • serialize to a buffer, write the length, write the buffer
  • leave a place-holder, serialize, then loop back and write the actual length into the place-holder, adjusting the padding if needed
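To make the third option concrete, here is a minimal sketch of a placeholder-and-backfill writer over a plain `List<byte>`. It is not protobuf-net's actual writer; the class and method names are hypothetical, and it assumes a one-byte placeholder, which is exactly the guess that forces a shift whenever a payload is 128 bytes or longer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of option 3 (placeholder, serialize, back-fill) over a
// plain List<byte>; protobuf-net's real writer works on its own buffers.
static class LengthPrefixSketch
{
    // Protobuf base-128 varint: 7 bits per byte, high bit set means "more follows".
    public static void WriteVarint(List<byte> buffer, ulong value)
    {
        while (value >= 0x80)
        {
            buffer.Add((byte)(value | 0x80));
            value >>= 7;
        }
        buffer.Add((byte)value);
    }

    public static void WriteSubMessage(List<byte> buffer, int fieldNumber, Action<List<byte>> writePayload)
    {
        // Field header: (field number << 3) | wire type 2 (length-delimited).
        WriteVarint(buffer, ((ulong)fieldNumber << 3) | 2);

        // Reserve a single byte, betting the length will be under 128.
        int placeholder = buffer.Count;
        buffer.Add(0);

        writePayload(buffer);
        int length = buffer.Count - placeholder - 1;

        // Encode the real length and back-fill the placeholder.
        var lengthBytes = new List<byte>();
        WriteVarint(lengthBytes, (ulong)length);
        buffer[placeholder] = lengthBytes[0];

        // If the length needs 2+ bytes, the whole payload has to shift right
        // to make room: this is the "adjusting the padding" cost.
        if (lengthBytes.Count > 1)
        {
            buffer.InsertRange(placeholder + 1, lengthBytes.GetRange(1, lengthBytes.Count - 1));
        }
    }
}
```

With nested sub-objects, each level repeats this back-fill, so a shift at an inner level can be followed by another shift of the same bytes at an outer level.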

I've implemented all 3 approaches at different times, but v2 uses the 3rd option. I keep toying with adding a 4th implementation:

  • leave a place-holder, serialize, then loop back and write the actual length using an overlong form (so no padding adjustments ever needed)

but... the consensus seems to be that the "overlong form" is a bit risky; still, it would work nicely when protobuf-net is talking to protobuf-net.
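The overlong idea can be sketched the same way. Assuming 32-bit lengths, a varint never needs more than 5 bytes, so this hypothetical writer reserves 5 bytes and pads small values with continuation bytes (`0x80`); the payload never moves. An ordinary varint reader (included) decodes the padded form to the same number, which is why it round-trips; whether every other implementation tolerates it is the risk mentioned above. All names here are illustrative.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the "overlong form": reserve a fixed 5-byte length
// slot, so back-filling never has to move the payload.
static class OverlongLengthSketch
{
    const int ReservedBytes = 5; // a 32-bit value never needs more than 5 varint bytes

    public static void WriteVarint(List<byte> buffer, ulong value)
    {
        while (value >= 0x80)
        {
            buffer.Add((byte)(value | 0x80));
            value >>= 7;
        }
        buffer.Add((byte)value);
    }

    public static void WriteSubMessage(List<byte> buffer, int fieldNumber, Action<List<byte>> writePayload)
    {
        WriteVarint(buffer, ((ulong)fieldNumber << 3) | 2); // wire type 2: length-delimited
        int slot = buffer.Count;
        for (int i = 0; i < ReservedBytes; i++) buffer.Add(0);

        writePayload(buffer);
        uint length = (uint)(buffer.Count - slot - ReservedBytes);

        // Overlong varint: the first four bytes always carry the continuation
        // bit, even when the remaining 7-bit groups are zero.
        for (int i = 0; i < ReservedBytes - 1; i++)
        {
            buffer[slot + i] = (byte)((length & 0x7F) | 0x80);
            length >>= 7;
        }
        buffer[slot + ReservedBytes - 1] = (byte)length; // at most 4 bits remain
    }

    // Ordinary varint reader: it accepts the padded form because it simply
    // keeps reading while the continuation bit is set.
    public static uint ReadVarint32(IList<byte> data, ref int pos)
    {
        uint result = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            byte b = data[pos++];
            result |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new FormatException("varint longer than 5 bytes");
    }
}
```

The trade-off is a few wasted bytes per sub-object in exchange for never copying the payload.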

But as you can see, length-prefix always has some overhead. Now imagine fairly deeply nested objects, where every level pays that overhead again, and you can see a few blips. Groups work very differently; the encoding format for a group is:

  • write a start marker; serialize; write an end marker

that's it; no length needed; really, really, really cheap to write. On the wire, the main difference between them is:

  • groups: cheap to write, but you can't simply jump over them if you encounter them as unexpected data; you have to parse the field headers of the payload until you reach the end marker
  • length-prefix: more expensive to write, but cheap to skip if you encounter them as unexpected data - you just read the length and copy/move that many bytes
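The difference in skip cost is easiest to see from the reader's side. This hypothetical sketch writes a group (wire type 3 start marker, wire type 4 end marker) and shows a field skipper: a length-prefixed field (wire type 2) is skipped with one varint read and one jump, whereas a group has to be walked field by field until its matching end marker. The wire type numbers follow the protobuf encoding; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: writing a group, and skipping an unexpected field.
static class GroupSketch
{
    public static void WriteVarint(List<byte> buffer, ulong value)
    {
        while (value >= 0x80)
        {
            buffer.Add((byte)(value | 0x80));
            value >>= 7;
        }
        buffer.Add((byte)value);
    }

    // Group: start marker, payload, end marker. No length is ever computed.
    public static void WriteGroup(List<byte> buffer, int fieldNumber, Action<List<byte>> writePayload)
    {
        WriteVarint(buffer, ((ulong)fieldNumber << 3) | 3); // wire type 3: start group
        writePayload(buffer);
        WriteVarint(buffer, ((ulong)fieldNumber << 3) | 4); // wire type 4: end group
    }

    public static ulong ReadVarint(byte[] data, ref int pos)
    {
        ulong result = 0;
        for (int shift = 0; shift < 64; shift += 7)
        {
            byte b = data[pos++];
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new FormatException("malformed varint");
    }

    // Skips one field whose header (fieldNumber, wireType) has already been read.
    public static void SkipField(byte[] data, ref int pos, int fieldNumber, int wireType)
    {
        switch (wireType)
        {
            case 0: ReadVarint(data, ref pos); break; // varint
            case 1: pos += 8; break;                  // fixed64
            case 5: pos += 4; break;                  // fixed32
            case 2:                                   // length-prefix: one read, one jump
                int len = (int)ReadVarint(data, ref pos);
                pos += len;
                break;
            case 3:                                   // group: walk every inner field
                while (true)
                {
                    ulong header = ReadVarint(data, ref pos);
                    int innerField = (int)(header >> 3);
                    int innerType = (int)(header & 7);
                    if (innerType == 4)
                    {
                        if (innerField != fieldNumber) throw new FormatException("mismatched end group");
                        return;
                    }
                    SkipField(data, ref pos, innerField, innerType); // nested groups recurse
                }
            default:
                throw new FormatException("unknown wire type " + wireType);
        }
    }
}
```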

But! Too much detail!

What does that mean for you? Well, imagine you have:

    [ProtoContract]
    public class SomeWrapper
    {
        [ProtoMember(1)]
        public List<Person> People { get { return people; } }

        private readonly List<Person> people = new List<Person>();
    }

You can make the super complex change:

    [ProtoContract]
    public class SomeWrapper
    {
        [ProtoMember(1, DataFormat=DataFormat.Group)]
        public List<Person> People { get { return people; } }

        private readonly List<Person> people = new List<Person>();
    }

and it'll use the cheaper encoding scheme. All your existing data will be fine as long as you are using protobuf-net.
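Why old data still reads: the wire type in each field header tells a reader which encoding was used, so it can accept either one for the same field. The sketch below is a hypothetical reader, not protobuf-net's code, and it assumes a cut-down `Person` with only a string `Name` at field 1; it reads `People` (field 1) whether each item was written length-prefixed or as a group.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical reader for SomeWrapper.People (field 1), where each Person is
// reduced to a single string field, Name (field 1). Not protobuf-net's code.
static class EitherEncodingSketch
{
    static ulong ReadVarint(byte[] data, ref int pos)
    {
        ulong result = 0;
        for (int shift = 0; shift < 64; shift += 7)
        {
            byte b = data[pos++];
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new FormatException("malformed varint");
    }

    public static List<string> ReadPeople(byte[] data)
    {
        var names = new List<string>();
        int pos = 0;
        while (pos < data.Length)
        {
            ulong header = ReadVarint(data, ref pos);
            if ((header >> 3) != 1) throw new FormatException("sketch only handles field 1");
            int wireType = (int)(header & 7);
            int end;
            if (wireType == 2)      // length-prefix: the item ends after 'len' bytes
            {
                int len = (int)ReadVarint(data, ref pos);
                end = pos + len;
            }
            else if (wireType == 3) // group: the item ends at its end marker
            {
                end = -1;
            }
            else throw new FormatException("unexpected wire type " + wireType);
            names.Add(ReadPersonName(data, ref pos, end));
        }
        return names;
    }

    // Reads one Person; end < 0 means "stop at the end-group marker".
    static string ReadPersonName(byte[] data, ref int pos, int end)
    {
        string name = "";
        while (end < 0 || pos < end)
        {
            ulong header = ReadVarint(data, ref pos);
            int wireType = (int)(header & 7);
            if (wireType == 4) return name; // end-group marker closes the Person
            if ((header >> 3) != 1 || wireType != 2) throw new FormatException("sketch only handles Person.Name");
            int len = (int)ReadVarint(data, ref pos);
            name = Encoding.UTF8.GetString(data, pos, len);
            pos += len;
        }
        return name;
    }
}
```

Because the decision is made per item from its header, a stream that mixes both encodings also reads back correctly.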