We are using protobuf-net v2.3.2 to serialize and deserialize some complex objects (containing lists, dictionaries, etc.) in our project. Most of the time everything works, but in rare cases we see very strange behavior: an object serialized in one process fails to deserialize in another process unless the call to FromProto<SomeComplexType>(bytes) in that second process is preceded by a call to ToProto(someComplexObject).

Here is an example: let's say our Process 1 looks like this:

class Program1 {
    public static void Main()
    {
        SomeComplexType complexObject = new SomeComplexType();

        // Here goes some code filling complexObject with data

        byte[] serialized = ToProto(complexObject);

        File.WriteAllBytes("serialized.data", serialized);
    }

    public static byte[] ToProto(object value)
    {
        using (var stream = new MemoryStream())
        {
            ProtoBuf.Serializer.Serialize(stream, value);
            return stream.ToArray();
        }
    }

    public static T FromProto<T>(byte[] value)
    {
        using (var stream = new MemoryStream(value))
        {
            return ProtoBuf.Serializer.Deserialize<T>(stream);
        }
    }
}

Now, we are trying to read that object in the Process 2:

class Program2 {
    public static void Main()
    {
        byte[] serialized = File.ReadAllBytes("serialized.data");

        SomeComplexType complexObject = FromProto<SomeComplexType>(serialized);
    }

    public static byte[] ToProto(object value)
    {
        using (var stream = new MemoryStream())
        {
            ProtoBuf.Serializer.Serialize(stream, value);
            return stream.ToArray();
        }
    }

    public static T FromProto<T>(byte[] value)
    {
        using (var stream = new MemoryStream(value))
        {
            return ProtoBuf.Serializer.Deserialize<T>(stream);
        }
    }
}

What we see is that, in rare cases, Process 1 generates a file that makes Process 2 fail on the call to FromProto (we have observed various errors, ranging from 'missing parameterless constructor' to StackOverflowException).

However, adding a line like ToProto(new SomeComplexType()); somewhere before the call to FromProto makes the errors go away, and the same set of bytes deserializes without a hitch. No other methods (we tried PrepareSerializer and GetSchema) seem to do the trick.

It looks like there are subtle differences in how ToProto and FromProto analyze the object model. It also seems that protobuf-net "remembers" some state after the call to ToProto, and that state helps it with subsequent deserializations.

UPDATE: Here are more details. The class structure that we have looks similar to this (very much simplified):

[ProtoContract(ImplicitFields = ImplicitFields.AllPublic)]
[ProtoInclude(1, typeof(A))]
[ProtoInclude(2, typeof(B))]
public interface IBase
{
    [ProtoIgnore]
    string Id { get; }
}

[ProtoContract(ImplicitFields=ImplicitFields.AllPublic, AsReferenceDefault=true)]
public class A : IBase
{
    [ProtoIgnore]
    public string Id { get; }

    public string PropertyA { get; set; }
}

[ProtoContract(ImplicitFields=ImplicitFields.AllPublic, AsReferenceDefault=true)]
public class B : IBase
{
    [ProtoIgnore]
    public string Id { get; }

    public string PropertyB { get; set; }
}

[ProtoContract(ImplicitFields=ImplicitFields.AllPublic, AsReferenceDefault=true)]
public class C
{
    public List<IBase> ListOfBase = new List<IBase>();
}

[ProtoContract(ImplicitFields=ImplicitFields.AllPublic, AsReferenceDefault=true)]
public class D
{
    public C PropertyC { get; set; }
    public Dictionary<string, B> DictionaryOfBs { get; set; }
}

The root cause of the problem seems to be the somewhat non-deterministic way in which protobuf-net prepares serializers for types. Here is what we observe.

Let's say we have two programs: a producer and a consumer. The producer creates an instance of D, adds some data, and serializes that instance using protobuf-net. The consumer picks up the serialized data and deserializes it into an instance of D.

In the producer, protobuf-net sometimes discovers type B before it discovers IBase, so it generates a serializer for B alone and serializes the values in DictionaryOfBs as plain instances of B.

In the consumer, protobuf-net may discover IBase first, so when it prepares the (de)serializer for B, it treats B as a subclass of IBase. When it then deserializes the values of DictionaryOfBs, it tries to read them as subclasses of IBase, expecting a field number that discriminates between A and B. The data in the stream may be such that the IBase serializer decides it is looking at an instance of A. It then tries to convert that instance to B (using the Merge method) and falls into infinite recursion, converting A into B into A into B and so on, which eventually results in a StackOverflowException.

Adding Serializer.Serialize(stream, new D()) before deserialization changes the order in which the serializers are created, so there is no error in that case, although this seems to be a lucky coincidence. Unfortunately, in our case even that is not a satisfactory workaround, because it leads to occasional "Internal error; a key mismatch occurred" errors on deserialization.

Is there any reason you didn't use .NET serialization such as XML or binary? - Pranay Rana
@Pranay there are lots of reasons not to use BinaryFormatter, but: protobuf is binary (at least in the same sense that BinaryFormatter is). As for why: performance and size of generated output, typically. - Marc Gravell
@MarcGravell OK, that's new to me; I will read up on it, then. - Pranay Rana
@NikNik I'm the author of protobuf-net, and I don't recall FromProto / ToProto methods. Now, it is entirely possible that I've simply forgotten them (I'm not at a PC), but: are you sure they're not your own methods? I can't see them here: github.com/mgravell/protobuf-net/blob/master/src/protobuf-net/… and that class is not marked partial, so I shouldn't need to look in any other files... - Marc Gravell
@Pranay I forgot the most obvious reason: because you want to share data with other platforms that are talking protobuf; there are implementations for virtually any platform you can name (since it is used by many Google APIs). - Marc Gravell

1 Answer

The serialization code calls the generic API, but because value is declared as object, generic type inference resolves the call to Serialize<object>. This can confuse things. The first thing to try is to have the ToProto method call Serializer.NonGeneric.Serialize, which will use .GetType() etc. and should hopefully be confused less.
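
For illustration, here is a minimal, untested sketch of what that first suggestion might look like. The names ProtoHelpers and SampleMessage are hypothetical and only there to make the snippet self-contained; the only change from the ToProto in the question is the call to Serializer.NonGeneric.Serialize.

```csharp
using System.IO;
using ProtoBuf;

// Hypothetical contract type, used only so the sketch compiles on its own.
[ProtoContract]
public class SampleMessage
{
    [ProtoMember(1)]
    public string Name { get; set; }
}

// Hypothetical helper class; in the question these methods live in Program1/Program2.
public static class ProtoHelpers
{
    public static byte[] ToProto(object value)
    {
        using (var stream = new MemoryStream())
        {
            // The non-generic API resolves the serializer from the runtime
            // type of the instance instead of from an inferred <object>.
            Serializer.NonGeneric.Serialize(stream, value);
            return stream.ToArray();
        }
    }
}
```

Whether this removes the ordering problem described in the question is not confirmed here; it only removes the <object> inference from the picture.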

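Alternatively: make ToProto generic, i.e. ToProto<T>(T value), so that the call to Serialize is inferred as Serialize<T> rather than Serialize<object>.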
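
A minimal, untested sketch of that generic variant follows. GenericProtoHelpers and SampleRecord are hypothetical names introduced for the example; FromProto is unchanged from the question and is repeated only so the round trip can be tried in isolation.

```csharp
using System.IO;
using ProtoBuf;

// Hypothetical contract type, used only so the sketch compiles on its own.
[ProtoContract]
public class SampleRecord
{
    [ProtoMember(1)]
    public string Name { get; set; }
}

// Hypothetical helper class; in the question these methods live in Program1/Program2.
public static class GenericProtoHelpers
{
    // T is taken from the static type at the call site, so the serializer
    // is requested for T rather than for object.
    public static byte[] ToProto<T>(T value)
    {
        using (var stream = new MemoryStream())
        {
            Serializer.Serialize<T>(stream, value);
            return stream.ToArray();
        }
    }

    public static T FromProto<T>(byte[] value)
    {
        using (var stream = new MemoryStream(value))
        {
            return Serializer.Deserialize<T>(stream);
        }
    }
}
```

Callers would then write ToProto(complexObject) with complexObject declared as SomeComplexType, or ToProto<SomeComplexType>(complexObject) explicitly. If the variable is declared as object, T is still inferred as object, in which case the non-generic API is the better fit.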

Note: I haven't tested this - but it is the first thing to try.