
I am checking whether protobuf-net can be a drop-in replacement for DataContracts. Besides its excellent performance, it is a really neat library. The only issue I have is that the .NET serializers make no assumptions about what they are currently (de)serializing, whereas protobuf-net does not write type information into the stream by default. Objects that contain references typed as object are especially a problem.

[DataMember(Order = 3)]
public object Tag1 // In the original DataContract this was typed as object; it now becomes a SimulatedObject
{
    get;
    set;
}

I tried to mimic object with protocol buffers using a small generic helper that stores each possible type in its own strongly typed field.

Is this a recommended approach for fields that (de)serialize into several unrelated types?

Below is sample code for a SimulatedObject that can hold up to 10 different types.

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using ProtoBuf;
using System.Diagnostics;

[DataContract]
public class SimulatedObject<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10>
{
    [DataMember(Order = 20)]
    byte FieldHasValue; // the number indicates which field actually has a value

    [DataMember(Order = 1)]
    T1 I1;

    [DataMember(Order = 2)]
    T2 I2;

    [DataMember(Order = 3)]
    T3 I3;

    [DataMember(Order = 4)]
    T4 I4;

    [DataMember(Order = 5)]
    T5 I5;

    [DataMember(Order = 6)]
    T6 I6;

    [DataMember(Order = 7)]
    T7 I7;

    [DataMember(Order = 8)]
    T8 I8;

    [DataMember(Order = 9)]
    T9 I9;

    [DataMember(Order = 10)]
    T10 I10;

    public object Data
    {
        get
        {
            switch(FieldHasValue)
            {
                case 0: return null;
                case 1: return I1;
                case 2: return I2;
                case 3: return I3;
                case 4: return I4;
                case 5: return I5;
                case 6: return I6;
                case 7: return I7;
                case 8: return I8;
                case 9: return I9;
                case 10: return I10;
                default:
                    throw new NotSupportedException(String.Format("The FieldHasValue field has an invalid value {0}. This indicates corrupt data or incompatible data layout changes.", FieldHasValue));
            }
        }
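        set
        {
            // Reset the discriminator and every slot so that assigning null
            // (or a new value) never leaves a stale value behind
            FieldHasValue = 0;
            I1 = default(T1);
            I2 = default(T2);
            I3 = default(T3);
            I4 = default(T4);
            I5 = default(T5);
            I6 = default(T6);
            I7 = default(T7);
            I8 = default(T8);
            I9 = default(T9);
            I10 = default(T10);
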
            if (value != null)
            {
                Type t = value.GetType();
                if (t == typeof(T1))
                {
                    FieldHasValue = 1;
                    I1 = (T1) value;
                }
                else if (t == typeof(T2))
                {
                    FieldHasValue = 2;
                    I2 = (T2) value;
                }
                else if (t == typeof(T3))
                {
                    FieldHasValue = 3;
                    I3 = (T3) value;
                }
                else if (t == typeof(T4))
                {
                    FieldHasValue = 4;
                    I4 = (T4) value;
                }
                else if (t == typeof(T5))
                {
                    FieldHasValue = 5;
                    I5 = (T5) value;
                }
                else if (t == typeof(T6))
                {
                    FieldHasValue = 6;
                    I6 = (T6) value;
                }
                else if (t == typeof(T7))
                {
                    FieldHasValue = 7;
                    I7 = (T7) value;
                }
                else if (t == typeof(T8))
                {
                    FieldHasValue = 8;
                    I8 = (T8) value;
                }
                else if (t == typeof(T9))
                {
                    FieldHasValue = 9;
                    I9 = (T9) value;
                }
                else if (t == typeof(T10))
                {
                    FieldHasValue = 10;
                    I10 = (T10) value;
                }
                else
                {
                    throw new NotSupportedException(String.Format("The type {0} is not supported for serialization. Please add the type to the SimulatedObject generic argument list.", t.FullName));
                }
            }
        }
    }
}

[DataContract]
class Customer
{
    /* 
    [DataMember(Order = 3)]
    public object Tag1 // In the original DataContract this was typed as object; it now becomes a SimulatedObject
    {
        get;
        set;
    }
    */

    [DataMember(Order = 3)]
    public SimulatedObject<bool, Other, Other, Other, Other, Other, Other, Other, Other, SomethingDifferent> Tag1 // Can contain up to 10 different types
    {
        get;
        set;
    }



    [DataMember(Order = 4)]
    public List<string> Strings
    {
        get;
        set;
    }
}

[DataContract]
public class Other
{
    [DataMember(Order = 1)]
    public string OtherData
    {
        get;
        set;
    }
}

[DataContract]
public class SomethingDifferent
{
    [DataMember(Order = 1)]
    public string OtherData
    {
        get;
        set;
    }

}


class Program
{
    static void Main(string[] args)
    {
        Customer c = new Customer
        {
            Strings = new List<string> { "First", "Second", "Third" },
            Tag1 = new SimulatedObject<bool, Other, Other, Other, Other, Other, Other, Other, Other, SomethingDifferent>
                    {
                        Data = new Other {  OtherData = "String value "}
                    }
        };

        const int Runs = 1000 * 1000;
        var stream = new MemoryStream();

        // Serialize once up front so one-time setup cost is excluded from the timing
        Serializer.Serialize<Customer>(stream, c);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < Runs; i++)
        {
            stream.Position = 0;
            stream.SetLength(0);
            Serializer.Serialize<Customer>(stream, c);
        }
        sw.Stop();
        Console.WriteLine("Data size with protocol buffer serializer: {0} bytes; serializing {1} objects took {2}s", stream.ToArray().Length, Runs, sw.Elapsed.TotalSeconds);

        // Warm-up deserialization, excluded from the timing below
        stream.Position = 0;
        var warmUp = Serializer.Deserialize<Customer>(stream);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < Runs; i++)
        {
            stream.Position = 0;
            var newCust = Serializer.Deserialize<Customer>(stream);
        }
        sw.Stop();
        Console.WriteLine("Deserializing {0} objects with protocol buffer deserializer took {1}s", Runs, sw.Elapsed.TotalSeconds);

    }
}

3 Answers


No, this solution is hard to maintain in the long term.

I recommend prepending the full name of the serialized type to the serialized data during serialization, and reading the type name back at the start of deserialization (there is no need to change the protobuf-net source code).

As a side note, you should try to avoid mixing object types in the deserialization process. I'm assuming you are updating an existing .NET application and can't redesign it.

Update: Sample code

using System;
using System.IO;
using System.Reflection;
using System.Text;
using ProtoBuf;

public byte[] Serialize(object myObject)
{
    using (var ms = new MemoryStream())
    {
        // Header: the type's full name followed by a '|' delimiter, in ASCII
        Type type = myObject.GetType();
        var id = Encoding.ASCII.GetBytes(type.FullName + '|');
        ms.Write(id, 0, id.Length);

        // Serialize using the runtime type rather than the static type 'object'
        Serializer.NonGeneric.Serialize(ms, myObject);
        return ms.ToArray();
    }
}

public object Deserialize(byte[] serializedData)
{
    var sb = new StringBuilder();
    using (var ms = new MemoryStream(serializedData))
    {
        // Read the type-name header up to the '|' delimiter
        while (true)
        {
            int b = ms.ReadByte();
            if (b == -1)
            {
                // Without this check, corrupt data would loop forever
                throw new InvalidDataException("Missing type name header.");
            }
            if (b == '|')
            {
                break;
            }

            sb.Append((char)b);
        }

        string typeName = sb.ToString();

        // assuming that the calling assembly contains the desired type.
        // You can include additional assembly information if necessary
        Type deserializationType = Assembly.GetCallingAssembly().GetType(typeName, true);

        // The stream is now positioned just after the header, at the protobuf payload
        return Serializer.NonGeneric.Deserialize(deserializationType, ms);
    }
}
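A possible variant of the same header idea, sketched with the standard library only: BinaryWriter.Write(string) writes the string with a length prefix, so the reader knows exactly where the name ends without scanning for '|'. Writing the assembly-qualified name lets Type.GetType resolve the type without assuming it lives in the calling assembly; the trade-off is that the name embeds the assembly version. TypeHeader, Wrap and Unwrap are hypothetical names, and the protobuf payload is treated as an opaque byte array produced by whichever serializer you use.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: frames a payload with a length-prefixed type name.
public static class TypeHeader
{
    // Layout: [7-bit-encoded length][UTF-8 type name][raw payload bytes]
    public static byte[] Wrap(Type type, byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
            {
                // Write(string) emits a length prefix before the characters
                writer.Write(type.AssemblyQualifiedName);
                // Write(byte[]) emits the bytes as-is, with no length prefix
                writer.Write(payload);
            }
            return ms.ToArray();
        }
    }

    // Reads the header back; returns the resolved type and the remaining payload.
    public static Type Unwrap(byte[] data, out byte[] payload)
    {
        using (var ms = new MemoryStream(data))
        using (var reader = new BinaryReader(ms, Encoding.UTF8))
        {
            string typeName = reader.ReadString();
            // throwOnError: true fails loudly instead of returning null
            Type type = Type.GetType(typeName, throwOnError: true);
            payload = reader.ReadBytes((int)(ms.Length - ms.Position));
            return type;
        }
    }
}
```

On the reading side, the returned payload would then be handed to the deserializer together with the resolved type.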

I'm working on something similar and have already published the first version of the library: http://bitcare.codeplex.com/

The current version doesn't support generics yet, but I plan to add that soon. For now I have uploaded only the source code; once generics support is ready I will also publish a binary release.

It assumes that both sides (client and server) know what they serialize and deserialize, so there is no reason to embed full metadata in the output. Because of this, the serialized results are very small and the generated serializers are very fast. It uses data dictionaries and compact storage (in short, it stores only the significant bits), and applies a final compression pass when necessary. If you need it, try it and see whether it solves your problem.
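The answer does not describe its storage format in detail. As one well-known illustration of storing only the significant bits of an integer, here is a base-128 varint encoder, the scheme protocol buffers uses for integer fields. This is a swapped-in technique for illustration, not necessarily what the library does; VarInt is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: base-128 varint, as used by protocol buffers.
public static class VarInt
{
    // Emits 7 bits per byte, low-order group first; the high bit of each
    // byte signals that more bytes follow. Small values take a single byte.
    public static byte[] Encode(ulong value)
    {
        var bytes = new List<byte>();
        while (value >= 0x80)
        {
            bytes.Add((byte)(value | 0x80));
            value >>= 7;
        }
        bytes.Add((byte)value);
        return bytes.ToArray();
    }

    // Reassembles the 7-bit groups until a byte without the high bit is seen.
    public static ulong Decode(byte[] data)
    {
        ulong result = 0;
        int shift = 0;
        foreach (byte b in data)
        {
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
            {
                return result;
            }
            shift += 7;
        }
        throw new FormatException("Truncated varint.");
    }
}
```

For example, 300 encodes to the two bytes 0xAC 0x02 instead of the eight bytes a raw ulong would occupy.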

The license is currently GPL, but I will soon change it to a less restrictive one (free for commercial use as well, but at your own risk, as with the GPL).


The version I uploaded to CodePlex is in use in some of my products. It is, of course, covered by various unit tests. They are not uploaded there because I ported the library to VS2012 and .NET 4.5 and decided to write new sets of test cases for the upcoming release.

I don't deal with open (so-called abstract) generic types; I process closed, parameterized generic types. From a data-contract point of view, a closed generic type is just a specialized class, so I can process it like any other class. The only differences are in object construction and storage optimizations.
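The distinction between open and closed generic types can be seen with plain reflection. This minimal sketch (GenericInspection and DescribeProperties are hypothetical names) rejects open generic types and, for a closed one, lists each public property with its now-concrete type, which is the information a code generator needs to treat it like any other class.

```csharp
using System;
using System.Linq;

public static class GenericInspection
{
    // Returns "Name:TypeName" for each public property, sorted by name.
    // For a closed generic type every property type is concrete.
    public static string[] DescribeProperties(Type type)
    {
        // Open types such as KeyValuePair<,> still have unresolved parameters
        if (type.ContainsGenericParameters)
        {
            throw new ArgumentException("Open generic types have unresolved type parameters.", nameof(type));
        }

        return type.GetProperties()
                   .Select(p => p.Name + ":" + p.PropertyType.Name)
                   .OrderBy(s => s, StringComparer.Ordinal)
                   .ToArray();
    }
}
```

For KeyValuePair<string, int> this yields Key:String and Value:Int32, while KeyValuePair<,> itself is rejected.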

When I store a null value of a Nullable<> type, it takes only one bit in the storage stream. If the value is not null, I serialize it according to the type supplied as the generic parameter (for instance a DateTime, which can take anywhere from one bit for its default value to a few bytes). The goal was to generate serialization code from what is already known about the data contracts on the classes, instead of doing it on the fly and wasting memory and processing power. When, during code generation, I encounter a property whose type is a closed generic type, I know all the properties of that type and the type of every one of them :) From that point of view it is a concrete class.
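The answer only describes the idea. The sketch below assumes one possible layout: a presence bit for the Nullable<DateTime>, then one bit marking the default DateTime, then the raw 64-bit Ticks value otherwise. BitWriter and NullableDateTimeWriter are hypothetical names, not part of the bitcare library, and this sketch does not store DateTime.Kind.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bit-level writer: packs bits into bytes, lowest bit first.
public sealed class BitWriter
{
    private readonly List<byte> _bytes = new List<byte>();
    private int _bitCount;

    public int BitCount => _bitCount;

    public void WriteBit(bool bit)
    {
        // Start a new byte every 8 bits
        if (_bitCount % 8 == 0)
        {
            _bytes.Add(0);
        }
        if (bit)
        {
            _bytes[_bytes.Count - 1] |= (byte)(1 << (_bitCount % 8));
        }
        _bitCount++;
    }

    // Writes the lowest 'count' bits of value, least significant first.
    public void WriteBits(ulong value, int count)
    {
        for (int i = 0; i < count; i++)
        {
            WriteBit(((value >> i) & 1UL) != 0);
        }
    }

    public byte[] ToArray() => _bytes.ToArray();
}

public static class NullableDateTimeWriter
{
    // Assumed layout: presence bit; if present, an "is default" bit;
    // if not default, the 64-bit Ticks value (Kind is not stored).
    public static void Write(BitWriter w, DateTime? value)
    {
        w.WriteBit(value.HasValue);
        if (!value.HasValue)
        {
            return;
        }

        bool isDefault = value.Value == default(DateTime);
        w.WriteBit(isDefault);
        if (!isDefault)
        {
            w.WriteBits((ulong)value.Value.Ticks, 64);
        }
    }
}
```

Under this layout a null value costs 1 bit, a default DateTime 2 bits, and any other DateTime 66 bits; a matching reader would consume the bits in the same order.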

I will change the license soon. I first have to figure out how :) The site lets me choose from a list of predefined license types, but I can't provide my own license text. The license of Newtonsoft.Json is what I would like as well, but I don't know yet how to switch to it...

The documentation has not been uploaded yet, but in short it is easy to prepare your own serialization tests. Compile an assembly containing the types you want to store and serialize efficiently, then create *.tt (T4 template) files in your serialization library (for example one for a Person class; the template checks dependencies and also generates code for the dependent classes) and save them. Saving a .tt file generates all the code needed to work with the serialization library. You can also add a task to your build configuration that regenerates the source code from the .tt files on every build (Entity Framework probably generates its code in a similar way during the build). Then compile your serialization library and measure the performance and the size of the results.

I need this serialization library for my framework, to use entities efficiently with Azure Table and Blob storage, so I plan to finish the initial release soon...