2
votes

We are using protobuf-net to serialize and deserialize deep object graphs. Some members of these graphs are repeated over and over across many graphs, as is the case for many-to-one relations to a small set of objects. For example, a Sale object would be a graph, and its Store member (possibly a sub-graph of its own) would be one of only 10 possible stores, so it would be repeated across many messages (Sale graphs).

We could reap obvious performance benefits if we could somehow cache the serializations of these objects. Ideally, we would like to tell the RuntimeModel that certain types - in this case, Store - should be handled through an extensibility point, much like surrogates, which would however be capable of providing the raw serialization bytes.

One of our constraints is that the generated messages should still be protobuf-net compatible, so that clients on other platforms (say, Python) can parse them directly without these "hooks" (screw performance optimizations for Python clients!).

We looked at surrogates, but it looks like whatever the surrogate produces (in our case that would somehow be a byte[] array) would (as you'd expect) still be serialized as its own type (i.e. as a byte[] array) and thus not be compatible with the Store object expected by Python clients.

We also looked at Extensions, but even if we somehow hacked the cached serialized Store into an extra field, we'd be back to square one with Python clients.

Is there any other extensibility mechanism we might use for this scenario?

1 Answer

1
votes

Ooh, interesting. Indeed, the Python requirement derails what I was going to say (reference tracking). What you could do is serialize each of those to a byte[] first (via MemoryStream) and then just include that data as a byte[] member. On the wire there is no difference between a byte[] and the original object, since both are written as length-prefixed fields, so external clients won't notice any difference. That said, the thought occurs that in most cases this is not going to be significantly faster than just keeping the object model "as is" and serialising some of the nodes multiple times (the serialization is not very slow).

Frankly, though, I would be looking at storing it a different way - so instead of storing stores as a child node, I'd have each of them stored once as a top-level node, and just store some unique identifier as a lookup (not the child object itself). That changes the layout, though.
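As a rough illustration of that alternative layout, the sketch below uses plain Python dataclasses rather than protobuf-net contracts. All type and member names (Store, Sale, SalesBatch, store_id and so on) are hypothetical. The idea is simply that each store appears once in a top-level collection and each sale carries only the identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Store:
    store_id: int
    name: str


@dataclass
class Sale:
    amount: int
    store_id: int  # lookup key instead of a nested Store object


@dataclass
class SalesBatch:
    # Each distinct store is held exactly once, keyed by its identifier.
    stores: Dict[int, Store] = field(default_factory=dict)
    sales: List[Sale] = field(default_factory=list)

    def add_sale(self, amount: int, store: Store) -> None:
        """Register the store once, then record the sale by store id."""
        self.stores.setdefault(store.store_id, store)
        self.sales.append(Sale(amount, store.store_id))

    def store_for(self, sale: Sale) -> Store:
        """Resolve a sale's store through the top-level lookup."""
        return self.stores[sale.store_id]
```

With this shape, a batch of many sales for the same store serializes that store a single time, at the cost of clients having to resolve the identifier themselves.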

No, nothing is built in to support this, and I'm not sure it is a key scenario to support.