12
votes

I have a list of about 500 million items. I can serialize it to a file with protobuf-net if I serialize individual items rather than a list. I cannot collect the items into a List of Price and then serialize, because I run out of memory. So I have to serialize one record at a time:

using (var input = File.OpenText("..."))
using (var output = new FileStream("...", FileMode.Create, FileAccess.Write))
{
    string line = "";
    while ((line = input.ReadLine()) != null)
    {
        Price price = new Price();
        // (code that parses input into a Price record)

        Serializer.Serialize(output, price);
    }
}

My question is about the deserialization part. It appears that the Deserialize method does not move the Position of the stream to the next record. I tried:

using (var input = new FileStream("...", FileMode.Open, FileAccess.Read))
{
    Price price = null;
    while ((price = Serializer.Deserialize<Price>(input)) != null)
    {
    }
}

I see one real-looking Price record, and then the rest are empty records -- I get the Price object back but all fields are initialized to default values.

How do I properly deserialize a stream that contains a sequence of objects that were not serialized as a list?

3
Did you get this working? Do you need a more complete example? - Marc Gravell

3 Answers

6
votes

Good news! The protobuf-net API is set up for exactly this scenario. You should see a SerializeItems and DeserializeItems pair of methods that work with IEnumerable<T>, allowing streaming both in and out. The easiest way to feed it an enumerable is via an "iterator block" over the source data.

If, for whatever reason, that isn't convenient, it is 100% identical to using SerializeWithLengthPrefix and DeserializeWithLengthPrefix on a per-item basis, specifying (as parameters) field: 1 and prefix-style: base-128. You could even use SerializeWithLengthPrefix for the writing, and DeserializeItems for the reading (as long as you use field 1 and base-128).
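As a rough illustration of that last combination, here is a minimal sketch that writes each item with SerializeWithLengthPrefix and reads them back lazily with DeserializeItems. The Price class shown (its Symbol and Value members and their field numbers) and the ReadPrices iterator are hypothetical stand-ins, since the question does not show the real type or the parsing code; a MemoryStream is used only to keep the sketch self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

// Hypothetical record type; the real Price class is not shown in the question.
[ProtoContract]
public class Price
{
    [ProtoMember(1)] public string Symbol { get; set; }
    [ProtoMember(2)] public double Value { get; set; }
}

public static class Program
{
    // Iterator block: yields one record at a time, so the full list is never
    // held in memory. A stand-in for parsing lines from the input file.
    static IEnumerable<Price> ReadPrices()
    {
        yield return new Price { Symbol = "A", Value = 1.5 };
        yield return new Price { Symbol = "B", Value = 2.5 };
    }

    public static void Main()
    {
        using (var stream = new MemoryStream())
        {
            // Write: one length-prefixed record per item (field 1, base-128).
            foreach (var price in ReadPrices())
            {
                Serializer.SerializeWithLengthPrefix(stream, price, PrefixStyle.Base128, 1);
            }

            stream.Position = 0;

            // Read: DeserializeItems returns a lazy sequence, one item at a time.
            foreach (var price in Serializer.DeserializeItems<Price>(stream, PrefixStyle.Base128, 1))
            {
                Console.WriteLine($"{price.Symbol} {price.Value}");
            }
        }
    }
}
```

For the 500-million-item case, the MemoryStream would be replaced by the FileStream from the question; the field number and prefix style must match on both sides.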

Re the example: I'd have to see that in a fully reproducible scenario to comment. Actually, what I would expect there is that you only get a single object back, containing the combined values from each object, because without the length prefix the protobuf spec assumes you are just concatenating values into a single object. The two approaches mentioned above avoid this issue.
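To make the concatenation behaviour concrete, the following sketch writes two records with plain Serialize and then deserializes once. It assumes a hypothetical Price class with two scalar members (the real type is not shown in the question). Under the protobuf merge rules, the single object read back should carry the scalar values of the last record written, and the read should consume the whole stream.

```csharp
using System;
using System.IO;
using ProtoBuf;

// Hypothetical record type, for illustration only.
[ProtoContract]
public class Price
{
    [ProtoMember(1)] public string Symbol { get; set; }
    [ProtoMember(2)] public double Value { get; set; }
}

public static class Program
{
    public static void Main()
    {
        using (var stream = new MemoryStream())
        {
            // No length prefix: the two messages are simply concatenated.
            Serializer.Serialize(stream, new Price { Symbol = "A", Value = 1.5 });
            Serializer.Serialize(stream, new Price { Symbol = "B", Value = 2.5 });

            stream.Position = 0;

            // A single Deserialize call reads to the end of the stream and
            // merges every field it encounters into one object.
            Price merged = Serializer.Deserialize<Price>(stream);

            Console.WriteLine($"{merged.Symbol} {merged.Value}");
            Console.WriteLine($"Consumed whole stream: {stream.Position == stream.Length}");
        }
    }
}
```

Any further Deserialize call then starts at the end of the stream, which is consistent with the empty, default-valued records described in the question.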

4
votes

Maybe I am too late on this, but just to add to what Marc already said.

When you use Serializer.Serialize(output, price); protobuf treats consecutive messages as parts of the same single object. So when you deserialize using

while ((price = Serializer.Deserialize<Price>(input)) != null)

you get all the records back merged into a single object. Hence you will see only the values of the last Price record.

To do what you want to do, change the serialization code to:

Serializer.SerializeWithLengthPrefix(output, price, PrefixStyle.Base128, 1);

and

while ((price = Serializer.DeserializeWithLengthPrefix<Price>(input, PrefixStyle.Base128, 1)) != null)

2
votes

The API has apparently changed since Marc's answer.
It seems there's no SerializeItems method any more.

Here's some more up to date info that should help:

ProtoBuf.Serializer.Serialize(stream, items);

can take an IEnumerable as seen above, and it does the job when it comes to serialization.
However, there's a DeserializeItems(...) method, and the devil is in the details :)
If you serialize an IEnumerable as above, then you need to call DeserializeItems passing PrefixStyle.Base128 and 1 as fieldNumber, because apparently those are the defaults.
Here's an example:

ProtoBuf.Serializer.DeserializeItems<T>(stream, ProtoBuf.PrefixStyle.Base128, 1);

Also, as pointed out by Marc and Vic, you can serialize/deserialize on a per-item basis like this (using custom values for PrefixStyle and fieldNumber):

ProtoBuf.Serializer.SerializeWithLengthPrefix(stream, item, ProtoBuf.PrefixStyle.Base128, fieldNumber: 1);

and

T item;
while ((item = ProtoBuf.Serializer.DeserializeWithLengthPrefix<T>(stream, ProtoBuf.PrefixStyle.Base128, fieldNumber: 1)) != null)
{
    // do stuff here
}