2 votes

I'm using protobuf to serialize large objects to binary files so they can be deserialized and used again at a later date. However, I'm having issues when deserializing some of the larger files. The files are roughly 2.3 GB in size, and when I try to deserialize them I get several exceptions thrown (in the following order):

I've looked at the question referenced in the second exception, but that doesn't seem to cover the problem I'm having.

I'm using Microsoft's HPC Pack to generate these files (they take a while), so the serialization looks like this:

    using (var consoleStream = Console.OpenStandardOutput())
    {
        Serializer.Serialize(consoleStream, dto);
    }

And I'm reading the files in as follows:

    private static T Deserialize<T>(string file)
    {
        using (var fs = File.OpenRead(file))
        {
            return Serializer.Deserialize<T>(fs);
        }
    }

The files are of two different types. One is about 1 GB in size, the other about 2.3 GB. The smaller files all work; the larger files do not. Any ideas what could be going wrong here? I realise I've not given a lot of detail; I can give more as requested.

Deserialization of 2.3 GB already sounds wrong. Errors aside, the idea of using any kind of serialization for such a huge amount of data is bad. Could you elaborate on exactly what problem you are trying to solve by using serialization? - Sinatr
@Sinatr Yeah, I've kind of realised that perhaps this wasn't the best route, but I have the files now so I'm trying to salvage them. I need to be able to generate these files and save them to disk for use later. - geekchic
What use? Could you tell us exactly what these files are? Maybe you decided to transfer (export/import?) data by using serialization, or something else where serialization (for that amount of data) is a bad idea. Consider using a custom file format, where the huge data (HPC Pack? what is that?) is just copied 1 to 1, while the small part (containing configuration, paths, parameters, etc.) is serialized in the classic way and then combined with the huge data. - Sinatr
@geekchic I have to confess, my unit test suite doesn't extend to multi-GB files. It is possible that this is simply a reader issue relating to an int that perhaps should be a long; I will have to find a moment to investigate. - Marc Gravell♦
@MarcGravell but you have to admit: It's a cool bug! And a case for the checked-arithmetic compiler option, maybe. - usr

1 Answer

1 vote

Here I need to refer to a recent discussion on the protobuf list:

Protobuf uses int to represent sizes so the largest size it can possibly support is <2G. We don't have any plan to change int to size_t in the code. Users should avoid using overly large messages.

I'm guessing that the cause of the failure inside protobuf-net is basically the same. I can probably change protobuf-net to support larger files, but I have to advise that this is not recommended, because it looks like no other implementation is going to work well with such huge data.

The fix is probably just a case of changing a lot of int to long in the reader/writer layer. But: what is the layout of your data? If there is an outer object that is basically a list of the actual objects, there is probably a sneaky way of doing this using an incremental reader (basically, spoofing the repeated support directly).
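For illustration, here is roughly what that incremental approach could look like with protobuf-net, assuming the outer object is just a wrapper whose repeated member is field number 1 (the Wrapper/Item names and the field number are assumptions about your schema, not known from the question). Serializer.DeserializeItems<T> reads the inner messages one at a time, so the >2 GB outer object is never materialised:

    using System.Collections.Generic;
    using System.IO;
    using ProtoBuf;

    // Assumed layout: [ProtoContract] class Wrapper { [ProtoMember(1)] public List<Item> Items; }
    // On the wire that is just a sequence of field-1, length-prefixed sub-messages,
    // so the items can be streamed one at a time instead of deserializing one huge Wrapper.
    private static IEnumerable<Item> ReadItems(string file)
    {
        using (var fs = File.OpenRead(file))
        {
            // PrefixStyle.Base128 with fieldNumber = 1 matches the encoding of a
            // repeated sub-message field with tag 1 on the outer object.
            foreach (var item in Serializer.DeserializeItems<Item>(fs, PrefixStyle.Base128, 1))
            {
                yield return item;
            }
        }
    }

The same layout can be produced item-by-item on the writing side with Serializer.SerializeWithLengthPrefix(stream, item, PrefixStyle.Base128, 1), which stays wire-compatible with the original wrapper while keeping memory usage flat.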