5
votes

I want to stream protobuf messages to a file.

I have a protobuf message

    message car {
      ... // some fields
    }

My Java code would create multiple objects of this car message.

How should I stream these messages to a file?

As far as I know, there are two ways of going about it.

  1. Define a wrapper message such as cars

    message cars {
      repeated car c = 1;
    }

    and have the Java code build a single cars object and then write it to a file.

  2. Write the individual car messages to a single file, one after another, using the writeDelimitedTo function.

I am wondering which is the more efficient way to go about streaming using protobuf.

When should I use pattern 1 and when should I be using pattern 2?

This is what I got from https://developers.google.com/protocol-buffers/docs/techniques#large-data

I am not clear on what they are trying to say.

Large Data Sets

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.

Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do while other times you may want something more like a database. Each solution should be developed as a separate library, so that only those who need it need to pay the costs.

1
You're not supposed to look for design patterns. They are something that comes about from desirable code. What you should be asking is "What is the simplest way I can code this?". If that happens to be a design pattern (or close), then use it. Otherwise, don't. – christopher
@chris Thanks for your response. I can code this both ways (as mentioned in the question). I was wondering which is the more efficient way to go about it: efficiency in terms of serialization or deserialization time, and the size of the streamed output. – Varun Tulsian
I didn't respond, I just made some minor formatting changes to your question. – Taylan Aydinli

1 Answer

1
votes

Have a look at Previous Question. Any difference in size and time will be minimal (option 1 may be slightly faster, though I am not sure; option 2 is slightly smaller).
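
To put a rough number on the size difference, here is a small sketch that only does the arithmetic. It assumes the wrapper from the question (`repeated car c = 1;`), where each element is written as a one-byte tag, a varint length, and the payload, and it assumes delimited output is a varint length followed by the payload. The class name `FramingOverhead` and the sample message sizes are made up for illustration; nothing here calls the protobuf library.

```java
public class FramingOverhead {
    // Number of bytes a base-128 varint needs for a non-negative value.
    public static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // Option 1: every element of "repeated car c = 1;" costs one tag byte
    // (field 1, length-delimited), a varint length, and the payload itself.
    public static long wrapperSize(int[] messageSizes) {
        long total = 0;
        for (int n : messageSizes) {
            total += 1 + varintSize(n) + n;
        }
        return total;
    }

    // Option 2: each delimited message costs a varint length and the payload.
    public static long delimitedSize(int[] messageSizes) {
        long total = 0;
        for (int n : messageSizes) {
            total += varintSize(n) + n;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical serialized sizes of three car messages, in bytes.
        int[] sizes = {10, 200, 3000};
        System.out.println("option 1 (wrapper):   " + wrapperSize(sizes));
        System.out.println("option 2 (delimited): " + delimitedSize(sizes));
    }
}
```

Under these assumptions the two layouts differ by one byte per message, which is why the size difference is negligible in practice.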

My advice would be:

  1. Option 2 for big files. You process the file message by message, so the whole data set never has to be in memory at once.
  2. Option 1 if multiple languages are needed. In the past, delimited streams were not supported in all languages, though this seems to be changing.
  3. Otherwise, personal preference.
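
To illustrate what "message by message" means for option 2, here is a minimal sketch of length-prefixed framing in plain Java. It is a stand-in, not the protobuf API: the generated `car` class is not shown in the question, so the records are raw byte arrays, and `DelimitedFraming`, `writeDelimited`, and `readDelimited` are hypothetical names. With real generated classes you would call `writeDelimitedTo` on each message and the generated class's `parseDelimitedFrom` in a loop instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class DelimitedFraming {
    // Write a varint length prefix followed by the payload bytes.
    public static void writeDelimited(byte[] payload, OutputStream out) throws IOException {
        int length = payload.length;
        while ((length & ~0x7F) != 0) {
            out.write((length & 0x7F) | 0x80);
            length >>>= 7;
        }
        out.write(length);
        out.write(payload);
    }

    // Read one record, or return null if the stream ends before a new record starts.
    public static byte[] readDelimited(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        int length = b & 0x7F;
        int shift = 7;
        while ((b & 0x80) != 0) {
            if (shift > 28) {
                throw new IOException("length prefix too long");
            }
            b = in.read();
            if (b == -1) {
                throw new EOFException("truncated length prefix");
            }
            length |= (b & 0x7F) << shift;
            shift += 7;
        }
        if (length < 0) {
            throw new IOException("negative record length");
        }
        byte[] payload = new byte[length];
        // readFully throws EOFException if the record is cut short.
        new DataInputStream(in).readFully(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for serialized car messages.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : new String[] {"car-1", "car-2", "car-3"}) {
            writeDelimited(s.getBytes(StandardCharsets.UTF_8), out);
        }

        // Read the records back one at a time; only one is held in memory.
        InputStream in = new ByteArrayInputStream(out.toByteArray());
        byte[] record;
        while ((record = readDelimited(in)) != null) {
            System.out.println(new String(record, StandardCharsets.UTF_8));
        }
    }
}
```

The point of the design is that the reader never needs to know how many records exist or load them all at once, which is what makes option 2 a good fit for big files. With option 1, the single cars message typically has to be parsed in full before any car can be used.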