Comparison of streaming message implementations in protobuf

Question

What are the trade-offs, advantages and disadvantages of each of these streaming implementations where multiple messages of the same type are encoded?

Are they any different at all? What I want to achieve is to store a vector of boxes in a protobuf.

Impl 1:

package foo;

message Boxes {
  message Box {
    required int32 w = 1;
    required int32 h = 2;
  }

  repeated Box boxes = 1;
}

Impl 2:

package foo;

message Box {
  required int32 w = 1;
  required int32 h = 2;
}

message Boxes {
  repeated Box boxes = 1;
}

Impl 3: Stream multiple of these messages into the same file.

package foo;

message Box {
  required int32 w = 1;
  required int32 h = 2;
}

Bruce Martin · Accepted Answer · 2013-05-10T07:26:27

Marc Gravell's answer is certainly correct, but one point he missed is:

Options 1 and 2 (repeated field) will serialise / deserialise all the boxes at once.
Option 3 (multiple messages in the file) will serialise / deserialise box by box. If using Java, you can use delimited files (which add a varint length at the start of each message).
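To make the difference concrete, here is a minimal sketch in Python that hand-encodes the bytes without using the protobuf library. The helper names (`encode_varint`, `encode_box`, `encode_boxes`, `encode_delimited`) are made up for illustration, and the sketch assumes non-negative `w` and `h`. It suggests that options 1 and 2 produce the same bytes as each other (nesting the message only changes its name), and that a delimited stream differs from them only by dropping the one-byte field tag in front of each box.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_box(w: int, h: int) -> bytes:
    """Hand-encode Box: field 1 (tag 0x08) and field 2 (tag 0x10), both varints."""
    return b"\x08" + encode_varint(w) + b"\x10" + encode_varint(h)


def encode_boxes(boxes) -> bytes:
    """Options 1/2: each element is tag 0x0A (field 1, length-delimited) + length + Box."""
    out = bytearray()
    for w, h in boxes:
        body = encode_box(w, h)
        out += b"\x0a" + encode_varint(len(body)) + body
    return bytes(out)


def encode_delimited(boxes) -> bytes:
    """Option 3 with delimited framing: each message is varint length + Box."""
    out = bytearray()
    for w, h in boxes:
        body = encode_box(w, h)
        out += encode_varint(len(body)) + body
    return bytes(out)


boxes = [(3, 4), (300, 2)]
print(encode_boxes(boxes).hex())      # 0a04080310040a0508ac021002
print(encode_delimited(boxes).hex())  # 04080310040508ac021002
```

So the on-disk size is nearly the same either way; the practical difference is in how the data is read and written, not in how compact it is.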

Most of the time it will not matter whether you use a repeated field or multiple messages, but if there are millions / billions of boxes, memory will be an issue for options 1 and 2 (repeated), because the whole Boxes message has to be held in memory at once, and option 3 (multiple messages in the file) would be the better choice.
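The memory point can be sketched with a small Python reader for a varint-length-prefixed stream. This is an illustration only, not the protobuf library's API: `read_varint` and `iter_frames` are hypothetical helper names, and in real code each payload would be handed to the generated Box class (for example `ParseFromString` in Python, or `Box.parseDelimitedFrom(input)` in Java, which does the framing itself). Because the reader is a generator, only one box payload is held at a time, however long the file is.

```python
import io


def read_varint(stream):
    """Read one base-128 varint; return None on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # no more messages
            raise EOFError("stream ended inside a varint")
        result |= (byte[0] & 0x7F) << shift
        if (byte[0] & 0x80) == 0:  # continuation bit clear: last byte
            return result
        shift += 7


def iter_frames(stream):
    """Yield each length-prefixed message payload, one at a time."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a message")
        yield payload


# Two length-prefixed Box payloads: (w=3, h=4) and (w=300, h=2).
data = io.BytesIO(bytes.fromhex("04080310040508ac021002"))
for payload in iter_frames(data):
    print(len(payload), payload.hex())
```

With a repeated field there is no equivalent loop in the generated API: the parser builds the full list of boxes before returning.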

So in summary:

If there are millions / billions of boxes, use option 3 (multiple messages in the file).
Otherwise use one of the repeated options (1/2), because it is simpler and supported across all Protocol Buffers versions.

Personally, I would like to see a "standard" multiple-message format.
