Reasons to use int32 in a protobuf

Question

In the description of scalar types for gpb proto2 (https://developers.google.com/protocol-buffers/docs/proto#scalar) it says:

int32

Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead.

sint32

Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s.

Will the sint32 be equally efficient for positive values as the int32?

In other words, is there any reason to use int32?

If it matters what language is used, I'm only interested in C++.

Do you need variable length integers? If not, fixed length might be better. — tadman

ephemient · Accepted Answer · 2018-02-16T07:43:38

https://developers.google.com/protocol-buffers/docs/encoding#signed-integers

Signed varints are encoded by alternating between positive and negative values. For example,

   value     int32    zigzag    sint32
          (binary)            (binary)
       0  00000000         0  00000000
      -1  11111111         1  00000001
          11111111
          11111111
          11111111
          00001111
       1  00000001         2  00000010
      -2  11111110         3  00000011
          11111111
          11111111
          11111111
          00001111
...
      63  00111111       126  01111110
     -64  11000000       127  01111111
          11111111
          11111111
          11111111
          00001111
      64  01000000       128  10000000
                              00000001
...
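
The zigzag column above can be reproduced with a couple of bit operations. This is only a minimal sketch for 32-bit values; the function names zigzagEncode32 and zigzagDecode32 are made up for illustration and are not part of any protobuf library.

```javascript
// ZigZag mapping for 32-bit signed integers, as used by sint32.
// Function names are illustrative, not a protobuf API.
function zigzagEncode32(n) {
  // (n << 1) doubles the value; (n >> 31) is 0 for n >= 0 and -1 (all ones) for n < 0,
  // so the XOR flips every bit for negative inputs. ">>> 0" reads the result as unsigned.
  return ((n << 1) ^ (n >> 31)) >>> 0;
}

function zigzagDecode32(z) {
  // Even values map back to non-negative numbers, odd values to negative ones.
  return (z >>> 1) ^ -(z & 1);
}

// Print the mapping and the round trip for the values in the table.
for (const n of [0, -1, 1, -2, 63, -64, 64]) {
  const z = zigzagEncode32(n);
  console.log(n, '->', z, '->', zigzagDecode32(z));
}
```

Small magnitudes of either sign map to small unsigned numbers, which is what keeps the varint short.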

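A non-negative number needs exactly one more bit as an sint32 than as an int32 (n is encoded as 2n), so it costs an extra byte only when that bit crosses a 7-bit boundary, e.g. 64 through 127 take one byte as int32 but two as sint32. Note that the table and the demo treat values as 32-bit; on the actual protobuf wire, a negative int32 is sign-extended to 64 bits and always takes ten bytes.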
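
To compare the two types by size rather than by bit pattern, here is a rough sketch that counts wire bytes. It assumes the behaviour described in the protobuf encoding guide, where a negative int32 is sign-extended to 64 bits before being varint-encoded. The helper names varintLength, int32WireLength and sint32WireLength are hypothetical.

```javascript
// Bytes needed to varint-encode a non-negative BigInt: 7 payload bits per byte.
function varintLength(u) {
  let bytes = 1;
  while (u >= 0x80n) {
    u >>= 7n;
    bytes++;
  }
  return bytes;
}

// int32: sign-extend to 64 bits, reinterpret as unsigned, then varint-encode.
function int32WireLength(n) {
  return varintLength(BigInt.asUintN(64, BigInt(n)));
}

// sint32: ZigZag-map to an unsigned 32-bit value, then varint-encode.
function sint32WireLength(n) {
  const zigzag = ((n << 1) ^ (n >> 31)) >>> 0;
  return varintLength(BigInt(zigzag));
}

for (const n of [0, 63, 64, 127, 128, -1, -64, -65]) {
  console.log(`${n}: int32 = ${int32WireLength(n)} byte(s), sint32 = ${sint32WireLength(n)} byte(s)`);
}
```

So int32 is never larger for non-negative values and is one byte smaller in ranges such as 64 through 127, while sint32 wins by a wide margin as soon as negative values appear.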

(The following snippet is a live demo when run in a browser.)

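// Encode a 32-bit value as a varint: 7 payload bits per byte, least significant group first.
function encode_varint(number) {
  if (!number) return [0];
  var bytes = [];
  while (number) {
    var byte = number & 0x7F;   // low 7 bits
    number >>>= 7;              // unsigned shift, so negative inputs terminate
    if (number) byte |= 0x80;   // continuation bit: more bytes follow
    bytes.push(byte);
  }
  return bytes;
}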
function format_bytes(bytes) {
  var output = '';
  for (var i = 0; i < bytes.length; i++) {
    if (i) output += ' ';
    output += bytes[i].toString(2).padStart(8, '0');
  }
  return output;
}
var valueElem = document.getElementById('value');
var int32Elem = document.getElementById('int32');
var sint32Elem = document.getElementById('sint32');
function update() {
  var value = parseInt(valueElem.value, 10);
  var int32 = encode_varint(value);
  // ZigZag: shift left by one and flip all bits when the value is negative.
  var sint32 = encode_varint((value << 1) ^ -(value < 0));
  int32Elem.value = format_bytes(int32);
  sint32Elem.value = format_bytes(sint32);
}
valueElem.addEventListener('change', update);
update();

#varint {
  display: grid;
  grid-template-columns: max-content auto;
  grid-row-gap: 1ex;
  grid-column-gap: 1ch;
}
#varint label {
  text-align: right;
}
#varint input {
  font-family: monospace;
}

<form id='varint' onsubmit='return false'>
  <label for='value'>value</label>
  <input id='value' type='number' value='0'>
  <label for='int32'>int32</label>
  <input id='int32' type='text' readonly>
  <label for='sint32'>sint32</label>
  <input id='sint32' type='text' readonly>
</form>
