Protobuf lazy decoding of sub message

Question

I am using proto3 (Java) in my projects. I have some huge protobufs with smaller messages embedded in them. Is there a way to achieve partial decoding of only the few nested sub-messages that I want to look at? The issue is that I need to join this huge proto-based record data with other records, but my joins are based on very small sub-messages. I don't want to decode the entire huge protobuf. I want to decode only the nested message (a string id) to do the join, and then decode the entire protobuf only for the joined data.

I tried using the [lazy=true] field option, but I don't see any difference in the generated code. I also benchmarked deserialization time with and without the lazy keyword, and it didn't seem to make any difference. Is this feature on by default for all fields? Or is this even possible? I do see a few classes such as LazyFields.java, and test cases, in the protobuf GitHub repository, so I assume this feature has been implemented.

Is the issue deserializing lots of unnecessary objects, or deserializing lots of unnecessary fields in necessary objects? The latter is fixable; the former is not. — Marc Gravell
@Marc Gravell: can you please elaborate a little with an example? From what I understand, my issue is the latter case, i.e. being able to decode only specific nested fields/sub-messages instead of all the fields, in other words lazy decoding of fields. — user179156
@MarcGravell: to clarify a bit with an example: say I have a few hundred million giant proto objects, and they all have a small nested message that is an identifier for the object. I need to filter out the few proto objects that match my small list of identifiers. So for each proto object, instead of deserializing the entire huge proto, which may have hundreds of fields/nested messages, I only want to decode the small identifier sub-message. — user179156
If you create a second message that only has the fields you want (and so on for nested messages), that should do most of what you describe, especially in proto3, where most libraries don't store unexpected data for round-trip. — Marc Gravell
@MarcGravell I am not sure exactly what your solution is, or whether it even works. Why should I create a second message? I do need all of the message's content; I just need to decode only some fields first, and if the record is one I need, decode it further to get the rest of the content. A second message doesn't work and seems like a pretty hacky workaround. — user179156
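If a second message type is not acceptable, the identifier can also be read straight from the wire format. Every encoded field starts with a tag (field number plus wire type), and length-delimited fields (strings, bytes, embedded messages) also carry a byte length, so a reader can jump over the fields it does not need. Below is a minimal sketch in plain Java with no protobuf dependency. The class `NestedFieldPeek`, its methods, and the field numbers 3 and 7 in the demo are hypothetical. It assumes the identifier is a `string` inside an embedded message, does not handle the deprecated group encoding, and takes the first occurrence of each field instead of applying protobuf's full merge rules. With protobuf-java itself, a similar loop can be built on `CodedInputStream` (`readTag()`, `skipField(tag)`, `pushLimit()`).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class NestedFieldPeek {
    private final byte[] buf;
    private final int limit;
    private int pos;

    private NestedFieldPeek(byte[] buf, int pos, int limit) { this.buf = buf; this.pos = pos; this.limit = limit; }

    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= limit) throw new IllegalArgumentException("truncated varint");
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IllegalArgumentException("malformed varint");
    }

    // Reads a length prefix and checks that the payload fits in the current range.
    private int readLength() {
        long n = readVarint();
        if (n < 0 || n > limit - pos) throw new IllegalArgumentException("truncated field");
        return (int) n;
    }

    // Jumps over one field value of the given wire type without decoding it.
    private void skip(int wireType) {
        int n;
        switch (wireType) {
            case 0: readVarint(); return;    // varint
            case 1: n = 8; break;            // fixed64
            case 2: n = readLength(); break; // length-delimited (string, bytes, message)
            case 5: n = 4; break;            // fixed32
            default: throw new IllegalArgumentException("unsupported wire type " + wireType);
        }
        if (n > limit - pos) throw new IllegalArgumentException("truncated field");
        pos += n;
    }

    // Follows a path of field numbers (every step but the last is an embedded message)
    // and returns the string at the end, or null if a step is missing. Simplification:
    // uses the first occurrence of each field, unlike protobuf's merge/last-wins rules.
    static String peekString(byte[] msg, int... path) {
        int start = 0, end = msg.length;
        for (int fieldNumber : path) {
            NestedFieldPeek r = new NestedFieldPeek(msg, start, end);
            boolean found = false;
            while (!found && r.pos < r.limit) {
                long tag = r.readVarint();
                int field = (int) (tag >>> 3), wireType = (int) (tag & 7);
                if (field == fieldNumber && wireType == 2) {
                    int n = r.readLength();
                    start = r.pos;
                    end = r.pos + n;
                    found = true;
                } else {
                    r.skip(wireType);
                }
            }
            if (!found) return null;
        }
        return new String(msg, start, end - start, StandardCharsets.UTF_8);
    }

    // Demo-only encoders; real input would come from a protobuf serializer.
    static void writeVarint(ByteArrayOutputStream out, long v) {
        for (; (v & ~0x7FL) != 0; v >>>= 7) out.write((int) ((v & 0x7F) | 0x80));
        out.write((int) v);
    }

    static byte[] lenField(int field, byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) field << 3) | 2);
        writeVarint(out, payload.length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) out.write(p, 0, p.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] id = "order-42".getBytes(StandardCharsets.UTF_8);
        // Field 3 is a 1 MiB payload that gets skipped; field 7 is a sub-message holding the id in field 1.
        byte[] msg = concat(lenField(3, new byte[1 << 20]), lenField(7, lenField(1, id)));
        System.out.println(peekString(msg, 7, 1)); // prints order-42
    }
}
```

Skipping a length-delimited field costs only reading its length prefix, so the 1 MiB payload in the demo is never copied or decoded. The trade-off is that this code hard-codes field numbers that the .proto file would otherwise own, so it has to be kept in sync with the schema by hand.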

kalyanswaroop · Accepted Answer · 2020-12-11T17:34:42

For those who look at this conversation later and find it hard to understand, here's what Marc is talking about:

If your object is something like

message MyBigMessage {
  string id = 1;
  int32 sourceType = 2;
  // ... and many other fields here that would be expensive to parse ...
}

Suppose you get a block of bytes that you have to parse, but you only want to fully parse messages from a certain source, and maybe only those matching a certain id range. You could first parse those bytes with another message:

message MyFilterMessage {
  string id = 1;        // field number has to be 1 to match MyBigMessage
  int32 sourceType = 2; // field number has to be 2 to match MyBigMessage
  // and NOTHING ELSE here
}

Then you can look at sourceType and id. If they match whatever you are filtering for, parse the same bytes again, this time with MyBigMessage, to get the whole thing.
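The two passes can be sketched in Java without generated classes. Below, `parseFilter` stands in for parsing the bytes as `MyFilterMessage`: it decodes fields 1 and 2 and jumps over every other field by its wire type, which is roughly what a generated parser does with fields its schema does not declare (protobuf-java may still retain those skipped bytes as unknown fields, which has its own cost). The names `TwoPassFilter`, `FilterView`, `select`, and `encode` are hypothetical, and `fullParse` is a placeholder for something like a lambda that calls `MyBigMessage.parseFrom(bytes)` and handles its checked exception.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.function.Function;

final class TwoPassFilter {
    static final class FilterView { // what parsing as MyFilterMessage yields: fields 1 and 2 only
        String id = "";     // proto3 default when field 1 is absent
        int sourceType = 0; // proto3 default when field 2 is absent
    }

    private static final class Cursor {
        final byte[] buf;
        int pos;
        Cursor(byte[] buf) { this.buf = buf; }
        long varint() {
            long result = 0;
            for (int shift = 0; shift < 64; shift += 7) {
                if (pos >= buf.length) throw new IllegalArgumentException("truncated varint");
                byte b = buf[pos++];
                result |= (long) (b & 0x7F) << shift;
                if ((b & 0x80) == 0) return result;
            }
            throw new IllegalArgumentException("malformed varint");
        }
        void advance(long n) { // moves past n bytes, refusing to run off the end
            if (n < 0 || n > buf.length - pos) throw new IllegalArgumentException("truncated field");
            pos += (int) n;
        }
    }

    // First pass: decode fields 1 and 2, jump over every other field undecoded.
    static FilterView parseFilter(byte[] bytes) {
        Cursor c = new Cursor(bytes);
        FilterView v = new FilterView();
        while (c.pos < bytes.length) {
            long tag = c.varint();
            int field = (int) (tag >>> 3), wireType = (int) (tag & 7);
            if (field == 1 && wireType == 2) {
                long n = c.varint();
                int start = c.pos;
                c.advance(n); // bounds check before building the string
                v.id = new String(bytes, start, c.pos - start, StandardCharsets.UTF_8);
            } else if (field == 2 && wireType == 0) {
                v.sourceType = (int) c.varint();  // int32 keeps the low 32 bits
            } else if (wireType == 0) {
                c.varint();                       // other varint field
            } else if (wireType == 2) {
                c.advance(c.varint());            // other length-delimited field
            } else if (wireType == 1 || wireType == 5) {
                c.advance(wireType == 1 ? 8 : 4); // other fixed64 / fixed32 field
            } else {
                throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return v;
    }

    // Second pass (fullParse) runs only for records whose cheap view matches.
    static <T> List<T> select(List<byte[]> records, Set<String> wantedIds, int wantedSource,
                              Function<byte[], T> fullParse) {
        List<T> matches = new ArrayList<>();
        for (byte[] rec : records) {
            FilterView v = parseFilter(rec);
            if (v.sourceType == wantedSource && wantedIds.contains(v.id)) {
                matches.add(fullParse.apply(rec));
            }
        }
        return matches;
    }

    // Demo-only encoder: id (field 1), sourceType (field 2), zero-filled payload (field 3).
    static byte[] encode(String id, int sourceType, int payloadSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] idBytes = id.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (1 << 3) | 2);
        writeVarint(out, idBytes.length);
        out.write(idBytes, 0, idBytes.length);
        writeVarint(out, 2 << 3); // field 2, wire type 0 (varint)
        writeVarint(out, sourceType);
        writeVarint(out, (3 << 3) | 2);
        writeVarint(out, payloadSize);
        out.write(new byte[payloadSize], 0, payloadSize);
        return out.toByteArray();
    }

    static void writeVarint(ByteArrayOutputStream out, long v) {
        for (; (v & ~0x7FL) != 0; v >>>= 7) out.write((int) ((v & 0x7F) | 0x80));
        out.write((int) v);
    }

    public static void main(String[] args) {
        List<byte[]> records = Arrays.asList(encode("a", 1, 1000), encode("b", 1, 1000), encode("c", 2, 1000));
        List<Integer> sizes = select(records, new HashSet<>(Arrays.asList("a", "c")), 1, rec -> rec.length);
        System.out.println(sizes.size()); // prints 1: only "a" has a wanted id and sourceType 1
    }
}
```

Only records whose cheap view matches pay for the full parse; every other record is rejected after a scan that reads just the tags and length prefixes of the fields it skips.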

One other thing to know: as of 2017, lazy parsing was disabled in Java (except for MessageSet), according to this post: https://github.com/protocolbuffers/protobuf/issues/3601#issuecomment-341516826

I don't know the current status. Too lazy to try to find out! :-)
