7
votes

I currently interface with a server that provides protocol buffers, and I can potentially receive a very large number of messages. My current process for reading the protocol buffers and converting them to a Pandas DataFrame (not a necessary step in general, but Pandas offers nice tools for analyzing datasets) is:

  1. Read the protocol buffer; the result is a Google protobuf object
  2. Convert the protocol buffer to a dictionary using protobuf_to_dict
  3. Use pandas.DataFrame.from_records to get a DataFrame

This works great, but given the large number of messages I read, it is quite inefficient to convert each one to a dictionary and then to pandas. My question is: is it possible to write a class that makes a Python protobuf object look like a dictionary? That is, remove step 2. Any references or pseudocode would be helpful.

1
But "Convert protocol buffers to dictionary" already makes a python protobuf object look like a dictionary ;) What you would really need is something like a pandas.DataFrame.from_protobuf, but I don't know of an answer to this problem. - furas
I looked at the code; it definitely does not look like it's wrapping the protobuf object, but rather creates a real new dictionary. I believe @Justin is looking for something that only wraps, without copying data. - user3820547
Yes, I would like to make the google protobuf object look like a dictionary rather than copying the data to a python dict first. - Justin

1 Answer

4
votes

You might want to check out the ProtoText Python package. It provides in-place, dict-like operations for accessing your protobuf object.

Example usage: Assume you have a python protobuf object person_obj.

import ProtoText
print(person_obj['name'])      # print person_obj.name
person_obj['name'] = 'David'   # set the attribute 'name' to 'David'
# set the attribute 'name' to 'David' again, this time in batch mode
person_obj.update({'name': 'David'})
print('name' in person_obj)    # print whether the 'name' attribute is set in person_obj
# the 'in' operator is better than Google's HasField function
# in the sense that it won't raise an exception even if the field is not defined
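
If you would rather not depend on an extra package, the wrapper idea from the question can also be sketched by hand. The following is only a minimal, read-only illustration, not ProtoText's implementation: MessageView is a hypothetical name, it relies only on the DESCRIPTOR.fields attribute that generated protobuf messages expose, and fake_person is a stand-in object so the snippet runs without a compiled .proto file.

```python
from collections.abc import Mapping
from types import SimpleNamespace


class MessageView(Mapping):
    """Read-only, dict-like view over a protobuf message (hypothetical helper).

    No field data is copied; every lookup is forwarded to the wrapped message.
    """

    def __init__(self, msg):
        self._msg = msg
        # DESCRIPTOR.fields lists the fields declared in the message schema.
        self._names = [field.name for field in msg.DESCRIPTOR.fields]

    def __getitem__(self, key):
        if key not in self._names:
            raise KeyError(key)
        return getattr(self._msg, key)

    def __iter__(self):
        return iter(self._names)

    def __len__(self):
        return len(self._names)


# Stand-in for a generated message (assumption: a real message would come
# from your compiled .proto module instead).
fake_person = SimpleNamespace(
    DESCRIPTOR=SimpleNamespace(
        fields=[SimpleNamespace(name="name"), SimpleNamespace(name="id")]
    ),
    name="David",
    id=7,
)

view = MessageView(fake_person)
print(view["name"])     # forwarded to fake_person.name
print("email" in view)  # unknown fields are reported as absent, no exception
```

Note that nested and repeated fields would still come back as protobuf objects here, and whether passing such views to pandas is actually faster than protobuf_to_dict is something you would need to measure.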