1
votes

I created a binary file from a C++ program using Protocol Buffers. I had trouble reading that file in my C# program, so I wrote a small C++ program to test the reading.

My proto file is as follows:

message TradeMessage {
required double timestamp = 1;
required string ric_code = 2;
required double price = 3;
required int64 size = 4;
required int64 AccumulatedVolume = 5;
} 

When writing to the file, I first write the object type, then the object length, and then the object itself:

coded_output->WriteLittleEndian32((int) ObjectType_Trade); 
coded_output->WriteLittleEndian32(trade.ByteSize()); 
trade.SerializeToCodedStream(coded_output);

Now, when I try to read the same file in my C++ program, I see strange behavior.

My reading code is as follows:

coded_input->ReadLittleEndian32(&objtype);
coded_input->ReadLittleEndian32(&objlen);
tMsg.ParseFromCodedStream(coded_input);
cout << "Expected Size = " << objlen << endl;
cout<<" Trade message received for: "<< tMsg.ric_code() << endl;
cout << "TradeMessage Size = " << tMsg.ByteSize() << endl;

In this case, I get the following output:

Expected Size = 33
Trade message received for: .CSAP0104
TradeMessage Size = 42

When I write the file, I record trade.ByteSize() as 33 bytes, but when I read the same object back, its ByteSize() is 42 bytes, which throws off the rest of the data. I am not sure what is wrong here. Please advise.

Regards, Alok

2
Just to double-check: I compared the protocol buffer generated files in my reader and writer projects, and they are identical. So I guess the file encoding differs for some reason, but I do not understand why. - Alok
Any chance you can share your test values, just so I can check which is "more right"? - Marc Gravell
Also, a minor point: if you wrote ([objtype] << 3) | 2 (as a varint), then the byte size (as a varint), then the raw data, it would actually be a valid protobuf sequence in itself, which can be useful. Just sayin'. - Marc Gravell
@MarcGravell Can I mail you the binary data file? I am afraid there is no other way I can send you the file from here. I will make the changes regarding objtype (and some other fields) as you suggested. - Alok
@MarcGravell, I dropped you an email just now. - Alok
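
To illustrate the framing suggested in the comments, here is a minimal sketch in plain C++ with no protobuf dependency. It builds a tag of the form (field number << 3) | 2, where wire type 2 means "length-delimited", followed by the payload length and the payload itself, with tag and length written as varints. The helper names (append_varint, length_delimited_tag, frame_as_field) are hypothetical and not part of any library; using objtype as the field number is an assumption taken from the comment.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append v as a base-128 varint: 7 bits per byte, lowest group first,
// with the high bit set on every byte except the last.
void append_varint(std::string& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<char>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<char>(v));
}

// Build the tag for a length-delimited field (wire type 2).
// Hypothetical helper: field_number plays the role of objtype here.
std::uint32_t length_delimited_tag(std::uint32_t field_number) {
    assert(field_number > 0);  // protobuf field numbers start at 1
    return (field_number << 3) | 2u;
}

// Frame an already-serialized message as [tag][length][payload].
std::string frame_as_field(std::uint32_t field_number, const std::string& payload) {
    std::string out;
    append_varint(out, length_delimited_tag(field_number));
    append_varint(out, payload.size());
    out += payload;
    return out;
}
```

With this layout, each record in the file looks like one length-delimited field of an outer message, so a generic protobuf reader could walk the file record by record.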

2 Answers

1
votes

This is guesswork, based on the above: when you use ParseFromCodedStream, you aren't actually limiting the parse to the objlen that you read just before it; thus, if the stream contains any more data after this message (i.e. the message isn't the last thing in the file), the engine will keep reading until EOF. You must cap the length to your expectation. I am not a C++ expert, so I can't offer direct guidance, but if this were C# (using protobuf-net):

objType = ProtoReader.DirectReadLittleEndianInt32(file);
len = ProtoReader.DirectReadLittleEndianInt32(file);

// assume GetObjectType returns typeof(TradeMessage) for our objType
Type type = GetObjectType(objType);
msg = RuntimeTypeModel.Default.Deserialize(file, null, type, len, null);
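
For the C++ side, the same capping idea can be sketched without protobuf at all: read the two little-endian 32-bit headers, then read exactly that many payload bytes and parse only those. The function names below (write_le32, read_le32, write_frame, read_frame) are hypothetical helpers written for this illustration. In real protobuf code, CodedInputStream offers PushLimit and PopLimit for the same purpose, and the extracted payload could be handed to the message's ParseFromString; how that fits the original program is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write a 32-bit value in little-endian order, independent of host endianness.
void write_le32(std::ostream& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.put(static_cast<char>((v >> (8 * i)) & 0xFF));
    }
}

// Read a 32-bit little-endian value; returns false on a short read.
bool read_le32(std::istream& in, std::uint32_t& v) {
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), 4)) return false;
    v = static_cast<std::uint32_t>(b[0]) |
        (static_cast<std::uint32_t>(b[1]) << 8) |
        (static_cast<std::uint32_t>(b[2]) << 16) |
        (static_cast<std::uint32_t>(b[3]) << 24);
    return true;
}

// One record on disk: [type][length][payload].
void write_frame(std::ostream& out, std::uint32_t type, const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in 32 bits
    write_le32(out, type);
    write_le32(out, static_cast<std::uint32_t>(payload.size()));
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

// Read one record, consuming exactly `length` payload bytes and no more.
// The caller would then parse `payload` as the message for `type`.
bool read_frame(std::istream& in, std::uint32_t& type, std::string& payload) {
    std::uint32_t len = 0;
    if (!read_le32(in, type) || !read_le32(in, len)) return false;
    payload.resize(len);
    if (len > 0 && !in.read(&payload[0], static_cast<std::streamsize>(len))) return false;
    return true;
}
```

Because each read stops at the recorded length, a following record's header is never swallowed by the parser of the previous message.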
1
votes

So apparently I was making a very silly mistake while creating the binary files: I did not open the file in binary mode when writing the protobuf data to it, which caused stray ASCII characters to be inserted in the middle of the data. That in turn caused problems when reading the data with the protobuf-net library. The issue is resolved here. It shouldn't have taken so long to figure this out.
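
As a minimal sketch of the fix described above, assuming the file is written through standard C++ streams (the original writer code is not shown, so this is a guess at its shape): open the stream with std::ios::binary so the runtime does not translate any bytes. On Windows, a stream opened in text mode typically expands each 0x0A byte to 0x0D 0x0A, which shifts everything after it. The helper names write_file_binary and read_file_binary are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Write raw bytes to a file with no newline translation.
bool write_file_binary(const std::string& path, const std::string& bytes) {
    assert(!path.empty());
    // std::ios::binary is the important flag: without it, some platforms
    // rewrite 0x0A bytes on output.
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out.flush());
}

// Read the whole file back as raw bytes, again in binary mode.
bool read_file_binary(const std::string& path, std::string& bytes) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) return false;
    bytes.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return true;
}
```

A quick sanity check is to compare the file size on disk with the number of bytes the writer believes it wrote; if they differ, some translation happened along the way.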