I am writing a BitTorrent client and right now I'm dealing with bitfield messages. A bitfield message looks like this:

 <len=0001+X><id=5><bitfield>

The problem is that the len field is always the same, while the actual length of the received message is different every time. Here's my Python code:

message = self.recv(4096)
print(len(message)) #prints different numbers every time
current_msg_len = struct.unpack('!I', message[:4])[0]
print(current_msg_len) #always prints the same number

I am using TCP and I know that I can receive an incomplete message, but after the handshake and bitfield no further messages come in. An example of a received message:

[0, 0, 0, 95, 5, 255, 255, 255, 255, 254, 254, 255, 239, 255, 255, 255, 255, 255, 255, 255, 255, 247, 253, 255]

1 Answer


Disclaimer: I know nothing about Python network APIs in general or what recv() does specifically.

TCP can be thought of as two independent, infinite streams of bytes. It is not separated into individual messages the way UDP is.

You are simply reading whatever is currently available to your network layer into a buffer, and that is unlikely to align with BitTorrent message boundaries.
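To make that concrete, here is a small check against the bytes posted in the question. It only uses struct from the standard library; the variable names are made up for illustration.

```python
import struct

# The bytes shown in the question: what a single recv() call returned.
received = bytes([0, 0, 0, 95, 5, 255, 255, 255, 255, 254, 254, 255, 239,
                  255, 255, 255, 255, 255, 255, 255, 255, 247, 253, 255])

# The 4-byte big-endian prefix gives the length of everything after it.
(declared_len,) = struct.unpack("!I", received[:4])
expected_total = 4 + declared_len         # prefix + id byte + bitfield
missing = expected_total - len(received)  # bytes that have not arrived yet

print(declared_len, expected_total, len(received), missing)
```

The prefix says 95 bytes follow, so the whole message is 99 bytes, while this read delivered only part of it. The rest arrives in later reads, which is why len(message) varies while the decoded length does not.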

The easiest approach is to read exactly 4 bytes, decode the length, then read exactly that many bytes into a separate buffer and treat it as the message body. If a read returns fewer bytes than needed, you have to keep reading until the BitTorrent message is complete. In that case you either concatenate the partial buffers or use an API that lets the socket read into a pre-allocated buffer until that buffer is full.
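A minimal sketch of that approach in Python, assuming sock is a connected, blocking socket.socket. The helper names recv_exact and read_message are hypothetical and not part of any library. It uses the "concatenate buffers" variant.

```python
import socket
import struct


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Keep calling recv() until exactly n bytes have been collected."""
    buf = bytearray()
    while len(buf) < n:
        # Never ask for more than is still missing, so the next
        # message's bytes stay in the socket for the next call.
        chunk = sock.recv(n - len(buf))
        if not chunk:
            # recv() returning b"" means the peer closed the connection.
            raise ConnectionError("connection closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def read_message(sock: socket.socket):
    """Read one length-prefixed message.

    Returns (message_id, payload), or None for a keep-alive (length 0).
    """
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    if length == 0:
        return None
    body = recv_exact(sock, length)
    return body[0], body[1:]
```

For a bitfield message, read_message(sock) would return 5 as the id and the bitfield bytes as the payload. If copying chunks becomes a concern, socket.recv_into with a pre-allocated bytearray is the other option mentioned above.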