0
votes

I have a process that listens to a UDP multicast stream and reads in the data as an unsigned char*.

I have a specification that indicates fields within this unsigned char*.

Fields are defined in the specification with a type and size.

Types are: uInt32, uInt64, unsigned int, and single byte string.

For the single byte string I can merely access the offset of the field in the unsigned char* and cast to a char, such as:

char character = (char)(data[1]);

For a single-byte uint32 I've been doing the following, which also seems to work:

uint32_t integer =  (uint32_t)(data[20]);

However, for multiple byte conversions I seem to be stuck.

How would I convert several bytes in a row (substring of data) to its corresponding datatype?

Also, is it safe to wrap data in a string (for use of substring functionality)? I am worried about losing information, since I'd have to cast unsigned char* to char*, like:

std::string wrapper((char*)(data),length); //Is this safe?

I tried something like this:

std::string wrapper((char*)(data),length); //Is this safe?
uint32_t integer = (uint32_t)(wrapper.substr(20,4).c_str()); //4 byte int

But it doesn't work.

Thoughts?


Update

I've tried the suggested bit shift:

void function(const unsigned char* data, size_t data_len)
{
    //From specification: Field type: uInt32 Byte Length: 4
    //All integer fields are big endian.
    uint32_t integer = (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | (data[3]);
}

Sadly, this gives me garbage (the same number on every call; the function is invoked from a callback).

5
Instead of wrapping in std::string, you may find your data is easier to manipulate as std::vector<unsigned char>. Alternatively, if you want to wrap it in a string, std::basic_string<unsigned char> would probably be preferable. – AJG85

5 Answers

2
votes

I think you should be very explicit, and not just do "clever" tricks with casts and pointers. Instead, write a function like this:

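uint32_t read_uint32_t(const unsigned char **data)
{
  const unsigned char *get = *data;
  *data += 4;  // advance the caller's pointer past this field
  // Cast before shifting: a plain unsigned char promotes to (signed) int.
  return ((uint32_t)get[0] << 24) | ((uint32_t)get[1] << 16) |
         ((uint32_t)get[2] << 8) | (uint32_t)get[3];
}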

This extracts a single uint32_t value from a buffer of unsigned char, and advances the buffer pointer past the four bytes just read, so it points at the next field in the buffer.

This assumes big-endian data; you need a well-defined idea of the buffer's byte order in order to interpret it.
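
The question also lists uInt64 fields. A minimal sketch of the same idea for eight bytes, assuming big-endian data as above. The name read_be_u64 and its explicit length and offset parameters are illustrative and not part of the original answer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads a big-endian 64-bit value starting at data[offset].
// data_len is the total buffer size; the assert guards against reading past it.
inline std::uint64_t read_be_u64(const unsigned char* data,
                                 std::size_t data_len,
                                 std::size_t offset)
{
    assert(offset <= data_len && data_len - offset >= 8);
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        // Shift the accumulated value up one byte, then append the next byte.
        value = (value << 8) | data[offset + i];
    }
    return value;
}
```

Accumulating into a 64-bit variable avoids shifting a promoted int by more bits than it holds.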

2
votes

It depends on the byte ordering of the protocol. For big-endian, the so-called network byte order, do:

uint32_t i = data[0] << 24 | data[1] << 16 | data[2] << 8 | data[3];
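
To illustrate how the byte order changes the result, here is a small sketch with both variants side by side. The function names are hypothetical, and the explicit casts are an addition so that each shift happens on an unsigned 32-bit value rather than on a promoted int:

```cpp
#include <cstdint>

// Hypothetical helper: most significant byte first (network byte order).
inline std::uint32_t read_be_u32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Hypothetical helper: least significant byte first.
inline std::uint32_t read_le_u32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[3]) << 24) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[1]) << 8)  |
            static_cast<std::uint32_t>(p[0]);
}
```

For a field at offset 20, as in the question, the call would be read_be_u32(data + 20).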
0
votes

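Without commenting on whether it's a good idea or not, the reason it doesn't work for you is that the result of wrapper.substr(20,4).c_str() is a pointer (const char*), not an integer, so casting it to uint32_t converts the address rather than the bytes it points to. To read the bytes, you would have to cast the pointer and dereference it within the same expression:

uint32_t integer = *(const uint32_t *)(wrapper.substr(20,4).c_str());

This yields the four bytes in host byte order, and it ignores alignment and aliasing rules, so it is not portable.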

0
votes
uint32_t integer = ntohl(*reinterpret_cast<const uint32_t*>(data + 20));

or (handles alignment issues):

uint32_t integer;
memcpy(&integer, data+20, sizeof integer);
integer = ntohl(integer);
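
The memcpy variant can be wrapped with a bounds check, roughly as sketched below. The function name read_net_u32 and its parameters are hypothetical, and the include assumes a POSIX system (on Windows, ntohl is declared in <winsock2.h> instead):

```cpp
#include <arpa/inet.h>  // ntohl (POSIX)
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies 4 bytes out of the buffer, then converts them
// from network (big-endian) order to host order. Returns false if the field
// would run past the end of the buffer.
inline bool read_net_u32(const unsigned char* data, std::size_t data_len,
                         std::size_t offset, std::uint32_t& out)
{
    if (offset > data_len || data_len - offset < sizeof out) {
        return false;
    }
    std::memcpy(&out, data + offset, sizeof out);  // no alignment requirement
    out = ntohl(out);
    return true;
}
```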
0
votes

The pointer way:

uint32_t n = *(uint32_t*)&data[20];

You will run into problems on architectures whose endianness differs from the data's, though. The solution with bit shifts is better and gives consistent results.

std::string wrapper((char*)(data),length); //Is this safe?

This should be safe since you specified the length of the data. On the other hand if you did this:

std::string wrapper((char*)data);

The string length would then be determined by where the first 0 byte occurs, and you would more than likely chop off some data.
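
A small sketch of the difference, using hypothetical helper names. It also shows reading a byte back as unsigned, since plain char may be signed on your platform:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: wraps a raw buffer in a std::string, keeping every byte.
inline std::string wrap_bytes(const unsigned char* data, std::size_t length)
{
    // The (pointer, length) constructor copies exactly `length` bytes,
    // including any zero bytes.
    return std::string(reinterpret_cast<const char*>(data), length);
}

// Hypothetical helper: reads a byte back out as an unsigned value.
inline unsigned char byte_at(const std::string& s, std::size_t i)
{
    return static_cast<unsigned char>(s[i]);
}
```

With a buffer such as {'A', 0x00, 0xFF, 'B'}, wrap_bytes keeps all four bytes, whereas the single-argument constructor would stop at the zero byte in position 1.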