Get Integer From Bits Inside `std::vector<char>`

Question

I have a vector<char> and I want to be able to get an unsigned integer from a range of bits within the vector. E.g.

visualisation of bitvalues

And I can't seem to be able to write the correct operations to get the desired output. My intended algorithm goes like this:

& the first byte with (0xff >> unused bits in byte on the left)
<< the result left the number of output bytes * number of bits in a byte
| this with the final output
For each subsequent byte:
- << left by the (byte width - index) * bits per byte
- | this byte with the final output
| the final byte (not shifted) with the final output
>> the final output by the number of unused bits in the byte on the right

And here is my attempt at coding it, which does not give the correct result:

#include <vector>
#include <iostream>
#include <cstdint>
#include <bitset>

template<class byte_type = char>
class BitValues {
    private:
    std::vector<byte_type> bytes;
    public:
        static const auto bits_per_byte = 8;
        BitValues(std::vector<byte_type> bytes) : bytes(bytes) {
        }
        template<class return_type>
        return_type get_bits(int start, int end) {
            auto byte_start = (start - (start % bits_per_byte)) / bits_per_byte;
            auto byte_end = (end - (end % bits_per_byte)) / bits_per_byte;
            auto byte_width = byte_end - byte_start;
            return_type value = 0;

            unsigned char first = bytes[byte_start];
            first &= (0xff >> start % 8);
            return_type first_wide = first;
            first_wide <<= byte_width;
            value |= first_wide;

            for(auto byte_i = byte_start + 1; byte_i <= byte_end; byte_i++) {
                auto byte_offset = (byte_width - byte_i) * bits_per_byte;
                unsigned char next_thin = bytes[byte_i];
                return_type next_byte = next_thin;
                next_byte <<= byte_offset;
                value |= next_byte;
            }
            value >>= (((byte_end + 1) * bits_per_byte) - end) % bits_per_byte;

            return value;
        }
};

int main() {
    BitValues<char> bits(std::vector<char>({'\x78', '\xDA', '\x05', '\x5F', '\x8A', '\xF1', '\x0F', '\xA0'}));
    std::cout << bits.get_bits<unsigned>(15, 29) << "\n";
    return 0;
}

(In action: http://coliru.stacked-crooked.com/a/261d32875fcf2dc0)

I just can't seem to wrap my head around these bit manipulations, and I find debugging very difficult! If anyone can correct the above code, or help me in any way, it would be much appreciated!

Edit:

My bytes are 8 bits long
The integer to return could be 8,16,32 or 64 bits wside
The integer is stored in big endian

Cookyt Cookyt · Accepted Answer · 2013-10-04T04:22:07

You made two primary mistakes. The first is here:

first_wide <<= byte_width;

You should be shifting by a bit count, not a byte count. Corrected code is:

first_wide <<= byte_width * bits_per_byte;

The second mistake is here:

auto byte_offset = (byte_width - byte_i) * bits_per_byte;

It should be

auto byte_offset = (byte_end - byte_i) * bits_per_byte;

The value in parenthesis needs to be the number of bytes to shift right by, which is also the number of bytes byte_i is away from the end. The value byte_width - byte_i has no semantic meaning (one is a delta, the other is an index)

The rest of the code is fine. Though, this algorithm has two issues with it.

First, when using your result type to accumulate bits, you assume you have room on the left to spare. This isn't the case if there are set bits near the right boundry and the choice of range causes the bits to be shifted out. For example, try running

bits.get_bits<uint16_t>(11, 27);

You'll get the result 42 which corresponds to the bit string 00000000 00101010 The correct result is 53290 with the bit string 11010000 00101010. Notice how the rightmost 4 bits got zeroed out. This is because you start off by overshifting your value variable, causing those four bits to be shifted out of the variable. When shifting back at the end, this results in the bits being zeroed out.

The second problem has to do with the right shift at the end. If the rightmost bit of the value variable happens to be a 1 before the right shift at the end, and the template parameter is a signed type, then the right shift that is done is an 'arithmetic' right shift, which causes bits on the right to be 1-filled, leaving you with an incorrect negative value.

Example, try running:

bits.get_bits<int16_t>(5, 21);

The expected result should be 6976 with the bit string 00011011 01000000, but the current implementation returns -1216 with the bit string 11111011 01000000.

I've put my implementation of this below which builds the bit string from the right to the left, placing bits in their correct positions to start with so that the above two problems are avoided:

template<class ReturnType>
ReturnType get_bits(int start, int end) {
  int max_bits = kBitsPerByte * sizeof(ReturnType);
  if (end - start > max_bits) {
    start = end - max_bits;
  }

  int inclusive_end = end - 1;
  int byte_start = start / kBitsPerByte;
  int byte_end = inclusive_end / kBitsPerByte;

  // Put in the partial-byte on the right
  uint8_t first = bytes_[byte_end];
  int bit_offset = (inclusive_end % kBitsPerByte);
  first >>= 7 - bit_offset;
  bit_offset += 1;
  ReturnType ret = 0 | first;

  // Add the rest of the bytes
  for (int i = byte_end - 1; i >= byte_start; i--) {
    ReturnType tmp = (uint8_t) bytes_[i];
    tmp <<= bit_offset;
    ret |= tmp;
    bit_offset += kBitsPerByte;
  }

  // Mask out the partial byte on the left
  int shift_amt = (end - start);
  if (shift_amt < max_bits) {
    ReturnType mask = (1 << shift_amt) - 1;
    ret &= mask;
  }
}

Get Integer From Bits Inside `std::vector<char>`

3 Answers