5
votes

I have a vector<char> and I want to be able to get an unsigned integer from a range of bits within the vector. E.g.

visualisation of bitvalues

And I can't seem to be able to write the correct operations to get the desired output. My intended algorithm goes like this:

  • & the first byte with (0xff >> unused bits in byte on the left)
  • << the result left the number of output bytes * number of bits in a byte
  • | this with the final output
  • For each subsequent byte:
    • << left by the (byte width - index) * bits per byte
    • | this byte with the final output
  • | the final byte (not shifted) with the final output
  • >> the final output by the number of unused bits in the byte on the right

And here is my attempt at coding it, which does not give the correct result:

#include <vector>
#include <iostream>
#include <cstdint>
#include <bitset>

template<class byte_type = char>
class BitValues {
    private:
    std::vector<byte_type> bytes;
    public:
        static const auto bits_per_byte = 8;
        BitValues(std::vector<byte_type> bytes) : bytes(bytes) {
        }
        template<class return_type>
        return_type get_bits(int start, int end) {
            auto byte_start = (start - (start % bits_per_byte)) / bits_per_byte;
            auto byte_end = (end - (end % bits_per_byte)) / bits_per_byte;
            auto byte_width = byte_end - byte_start;
            return_type value = 0;

            unsigned char first = bytes[byte_start];
            first &= (0xff >> start % 8);
            return_type first_wide = first;
            first_wide <<= byte_width;
            value |= first_wide;

            for(auto byte_i = byte_start + 1; byte_i <= byte_end; byte_i++) {
                auto byte_offset = (byte_width - byte_i) * bits_per_byte;
                unsigned char next_thin = bytes[byte_i];
                return_type next_byte = next_thin;
                next_byte <<= byte_offset;
                value |= next_byte;
            }
            value >>= (((byte_end + 1) * bits_per_byte) - end) % bits_per_byte;

            return value;
        }
};

int main() {
    BitValues<char> bits(std::vector<char>({'\x78', '\xDA', '\x05', '\x5F', '\x8A', '\xF1', '\x0F', '\xA0'}));
    std::cout << bits.get_bits<unsigned>(15, 29) << "\n";
    return 0;
}

(In action: http://coliru.stacked-crooked.com/a/261d32875fcf2dc0)

I just can't seem to wrap my head around these bit manipulations, and I find debugging very difficult! If anyone can correct the above code, or help me in any way, it would be much appreciated!

Edit:

  • My bytes are 8 bits long
  • The integer to return could be 8,16,32 or 64 bits wside
  • The integer is stored in big endian
3

3 Answers

1
votes

You made two primary mistakes. The first is here:

first_wide <<= byte_width;

You should be shifting by a bit count, not a byte count. Corrected code is:

first_wide <<= byte_width * bits_per_byte;

The second mistake is here:

auto byte_offset = (byte_width - byte_i) * bits_per_byte;

It should be

auto byte_offset = (byte_end - byte_i) * bits_per_byte;

The value in parenthesis needs to be the number of bytes to shift right by, which is also the number of bytes byte_i is away from the end. The value byte_width - byte_i has no semantic meaning (one is a delta, the other is an index)

The rest of the code is fine. Though, this algorithm has two issues with it.

First, when using your result type to accumulate bits, you assume you have room on the left to spare. This isn't the case if there are set bits near the right boundry and the choice of range causes the bits to be shifted out. For example, try running

bits.get_bits<uint16_t>(11, 27);

You'll get the result 42 which corresponds to the bit string 00000000 00101010 The correct result is 53290 with the bit string 11010000 00101010. Notice how the rightmost 4 bits got zeroed out. This is because you start off by overshifting your value variable, causing those four bits to be shifted out of the variable. When shifting back at the end, this results in the bits being zeroed out.

The second problem has to do with the right shift at the end. If the rightmost bit of the value variable happens to be a 1 before the right shift at the end, and the template parameter is a signed type, then the right shift that is done is an 'arithmetic' right shift, which causes bits on the right to be 1-filled, leaving you with an incorrect negative value.

Example, try running:

bits.get_bits<int16_t>(5, 21);

The expected result should be 6976 with the bit string 00011011 01000000, but the current implementation returns -1216 with the bit string 11111011 01000000.

I've put my implementation of this below which builds the bit string from the right to the left, placing bits in their correct positions to start with so that the above two problems are avoided:

template<class ReturnType>
ReturnType get_bits(int start, int end) {
  int max_bits = kBitsPerByte * sizeof(ReturnType);
  if (end - start > max_bits) {
    start = end - max_bits;
  }

  int inclusive_end = end - 1;
  int byte_start = start / kBitsPerByte;
  int byte_end = inclusive_end / kBitsPerByte;

  // Put in the partial-byte on the right
  uint8_t first = bytes_[byte_end];
  int bit_offset = (inclusive_end % kBitsPerByte);
  first >>= 7 - bit_offset;
  bit_offset += 1;
  ReturnType ret = 0 | first;

  // Add the rest of the bytes
  for (int i = byte_end - 1; i >= byte_start; i--) {
    ReturnType tmp = (uint8_t) bytes_[i];
    tmp <<= bit_offset;
    ret |= tmp;
    bit_offset += kBitsPerByte;
  }

  // Mask out the partial byte on the left
  int shift_amt = (end - start);
  if (shift_amt < max_bits) {
    ReturnType mask = (1 << shift_amt) - 1;
    ret &= mask;
  }
}
0
votes

There is one thing you certainly missed I think: the way you index the bits in the vector is different from what you have been given in the problem. I.e. with algorithm you outlined, the order of the bits will be like 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 | 23 22 21 .... Frankly, I didn't read through your whole algorithm, but this one was missed in the very first step.

0
votes

Interesting problem. I've done similar, for some systems work.

  • Your char is 8 bits wide? Or 16? How big is your integer? 32 or 64?
  • Ignore the vector complexity for a minute.
  • Think about it as just an array of bits.
  • How many bits do you have? You have 8*number of chars
  • You need to calculate a starting char, number of bits to extract, ending char, number of bits there, and number of chars in the middle.
  • You will need bitwise-and & for the first partial char
  • you will need bitwise-and & for the last partial char
  • you will need left-shift << (or right-shift >>), depending upon which order you start from
  • what is the endian-ness of your Integer?

At some point you will calculate an index into your array that is bitindex/char_bit_width, you gave the value 171 as your bitindex, and 8 as your char_bit_width, so you will end up with these useful values calculated:

  • 171/8 = 23 //location of first byte
  • 171%8 = 3 //bits in first char/byte
  • 8 - 171%8 = 5 //bits in last char/byte
  • sizeof(integer) = 4
  • sizeof(integer) + ( (171%8)>0?1:0 ) // how many array positions to examine

Some assembly required...