0
votes

Say I have a bitarray like this

101110101010101010101111111011001100100100001011001111101000101001

The two operations I want to do are:

  1. Read a contiguous subset of the bits into a single bitarray (or integer in the case of JavaScript)
  2. Read a noncontigous subset of bits into a single integer in JavaScript.

So for (1), let's say I want this:

1011[101010101010101011111]11011001100100100001011001111101000101001
== 101010101010101011111
= 1398111 in decimal

Bits 4-25 or so.

For (2), I would like to most optimally select a non-contiguous subset of bits and combine them optimally into a final value.

1011[101]0101010101010[11]1111[1]011001100100100001011001111101000101001
== 101 ++ 11 ++ 1
= 101111
= 47 in decimal

Bits 4-6, 21-22, and 27 or so.

What is the right/optimal way of doing this?

2
Does that work for bits instead of bytes? - Lance Pollard
What format are the bits stored in? Packed in an Uint8Array? Uint32Array? Plain JS-array with 32-bit "integers"? BigInt? - harold
Good question. The bits would be stored in either a Uint8Array or 32Array, probably Uint32Array. But that would just be a "view" on top of the plain bitarray/binaryarray (though I don't know how to construct this other than getting back a binaryarray in http request). - Lance Pollard
OK and which way around are the bits packed? For example, which bits are stored in the 0th entry of the Uint32Array (0th to 31st? or the most significant bits?), and which "way around" are the bits (is the least significant bit of the 0th entry also the 0th bit of the string)? Should the index work as in your example, which seems reversed? - harold

2 Answers

0
votes

The format is still a bit vague, but here's a way to do something like this. I'm making some assumptions that make the problem easier, namely:

  • At most 32 bits are extracted at once (so it fits in a Number without weird hacks)
  • Bits are in an Uint32Array (or compatible storage, as long as it has 32 bits per element)
  • The least significant bit of the 0th entry of the array is number 0 overall
  • The bit string represented this way is ... + tobits(array[1]) + tobits(array[0]) for example [ 0, 256 ] represents 00000000000000000000000100000000_00000000000000000000000000000000 (the underscore indicates the boundary between the pieces). Maybe that's the wrong way around, it can be changed but this way is simple.

The ith bit is in the i >> 5-th (aka i / 32 but with integer division) word, at offset i & 31 (aka i % 32) within that word. That's what makes this order simple to work with.

By the first assumption, at most 2 entries/words in the array are spanned by the range, so there are only two cases:

  • The bottom of the range is in one word and the top is in the next.
  • The range is wholly contained in a single word. Touching a second word should be avoided, as it might be beyond the end of the array. Also even if the second word can be touched, it wouldn't be as easy to deal with the excess bits because shift counts are taken modulo 32 so high << 32 wouldn't do the trick.

In code (not tested)

function extractRange(bits, begin, end) {
    // extract bits [begin .. end],
    // begin==end extracts a single bit
    var beginIdx = begin >> 5;
    var endIdx = end >> 5;
    var beginOfs = begin & 31;
    var endOfs = end & 31;
    var len = end - begin + 1;
    var msk = -1 >>> (32 - len);
    console.assert(len > 0 && len <= 32);
    if (beginIdx == endIdx) {
        // begin and end are in the same word
        // discard the bits before the begin of the range and mask off the high bits
        return ((bits[endIdx] >>> beginOfs) & msk) >>> 0;
    }
    else {
        var low = bits[beginIdx];
        var high = bits[endIdx];
        // the high word contains the high bits of the result, in its lowest bits
        // the low word contains the low bits of the result, in its highest bits:
        // xxxxhhhh_llllxxxx
        // high word must be shifted left by the number of bits in the low word
        var bitsInLow = 32 - beginOfs;
        return (((low >>> beginOfs) | (high << bitsInLow)) & msk) >>> 0;
    }
}

Examples:

[0xdeadbeef, 0xcafebabe] means that the string is really 0xcafebabedeadbeef (in bits)
extractRange([0xdeadbeef, 0xcafebabe], 0, 31).toString(16) =
    deadbeef
extractRange([0xdeadbeef, 0xcafebabe], 4, 35).toString(16) =
    edeadbee
extractRange([0xdeadbeef, 0xcafebabe], 8, 39).toString(16) =
    bedeadbe
extractRange([0xdeadbeef, 0xcafebabe], 60, 63).toString(16) =
    c
extractRange([0xdeadbeef, 0xcafebabe], 30, 33).toString(16) =
    b // ...ed... in binary 11101101, taking the middle 4 bits, 1011 = b

For non-contiguous ranges, you could extract every individual range and then append them. I don't think there is a nicer way in general.

0
votes

A streamlined version, that uses only generators up to the final step, thus avoiding loading whole input into memory.

// for single number
function* gUintToBits(input, dim) {
  let i = 0;
  while (i < dim) {
    yield (input >> (dim - 1 - i++)) & 1;
    // or like this, if bits should bo from left to right:
    // yield (input >> i++) & 1;
  }
}

// for array of numbers
function* gUintArrayToBits(input, dim) {
  for (let item of input) {
    yield* gUintToBits(item, dim);
  }
}

// apply intervals mask directly to generator
function* genWithIntervalsApplied(iterOfBits, intervals) {
  // fast, if number of intervals is not so big
  const isInsideIntervals = (n, itrvs) => {
    for (let intv of itrvs) {
      if (n >= intv[0] && n < intv[1]) return true;
    }
    return false;
  };

  let index = 0;
  for (let item of iterOfBits) {
    if (isInsideIntervals(index++, intervals)) {
      yield item;
    }
  }
}

// Finally consume the generator
function extractIntervalsFromUint8Array(input, intervals) {
  let result = ''
  for (let x of genWithIntervalsApplied(gUintArrayToBits(input, 8), intervals)) {
    result += `${x}`
  }
  return result
}

const result = extractIntervalsFromUint8Array(
  [1, 3, 9, 127],
  [[8, 16], [24, 32]],
);
const dec = parseInt(result, 2);

console.log(result);
console.log(dec);

output:

// 0000001101111111
// 895