Solved this ages ago, but this question recently got some attention so I'm adding my solution (and some clarification). See below.
I have a list of 32 bit integers, and I want to find a subset of them for which ((item ^ bits) & mask) == 0
is true. The problem is, bits
and mask
can both take any value (mask
is biased towards having not very many bits set though), so I can't simply pre-calculate all possible combinations.
The list is not very big, typically around 500 items, so things that look good in theory (such as a binary tree, where for each bit in the mask a whole subtree can be skipped) are actually slow in practice. Even a tree with just two levels was just a bit slower than the naive approach, despite skipping a significant number of tests.
Currently I iterate over the entire list and test every item. It's fast once, but it happens millions of times, every time with different bits
and mask
so caching the result doesn't help. This part of the program takes just over 40% of the total CPU time it uses.
foreach (var row in validRows.Keys)
{
// this single line here takes 40% of the total program time, according to ANTS 5
if (((oldrow ^ row) & oldused) == 0)
// the other magic takes no significant time, according to ANTS
{
if (y > 1 && ((((row ^ prev) | yminone) + 1) >> rows.Length) == 0)
{
continue;
}
if (dict.ContainsKey(row))
continue;
dict.Add(row, true);
rows[y] = row;
count(y + 1, dict); // this is a recursive call.
dict.Remove(row);
}
}
I gathered some statistics. It turns out that over 130000 of the 179000 queries return only 1 item. That sounds like an opportunity for some kind of optimization to me, but I'm not sure how or what.
For this particular sub-problem, preprocessing helped a lot. I now make an array of possibilities for every row in the input, which is just validRows
(now an array instead of a dictionary) filtered by (int row) => ((inputrow ^ row) & inputfilledmask) == 0
.
The actual problem is, given a partially filled 8x8 boolean matrix, to count all assignments that satisfy the following rules:
- 4 ones and 4 zeroes in every row
- 4 ones and 4 zeroes in every column
- No more than 2 ones next to each other vertically or horizontally
- No more than 2 zeroes next to each other vertically or horizontally
- No two rows may be equal
- No two columns may be equal
Here's how I solve it now:
For every row, filter the list of the 34 valid rows down to the ones which could be assigned on top of it (ie all the bits that correspond to ones in the mask that marks the filled cells are equal in the input row and the row that could be assigned on top of it).
Then recursively, fill in the next row with every possible row at that point. That means it must be in the filtered list, it must not have been used yet (ie not in the hash set), and a test is done to ensure that it doesn't violate the 3rd of 4th rules. To prune some more sub-trees of the recursion tree, I also use two extra integers, one to keep track of the number of zeroes in every column (one nibble per column), and one to keep track of the ones. Potential rows that would violate the second rule are ignored by a concise test. These integers are initialized corresponding to the input (plus 0x33333333
, so that 4 ones in a column don't set the highest bit of the nibble but 5 or more do), and updated only with cells that were previously empty.
Finally, at the bottom of the recursion tree, the last row is completely dictated by the two integers that count the ones and zeroes in the columns (even just one of them is enough to determine the last row). Then a test is done for duplicate columns - that's the only thing that isn't automatically guaranteed by construction. In all, the time went down from about a minute to about a tenth of a second (usually less).
I'm still open to suggestions (although that would make this a completely different question, really). This counting routine is used to generate "good" initial configurations by brute force - the faster it runs, the better the result can be in the allotted time.
mask
? – Jon Skeetbits
andmask
32-bit as well? – FlyingStreudelmask
is likely to have around zero to 5 bits set (though it can be more, it is not likely), the values never have more than 2 same bits in a row in the lower bits (ie never "000" or "1111"), but the upper bits (say the 16 upper most bits) can be all zero. – haroldbits
andmask
are 32 bits as well. The result does not have to be in a list, I just iterate over them and use them one by one. On average, 17% of the items are returned for a query. – harold