I am multiplying two matrices via A.dot(B), where:
A: 1 x n matrix, dtype float
B: n x n matrix, dtype bool
I am performing this calculation for large n, and run out of memory very quickly (around n=14000 fails). A and B are dense.
It appears the reason is that numpy converts B to dtype float before performing the matrix multiplication, hence incurring a huge memory cost. In fact, %timeit suggests it spends more time converting B to float than performing the multiplication.
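To put rough numbers on that spike (assuming NumPy promotes B to float64, its default float dtype): a bool array uses 1 byte per element, so the temporary float copy is 8x the size of B itself:

```python
import numpy as np

n = 14000
bool_bytes = n * n * np.dtype(bool).itemsize         # B as bool: ~0.2 GB
float_bytes = n * n * np.dtype(np.float64).itemsize  # float copy: ~1.6 GB
```

So at n=14000 the conversion alone needs roughly an extra 1.6 GB on top of B, and the cost grows quadratically with n.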
Is there a way around this? The emphasis is on reducing the memory spike from the float conversion, while still allowing common matrix functionality (matrix addition / multiplication).
Here's reproducible data for benchmarking solutions:
import numpy as np

np.random.seed(999)
n = 30000
A = np.random.random(n)
B = np.where(np.random.random((n, n)) > 0.5, True, False)
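One possible workaround (a sketch, not a definitive answer; `dot_bool_chunked` and the `chunk` parameter are names I've made up): convert B to float in row blocks, so the temporary float buffer is only chunk x n instead of n x n. Since A.dot(B) is a sum over rows of B weighted by the entries of A, the blocks can be accumulated independently:

```python
import numpy as np

def dot_bool_chunked(A, B, chunk=1000):
    """Compute A.dot(B) for a float vector A (shape (n,)) and a bool
    matrix B (n x n), converting only `chunk` rows of B to float at a time."""
    out = np.zeros(B.shape[1])
    for i in range(0, B.shape[0], chunk):
        # only this chunk x n slice of B is ever held as float
        out += A[i:i + chunk].dot(B[i:i + chunk].astype(float))
    return out
```

The peak extra memory drops from n*n*8 bytes to chunk*n*8 bytes, at the cost of a Python-level loop over n/chunk blocks; tuning `chunk` trades the spike against loop overhead.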
Comments:

Have you tried converting B to float yourself prior to using dot, i.e. does A.dot(B.astype(float)) show the same behavior? - Cleb

@Cleb I timed both A.dot(B.astype(float)) and A.dot(B): if I convert B to float beforehand, calculation time drops significantly. I'm aiming to perform this calculation for much larger n (at least n=30,000), so the memory improvement is critical, even on a machine with more memory. - jpp