How to speed up matrix multiplication in numpy if I only need the belt around the diagonal?

Question

I need to compute a second power of a square matrix A (A*A^T), but I am only interested in the values around the diagonal of the result. In other words, I need to compute dot products of neighboring rows, where the neighborhood is defined by some window of fixed size and ideally, I want to avoid computation of the remaining dot products. How to do this in numpy without running the full matrix multiplication with some masking? The resulting array should look as follows:

a1*a1  a1*a2  0      0      0      0
a2*a1  a2*a2  a2*a3  0      0      0
0      a3*a2  a3*a3  a3*a4  0      0
0      0      a4*a3  a4*a4  a4*a5  0...
0      0      0      ...
...

The example matrix contains dot products for neighboring rows. Each row is multiplied only with its left and right neighbour. The zeros should ideally not be computed by the solution in order to save time. This thread seems to be headed in similar direction.

Dorian Dorian · Accepted Answer · 2020-06-24T14:56:43

Have a look into sparse matrices with scipy (where numpy is also from).

For your specific problem:

The diagonal elements are the column-wise sum of the elementwise product of your matrix and its transpose v = np.sum(np.multiply(A, A.T), axis=0)
the off diagonal elements are the same, just with the last row/column deleted and substituted by a zero column/row at the first index:

pos_offset = np.concatenate((np.zeros((n, 1)), A[:, :-1]), axis=1)
v_pos = np.sum(np.multiply(A, pos_offset.T), axis=0)
# similar for the negative offset diagonal

you can then construct the diagonal matrix with np.diag(v):

A_res = np.diag(v) + np.diag(v_pos, k=1) + np.diag(v_neg, k=-1)

How to speed up matrix multiplication in numpy if I only need the belt around the diagonal?

1 Answers