Advanced broadcasting in TensorFlow (or Numpy)

Question

In TensorFlow I have a rank-2 tensor M (a matrix) of shape [D, D] and a rank-3 tensor T of shape [D, D, D].

I need to combine them to form a new matrix R as follows: the element R[a, b+c-a] is given by the sum of all the elements T[a, b, c]*M[b, c] where b+c-a is constant (where b+c-a has to be between 0 and D-1).

An inefficient way to create R is with nested for loops over the indices and a check that b+c-a does not exceed D-1 (e.g. in numpy):

R = np.zeros([D,D])

for a in range(D):
  for b in range(D):
    for c in range(D):
      if 0 <= b+c-a < D:
        R[a, b+c-a] += T[a, b, c]*M[b, c]

but I would like to use broadcasting and/or other more efficient methods.

How can I achieve this?

Show the for loop code, it's not possible to understand what you want from this explanation. — Daniel F
But b + c - a may take the same value for more than one combination of a, b and c, even if a is fixed (e.g. for (a=2, b=1, c=3) and (a=2, b=3, c=1) the result is 2). Which of the values should R[a, b + c - a] take? Or should it be a combination (addition, product)? — jdehesa

jdehesa jdehesa · Accepted Answer · 2019-06-24T12:37:57

You can vectorize that calculation as follows:

import numpy as np

np.random.seed(0)
D = 10
M = np.random.rand(D, D)
T = np.random.rand(D, D, D)
# Original calculation
R = np.zeros([D, D])
for a in range(D):
    for b in range(D):
        for c in range(D):
            if 0 <= b + c - a < D:
                R[a, b + c - a] += T[a, b, c] * M[b, c]
# Vectorized calculation
tm = T * M
a = np.arange(D)[:, np.newaxis, np.newaxis]
b, c = np.ogrid[:D, :D]
col_idx = b + c - a
m = (col_idx >= 0) & (col_idx < D)
row_idx = np.tile(a, [1, D, D])
R2 = np.zeros([D, D])
np.add.at(R2, (row_idx[m], col_idx[m]), tm[m])
# Check result
print(np.allclose(R, R2))
# True

Alternatively, you could consider using Numba to accelerate the loops:

import numpy as np
import numba as nb

@nb.njit
def calculation_nb(T, M, D):
    tm = T * M
    R = np.zeros((D, D), dtype=tm.dtype)
    for a in nb.prange(D):
      for b in range(D):
        for c in range(max(a - b, 0), min(D + a - b, D)):
          R[a, b + c - a] += tm[a, b, c]
    return R

print(np.allclose(R, calculation_nb(T, M, D)))
# True

In a couple of quick tests, even without parallelization, this is quite faster than NumPy.

Advanced broadcasting in TensorFlow (or Numpy)

1 Answers