I have a 4x4x256 tensor and a 128x256 matrix. I need to multiply each 256-d depth-wise vector of the tensor by the matrix, such that I get a 4x4x128 tensor as a result.
Working in Numpy it's not clear to me how to do this. In their current shape it doesn't look like any variant of np.dot exists to do this. Manipulating the shapes to take advantage of broadcasting rules doesn't seem to provide any help. np.tensordot and np.einsum may be useful but looking at the documentation is going right over my head.
Is there an efficient way to do this?