12 votes

Consider the following exercise in NumPy array broadcasting.

import numpy as np
v = np.array([[1.0, 2.0]]).T  # column array, shape (2, 1)

A2 = np.random.randn(2, 10)      # 2-D array
A3 = np.random.randn(2, 10, 10)  # 3-D array

v * A2  # works great

v * A3  # raises ValueError: operands could not be broadcast together

I know the NumPy broadcasting rules, and I'm familiar with bsxfun functionality in MATLAB. I understand why attempting to broadcast a (2,1) array into a (2,N,N) array fails, and that I have to reshape the (2,1) array into a (2,1,1) array before this broadcasting goes through.
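Concretely, the explicit reshape that makes the 3-D case go through (a quick sketch using the arrays from the snippet above):

```python
import numpy as np

v = np.array([[1.0, 2.0]]).T     # shape (2, 1)
A3 = np.random.randn(2, 10, 10)  # shape (2, 10, 10)

# Broadcasting left-pads v's shape to (1, 2, 1); the middle axis is
# 2 vs 10, so v * A3 raises ValueError. Reshaping to (2, 1, 1) aligns
# the length-2 axis with A3's first axis instead:
result = v.reshape(2, 1, 1) * A3
assert result.shape == (2, 10, 10)
```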

My question is: is there any way to tell NumPy to automatically pad the dimensionality of an array when it attempts to broadcast, without my having to specify the necessary dimensions explicitly?

I don't want to explicitly couple the (2,1) vector with the multidimensional array it's going to be broadcast against---otherwise I could do something stupid and absurdly ugly like mult_v_A = lambda v,A: v.reshape([v.size] + [1]*(A.ndim-1)) * A. I don't know ahead of time if the "A" array will be 2D or 3D or N-D.

MATLAB's bsxfun broadcasting functionality implicitly pads the dimensions as needed, so I'm hoping there's something similar I can do in Python.

2
My view is that this is a feature and not a bug. For example, suppose you had a 2-by-1 column vector v and then you had a 2-by-2-by-10 ndarray. Do you want to reshape v to have shape (2,1,1) or shape (1,2,1)? If you just pad the dimensions, it could be ambiguous to the user. Forcing an explicit reshape is a better general procedure, leaving it to the user to write a special function that performs the reshaping automatically if the user has a fixed convention. But it's not good to make a global NumPy dimension-padder that forces a convention upon you. It would be too easy to misuse. – ely
-1 @EMS. This ambiguity is readily solved by specifying that the first non-singleton dimension will be used in the broadcast. This attitude of "This way it's better" is totally inappropriate for systems used by professional programmers and applied mathematicians---this is a flaw in terms of illegibility and complectedness, not a feature. – Ahmed Fasih
-1 @Ahmed Fasih. I am an applied mathematician who writes Python code for scientific applications every day, and I feel that your proposed convention of always adopting the first non-singleton dimension would be very poor. Much better for you to write a function that adopts that convention than for NumPy developers to worry about hardcoding a convention like that which benefits some users (such as you) but would not be useful for some other users (such as me). – ely
Also, though the questions start out differently, this is a possible duplicate of: < stackoverflow.com/questions/15296944/… >. I think numpy.apply_along_axis and numpy.tensordot in particular seem more than sufficient for this kind of thing, while still leaving the nice property that the programmer must make some explicit reference to the way that ambiguous dimension changes should occur for broadcasting. – ely
I'd expect apply_along_axis to totally destroy performance by de-vectorizing the array operation. Also, broadcasting is much more useful than just multiplication via tensordot. – Ahmed Fasih

2 Answers

9 votes

It's ugly, but this will work:

(v.T * A3.T).T

With no arguments, transposing reverses the shape tuple, so after flipping both operands you can rely on the broadcasting rules (which pad on the left) to do their magic. The final transpose returns everything to the right order.
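A quick shape check of the trick, using the arrays from the question:

```python
import numpy as np

v = np.array([[1.0, 2.0]]).T     # shape (2, 1)
A3 = np.random.randn(2, 10, 10)  # shape (2, 10, 10)

# v.T has shape (1, 2) and A3.T has shape (10, 10, 2); broadcasting
# left-pads (1, 2) to (1, 1, 2), so the product has shape (10, 10, 2).
# The final transpose restores the axis order to (2, 10, 10).
result = (v.T * A3.T).T
assert result.shape == (2, 10, 10)
```

The result matches what the explicit `v.reshape(2, 1, 1) * A3` would give, without hardcoding the number of trailing axes.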

9 votes

NumPy broadcasting adds additional axes on the left.

So if you arrange your arrays so the shared axes are on the right and the broadcastable axes are on the left, then you can use broadcasting with no problem:

import numpy as np
v = np.array([[1.0, 2.0]])       # shape (1, 2)

A2 = np.random.randn(10, 2)      # shape (10, 2)
A3 = np.random.randn(10, 10, 2)  # shape (10, 10, 2)

v * A2  # shape (10, 2)

v * A3  # shape (10, 10, 2)
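Because broadcasting pads missing axes on the left, this layout generalizes to any number of leading dimensions without further reshaping; for instance, with a hypothetical 4-D array:

```python
import numpy as np

v = np.array([[1.0, 2.0]])          # shape (1, 2)
A4 = np.random.randn(5, 10, 10, 2)  # shape (5, 10, 10, 2)

# v's shape is left-padded to (1, 1, 1, 2), so the multiply broadcasts
# for arbitrary A.ndim as long as the trailing axis matches.
assert (v * A4).shape == (5, 10, 10, 2)
```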