4
votes

In Matlab there is no concept of a 1-dimensional array. All arrays have at least two dimensions, and all "vectors" are either "row vectors" (1xn arrays) or "column vectors" (nx1 arrays).

In NumPy, on the other hand, arrays can also be one-dimensional. So there is the concept of a "pure vector" (dimension n), a "row vector" (dimension 1xn) and a "column vector" (dimension nx1).

This is giving me headaches now that I am moving from Matlab to Python.

As an example, consider the case where I have to shift the rows of an n x k matrix (n is generally big, but k can be 1), call it A, down by one row and then insert a row of zeros as the first row.

In Matlab I would just do

[n, k] = size(A);
B      = [zeros(1,k); A(1:end-1,:)];

In NumPy, I would like this to work not only on a 2-dimensional input but also on a 1-dimensional one. So a working solution would be

import numpy as np

if A.ndim == 1:
    B = np.concatenate((np.zeros(1), A[:-1]), axis=0)

if A.ndim == 2:
    (n, k) = A.shape
    B = np.concatenate((np.zeros((1,k)), A[:-1,:]), axis=0)

But this is way too heavy. Is there a better (more concise) way?

More generally, I always have this problem: if I write a function that takes a 2-dimensional array (n x k), call it arr, where k can very well be 1, the function might fail on 1-dimensional arrays (for example if I do arr[0,:]). But I would like it to work on 1-dimensional arrays too, since they are morally the same thing as 2-dimensional arrays in which one of the dimensions is 1.

Sure, one way out would be to put something like

if arr.ndim == 1:
    arr = arr.reshape((arr.shape[0], 1))

at the very beginning of the function, so that the function is guaranteed to have a 2-dimensional array to work with.

But this is not fully satisfactory. For example, it could be the case that my function returns an array of the same shape as the input (n x k). But if the input were one-dimensional, I would like it to return something one-dimensional as well, not an (n x 1) array. So to take care of this case, I would need to add other lengthy if statements and reshaping, which would make my code look even heavier and uglier.
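For concreteness, a minimal sketch of what that extra bookkeeping would look like for the shifting example above (the name shift_down is just illustrative, not something I already have):

import numpy as np

def shift_down(arr):
    # Remember whether the input was 1-dimensional, then force it to 2-D.
    was_1d = (arr.ndim == 1)
    if was_1d:
        arr = arr.reshape(-1, 1)

    k = arr.shape[1]
    out = np.concatenate((np.zeros((1, k)), arr[:-1, :]), axis=0)

    # Restore the original dimensionality before returning.
    if was_1d:
        out = out.reshape(-1)
    return out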

What's the best way out?

1
While those dimension tests may add to the length of the code, they don't make much difference in run time. You could hide that code in a function. numpy functions make adjustments like this all the time. Look for example at np.atleast_2d, or how vstack, hstack and column_stack expand on the basic concatenate. The keepdims parameter of functions like np.sum can be handy, as well as understanding the difference between indexing with 0, [0] and 0:1 (see the demonstration after these comments). For some reason the transition from MATLAB to numpy wasn't that painful for me. – hpaulj
As I was reading (very well written question, by the way) I was going to suggest your reshape-based approach, only with another reshape at the end to restore the original number of dimensions, which you would need to store initially in a variable. – Luis Mendo
"like it to return something one dimensional as well, not a (nx1)" – .squeeze() that extra dimension out at the end of the function. The best way out might be to just start using NumPy and get used to the way it works, unless you can find another suitable package to accomplish what you want. If you are asking for package/library recommendations, that is off topic at SO. – wwii
@wwii "getting used to the way [Numpy] works" is exactly what I am trying to do!!! If you perceive otherwise I would love if you told me a more NumPythonic way of thinking!!!Chicken
@LuisMendo Thank you! I would really try to avoid this kind of logic if possible. It would just create a mess... of that I am sure. – Chicken
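For reference, a small self-contained demonstration of the tools mentioned in the comments above (the array M is arbitrary example data):

import numpy as np

M = np.arange(6).reshape(3, 2)   # shape (3, 2)

# Indexing with 0 drops a dimension; [0] and 0:1 keep it.
print(M[0, :].shape)     # (2,)   -> 1-D
print(M[[0], :].shape)   # (1, 2) -> still 2-D
print(M[0:1, :].shape)   # (1, 2) -> still 2-D

# np.atleast_2d promotes a 1-D array to a 1 x n array.
print(np.atleast_2d(np.arange(3)).shape)   # (1, 3)

# keepdims keeps the reduced axis with length 1.
print(M.sum(axis=0).shape)                 # (2,)
print(M.sum(axis=0, keepdims=True).shape)  # (1, 2)

# squeeze removes length-1 axes, e.g. to undo an (n, 1) result.
print(np.zeros((3, 1)).squeeze().shape)    # (3,)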

1 Answer

3
votes

I perceive the "strictness" of numpy (compared to MATLAB) when it comes to array sizes as an advantage, I feel like it makes many things more predictable.

I propose the following solution to your first problem, which might contain some useful tools for tackling future problems. The first tool is the "ellipsis" (...) object, which can be used in indexing. When you see these three dots in indexing, you can think of them as replacing as many : as necessary. For example, if A.shape == (42, 2021, 69, 7) then A[..., 1, :] is the same as A[:, :, 1, :]. (Naturally, you can only use one ellipsis per indexing expression.) It is a very useful thing to have for writing functions that deal with arrays of an arbitrary number of dimensions.
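A quick check of that equivalence, using the example shape from above:

import numpy as np

A = np.zeros((42, 2021, 69, 7))
# The ellipsis expands to as many ':' as needed, so these two
# expressions select the same sub-array.
print(np.array_equal(A[..., 1, :], A[:, :, 1, :]))  # True
print(A[..., 1, :].shape)                           # (42, 2021, 7)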

The second tool (which is not necessary for this answer) is that for things like np.ones or np.zeros you always have the corresponding functions np.ones_like and np.zeros_like, which let you avoid many cumbersome shape computations.
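For example:

import numpy as np

A = np.arange(12).reshape(3, 4)
Z = np.zeros_like(A)   # zeros with the same shape and dtype as A
print(Z.shape)         # (3, 4) -- no manual shape arithmetic needed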

So in the following we use these to first create a new array that has the right shape without having to do any arithmetic. We just call np.zeros_like on the thing we'd like to replace (in this case the last "hyper-row" of A, but it could be any of them). And at the same time the ellipsis operator conveniently deals with any number of dimensions we might have present:

import numpy as np

def f(A):
    # Zeros with the same shape (and dtype) as the last "hyper-row" of A;
    # cf. np.zeros(A[-1:, ...].shape).
    u = np.zeros_like(A[-1:, ...])
    # All rows except the last, whatever the number of dimensions.
    v = A[:-1, ...]
    return np.concatenate([u, v], axis=0)

Try it online!
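For instance, calling f on both a 1-D and a 2-D input (the test values are just illustrative) gives the expected shifted result:

A1 = np.array([1., 2., 3.])
print(f(A1))   # [0. 1. 2.]

A2 = np.arange(1, 7).reshape(3, 2)
print(f(A2))
# [[0 0]
#  [1 2]
#  [3 4]]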