19
votes
>>> import numpy
>>> x = numpy.array([[1, 2],
...                  [3, 4],
...                  [5, 6]])
>>> [1, 7] in x
True
>>> [1, 2] in x
True
>>> [1, 6] in x
True
>>> [2, 6] in x
True
>>> [3, 6] in x
True
>>> [2, 3] in x
False
>>> [2, 1] in x
False
>>> [1, 2, 3] in x
False
>>> [1, 3, 5] in x
False

I have no idea how __contains__ works for ndarrays. I couldn't find the relevant documentation when I looked for it. How does it work? And is it documented anywhere?

Look at the source, then. - Marcin
@Marcin: The source is buried somewhere in a pile of C whose structure I don't understand. A big part of it is even autogenerated, and a lot of it is duplicated to handle different dtypes and other differences. I'm not going to dig through all that if I don't have to. - user2357112 supports Monica
@AlokSinghal: Further experimentation seems to agree with that post. [1, object()] in x and [object(), 4] in x report True, but [2, object()] in x and [object(), 5] in x report False, and iterating over itertools.product(xrange(1, 7), repeat=2) and checking containment for all pairs gives the expected results. I was really hoping for something better than a mailing list archive, but if that's all there is, I'll take it. - user2357112 supports Monica
@user2357112 I just posted this as an answer since that's the correct answer and hopefully it will help other people who discover the same issue. - Alok Singhal

3 Answers

11
votes

I found the source for ndarray.__contains__, in numpy/core/src/multiarray/sequence.c. As a comment in the source states,

thing in x

is equivalent to

(x == thing).any()

for an ndarray x, regardless of the dimensions of x and thing. This only makes sense when thing is a scalar; the results of broadcasting when thing isn't a scalar cause the weird results I observed, as well as oddities like array([1, 2, 3]) in array(1) that I didn't think to try. The exact source is

static int
array_contains(PyArrayObject *self, PyObject *el)
{
    /* equivalent to (self == el).any() */

    int ret;
    PyObject *res, *any;

    res = PyArray_EnsureAnyArray(PyObject_RichCompare((PyObject *)self,
                                                      el, Py_EQ));
    if (res == NULL) {
        return -1;
    }
    any = PyArray_Any((PyArrayObject *)res, NPY_MAXDIMS, NULL);
    Py_DECREF(res);
    ret = PyObject_IsTrue(any);
    Py_DECREF(any);
    return ret;
}
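
To make the broadcasting concrete, here is a small Python sketch of the equivalence stated in the source comment. The helper name contains_like_ndarray is made up for illustration and is not part of NumPy; it only restates (x == thing).any().

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4],
              [5, 6]])

def contains_like_ndarray(arr, thing):
    """Hypothetical helper restating the source comment: (arr == thing).any()."""
    return bool((arr == thing).any())

# [1, 7] is broadcast against every row, so column 0 is compared with 1
# and column 1 is compared with 7, element by element.
mask = (x == [1, 7])
print(mask)

# A single True anywhere in the mask is enough for the result to be True.
print(contains_like_ndarray(x, [1, 7]))  # True: x[0, 0] == 1
print(contains_like_ndarray(x, [2, 3]))  # False: no 2 in column 0, no 3 in column 1
```

Only the first entry of the mask is True here, which is why [1, 7] in x reports True even though no row equals [1, 7].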

6
votes

Seems like numpy's __contains__ is doing something like this for a 2-d case:

def __contains__(self, item):
    for row in self:
        if any(item_value == row_value for item_value, row_value in zip(item, row)):
            return True
    return False

[1,7] works because the 0th element of the first row matches the 0th element of [1,7]. Same with [1,2] etc. With [2,6], the 6 matches the 6 in the last row. With [2,3], none of the elements match a row at the same index. [1, 2, 3] is trivial since the shapes don't match.
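
As a rough check of that description, the sketch can be written as a free function and compared against the real in operator for every pair of values from 1 to 6. The name contains_sketch is hypothetical, and the function only covers the 2-d array with a same-width item case discussed here.

```python
import itertools
import numpy as np

def contains_sketch(arr, item):
    """Row-by-row paraphrase of `item in arr` for a 2-d arr and a same-width item."""
    for row in arr:
        if any(item_value == row_value for item_value, row_value in zip(item, row)):
            return True
    return False

x = np.array([[1, 2], [3, 4], [5, 6]])

# Collect every pair where the sketch and NumPy's own `in` disagree.
mismatches = [pair for pair in itertools.product(range(1, 7), repeat=2)
              if contains_sketch(x, pair) != (list(pair) in x)]
print(mismatches)  # an empty list means the two agree on all 36 pairs
```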

See this for more, and also this ticket.

-1
votes

How to check if a 1-dimensional np.ndarray is equal to a row in a 2-dimensional np.ndarray

As pointed out already,

[1, 2] in x is equivalent to ([1, 2] == x).any().

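[1, 2, 3] in x emitted a DeprecationWarning in the NumPy versions current when this was written, because the list has 3 elements while x.shape[1] is only 2; newer NumPy releases may raise an error for this shape mismatch instead.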

If you just want to check whether one np.ndarray appears as a row of another np.ndarray (containment in the intuitive sense), use this:

>>> import numpy as np
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> np.any([np.array_equal([1, 7], el) for el in list(x)])
False
>>> np.any([np.array_equal([1, 2], el) for el in list(x)])
True