I am relatively new to Python and I have written a class that contains a numpy.ndarray (containing ordinary Python int objects) for storing some data.

I would like to implement __getitem__ and __setitem__ in a way that behaves intuitively for someone familiar with numpy.ndarray. Here is a heavily simplified version of my first attempt:

import numpy as np

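class my_class:

    def __init__(self, data: np.ndarray, info):
        self._data = data  # np.ndarray holding Python int objects
        self._info = info  # metadata describing _data

    def __getitem__(self, key):
        data = self._data[key]
        # Wrap the result so that a scalar index also returns a my_class.
        return my_class(np.atleast_1d(data), self._info)

    def __setitem__(self, key, value):
        # value is expected to be another my_class instance.
        self._data[key] = value._data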

The main issue I have is that my_class contains important information (_info) about the internal data. Therefore, unlike a np.ndarray, I can't just return an int when the index is a scalar:

x = my_object[0]  # Require: isinstance(x, my_class) == True

That's why my __getitem__ returns a my_class object. This includes a call to np.atleast_1d() to ensure that _data is always a np.ndarray with at least one dimension. I tried np.array() instead of np.atleast_1d(), but for a scalar input that creates a 0-dimensional array, which cannot be indexed with an integer.
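
To illustrate the difference between the two calls, here is a minimal sketch. The names scalar, zero_d and one_d are made up for this example and are not part of my_class:

```python
import numpy as np

scalar = 12345678901234567890123  # arbitrary-precision Python int

zero_d = np.array(scalar, dtype=object)  # 0-dimensional array
one_d = np.atleast_1d(zero_d)            # promoted to shape (1,)

print(zero_d.ndim, zero_d.shape)  # 0 ()
print(one_d.ndim, one_d.shape)    # 1 (1,)

try:
    zero_d[0]  # integer index on a 0-d array
except IndexError as exc:
    print("IndexError:", exc)

print(one_d[0] == scalar)  # the 1-d version can be indexed normally
```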

Problem:

In __setitem__, if self._data[key] is an int object and not a np.ndarray (for example, because key is an int), then I end up with nested np.ndarray objects inside the array. For example, this script:

import numpy as np
from my_package import my_class

my_object = my_class(np.array([[1,2,3],[4,5,6]], dtype=object), "info")
my_object[1,2] = my_object[0,0]
my_object[0,0:2] = my_object[1,1:3]

print(my_object._data)

Produces this output:

[[5 array([1]) 3]
 [4 5 array([1])]]
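
The same nesting can be reproduced without my_class, which suggests it comes from how an object array handles assignment to a single slot. This is only a sketch; grid and wrapped are illustrative names:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]], dtype=object)

# This is what __getitem__ stores in _data for a scalar index.
wrapped = np.atleast_1d(grid[0, 0])

# Single slot of an object array: the array object itself is stored.
grid[1, 2] = wrapped
print(type(grid[1, 2]))  # <class 'numpy.ndarray'>

# Slice target: the elements are copied in one by one, so no nesting.
grid[0, 0:2] = np.array([7, 8], dtype=object)
print(type(grid[0, 0]))  # <class 'int'>
```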

Attempted solution:

I did a lot of searching and found many suggestions related to using isinstance(key, slice) to handle slices as a special case. But things seem to be more complicated for numpy.ndarray, which supports tuple and list indexes to handle multi-dimensional indexing and automatic broadcasting, etc. This SO answer touches very briefly on adding additional special handling for tuple, but not list.
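
To show why checking the key type gets complicated, here is a small sketch of what Python actually passes to __getitem__ for different index expressions. KeyProbe is a throwaway class invented for this illustration:

```python
class KeyProbe:
    """Hypothetical helper that returns the key it was indexed with."""

    def __getitem__(self, key):
        return key


probe = KeyProbe()

print(probe[0])        # 0
print(probe[1:3])      # slice(1, 3, None)
print(probe[1, 2])     # (1, 2)
print(probe[0, 0:2])   # (0, slice(0, 2, None))
print(probe[[0, 1]])   # [0, 1]
print(probe[...])      # Ellipsis
```

A key can therefore be an int, a slice, a tuple mixing both, a list, or Ellipsis, before even considering boolean masks or array indexes.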

The best solution I could think of was to instead treat the scalar int data as the special case:

    def __setitem__(self, key, value):
        if isinstance(self._data[key], int):
            self._data[key] = value._data[0]
        else:
            self._data[key] = value._data
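
Put together into something runnable (the class body is copied from above and the values come from the earlier script), the special case removes the nesting for that script. One detail that seems worth checking when running it: np.atleast_1d() on a small Python int produces an integer-dtype array, so value._data[0] can be a NumPy integer scalar rather than a Python int, and a later isinstance(..., int) test would not recognise it.

```python
import numpy as np


class my_class:

    def __init__(self, data: np.ndarray, info):
        self._data = data
        self._info = info

    def __getitem__(self, key):
        data = self._data[key]
        return my_class(np.atleast_1d(data), self._info)

    def __setitem__(self, key, value):
        # Attempted solution: treat a scalar int target as the special case.
        if isinstance(self._data[key], int):
            self._data[key] = value._data[0]
        else:
            self._data[key] = value._data


my_object = my_class(np.array([[1, 2, 3], [4, 5, 6]], dtype=object), "info")
my_object[1, 2] = my_object[0, 0]
my_object[0, 0:2] = my_object[1, 1:3]

# No nested arrays remain for this script.
print(any(isinstance(v, np.ndarray) for v in my_object._data.flat))

# The element written through the scalar branch is a NumPy scalar,
# not a Python int, so isinstance(..., int) is False for it.
print(type(my_object._data[1, 2]))
print(isinstance(my_object._data[1, 2], int))
```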

Questions:

  • Is there a better way to do this?
    • Are there any glaring pitfalls in my attempted solution?
    • I considered subclassing np.ndarray (instead of just having a np.ndarray member variable), but it seemed like this would cause all np.ndarray functionality to be inherited by my_class. I want my_class to be intuitive for someone familiar with np.ndarray, but I want to control which features are implemented and how.
  • Is using isinstance() like this a "code smell"? (I feel like it goes against the idea of duck typing in Python).

Other notes:

  • I am interested in Python 3.x.
  • I am aware that using int as the data type nullifies some of the advantages of numpy (e.g. execution speed). I need arbitrary precision.