I am relatively new to Python and I have written a class that contains a numpy.ndarray (containing ordinary Python int objects) for storing some data. I would like to implement __getitem__ and __setitem__ in a way that behaves intuitively for someone familiar with numpy.ndarray. Here is a heavily simplified version of my first attempt:
import numpy as np

class my_class:
    def __init__(self, data: np.ndarray, info):
        self._data = data
        self._info = info

    def __getitem__(self, key):
        data = self._data[key]
        return my_class(np.atleast_1d(data), self._info)

    def __setitem__(self, key, value):
        self._data[key] = value._data
The main issue I have is that my_class contains important information (_info) about the internal data. Therefore, unlike a np.ndarray, I can't just return an int when the index is a scalar:
x = my_object[0] # Require: isinstance(x, my_class) == True
That's why my __getitem__ returns a my_class object. This includes a call to np.atleast_1d() to ensure the _data is always a np.ndarray. I tried np.array() instead of np.atleast_1d(), but that can apparently create 0-dimensional arrays which cannot be indexed.
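To make the difference concrete, here is a small sketch comparing the two calls on a scalar. The variable names are only illustrative. It also shows that a 0-dimensional array rejects an integer index but does accept an empty tuple:

```python
import numpy as np

zero_d = np.array(5)       # 0-dimensional: shape ()
one_d = np.atleast_1d(5)   # 1-dimensional: shape (1,)

print(zero_d.shape, one_d.shape)

try:
    zero_d[0]              # an integer index needs at least one axis
except IndexError as exc:
    print("IndexError:", exc)

print(zero_d[()])          # an empty tuple does index a 0-d array
print(one_d[0])
```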
Problem:
In __setitem__, if self._data[key] is an int object and not a np.ndarray (for example, because key is an int), then I end up with nested np.ndarray objects inside the array. For example, this script:
import numpy as np
from my_package import my_class
my_object = my_class(np.array([[1,2,3],[4,5,6]], dtype=object), "info")
my_object[1,2] = my_object[0,0]
my_object[0,0:2] = my_object[1,1:3]
print(my_object._data)
Produces this output:
[[5 array([1]) 3]
 [4 5 array([1])]]
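The nesting can be reproduced without the class at all. The sketch below (with a hypothetical array named a) assigns a one-element array first through an integer key and then through a slice key. Only the integer key stores the array object itself:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=object)

# Integer key: selects a single slot, so the array object itself is stored.
a[1, 2] = np.array([1])
print(type(a[1, 2]))

# Slice key: selects a sub-array, so the element is copied in.
a[0, 0:1] = np.array([7], dtype=object)
print(type(a[0, 0]))
```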
Attempted solution:
I did a lot of searching and found many suggestions related to using isinstance(key, slice) to handle slices as a special case. But things seem to be more complicated for numpy.ndarray, which supports tuple and list indexes to handle multi-dimensional indexing and automatic broadcasting, etc. This SO answer touches very briefly on adding additional special handling for tuple, but not list.
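As a small illustration of why inspecting the key type gets complicated, the sketch below (array and variable names are illustrative) indexes the same array with a tuple and with a list holding the same numbers. The tuple addresses one element, while the list triggers fancy indexing and selects whole rows:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

by_tuple = a[(0, 1)]   # same as a[0, 1]: a single element
by_list = a[[0, 1]]    # fancy indexing: rows 0 and 1

print(np.ndim(by_tuple), np.ndim(by_list))
```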
The best solution I could think of was to instead treat the scalar int data as the special case:
def __setitem__(self, key, value):
    if isinstance(self._data[key], int):
        self._data[key] = value._data[0]
    else:
        self._data[key] = value._data
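For comparison, here is a self-contained sketch of one possible variant of the same idea, not necessarily a better one. It tests whether the indexed result is a np.ndarray instead of testing for int, and it keeps dtype=object in __getitem__, because np.atleast_1d() on a small Python int yields a fixed-width NumPy integer, which is not an instance of int. The use of np.asarray(..., dtype=object) and .item() is an assumption of this sketch and not part of the original attempt. Note that self._data[key] still evaluates the index once just to inspect the result, which copies data for list or array keys.

```python
import numpy as np

class my_class:
    def __init__(self, data: np.ndarray, info):
        self._data = data
        self._info = info

    def __getitem__(self, key):
        # Force dtype=object so a scalar Python int is not converted
        # to a fixed-width NumPy integer by np.atleast_1d().
        data = np.atleast_1d(np.asarray(self._data[key], dtype=object))
        return my_class(data, self._info)

    def __setitem__(self, key, value):
        if isinstance(self._data[key], np.ndarray):
            # key selects a sub-array: let NumPy broadcast the assignment.
            self._data[key] = value._data
        else:
            # key selects a single element: store the bare object.
            # .item() raises ValueError unless value holds exactly one element.
            self._data[key] = value._data.item()


my_object = my_class(np.array([[1, 2, 3], [4, 5, 6]], dtype=object), "info")
my_object[1, 2] = my_object[0, 0]
my_object[0, 0:2] = my_object[1, 1:3]
print(my_object._data)
```

With this variant the script no longer stores nested arrays, and the elements remain plain Python int objects.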
Questions:
- Is there a better way to do this?
- Are there any glaring pitfalls in my attempted solution?
- I considered subclassing np.ndarray (instead of just having a np.ndarray member variable), but it seemed like this would cause all np.ndarray functionality to be inherited by my_class. I want my_class to be intuitive for someone familiar with np.ndarray, but I want to control which features are implemented and how.
- Is using isinstance() like this a "code smell"? (I feel like it goes against the idea of duck typing in Python).
Other notes:
- I am interested in Python 3.x.
- I am aware that using int as the data type nullifies some of the advantages of numpy (e.g. execution speed). I need arbitrary precision.