In an attempt to reverse-engineer a file format, I have arrived at the following minimal example for creating a composite numpy datatype and saving it to HDF5. The original file seems to store datasets of the data type below. However, I am not able to write such datasets to a file.
import numpy as np
import h5py
data = ("Many cats".encode(), np.linspace(0, 1, 20))
data_type = [('index', 'S' + str(len(data[0]))), ('values', '<f8', (20,))]
arr = np.array(data, dtype=data_type)
print(arr)
h5f = h5py.File("lol.h5", 'w')
dset = h5f.create_dataset("data", arr, dtype=data_type)
h5f.close()
This code crashes with the error
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    dset = h5f.create_dataset("data", arr, dtype=data_type)
  File "/opt/anaconda3/lib/python3.7/site-packages/h5py/_hl/group.py", line 116, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/opt/anaconda3/lib/python3.7/site-packages/h5py/_hl/dataset.py", line 75, in make_new_dset
    shape = tuple(shape)
TypeError: iteration over a 0-d array
How can I overcome this issue?
Use the
f.create_dataset('foo1', data=arr)
syntax. Without a keyword, the second argument is assumed to be the shape. So always use data= when providing the actual array. – hpaulj
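For reference, here is a minimal sketch of the question's example with that fix applied. It first shows why the positional call fails (the array is 0-d, so h5py cannot turn it into a shape tuple), then writes it with data= and reads it back. The in-memory "core" driver and the file name demo.h5 are my own choices so the sketch leaves no file on disk; they are not part of the original question.

```python
import numpy as np
import h5py

# Same compound record as in the question: a fixed-length byte string
# plus a 20-element float64 sub-array.
data = ("Many cats".encode(), np.linspace(0, 1, 20))
data_type = [('index', 'S' + str(len(data[0]))), ('values', '<f8', (20,))]
arr = np.array(data, dtype=data_type)

# The array is 0-d (a single record), so treating it as a shape fails
# in the same way the traceback shows.
try:
    tuple(arr)
    shape_error = None
except TypeError as exc:
    shape_error = str(exc)

# Assumption: an in-memory file (driver="core", backing_store=False)
# is used here purely to avoid touching the disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as h5f:
    # Pass the array by keyword; shape and dtype are inferred from it.
    dset = h5f.create_dataset("data", data=arr)
    stored_dtype = dset.dtype
    out = dset[()]  # read the scalar compound record back

print(stored_dtype)
print(out['index'], out['values'][:3])
```

With a regular file on disk you would simply drop the driver and backing_store arguments; the create_dataset call stays the same.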