
In an attempt to reverse-engineer a file format, I have arrived at the following minimal example for creating a composite (structured) numpy datatype and saving it to HDF5. The original file seems to store datasets of the data type below. However, I cannot seem to write such datasets to a file.

import numpy as np
import h5py

data = ("Many cats".encode(), np.linspace(0, 1, 20))
data_type = [('index', 'S' + str(len(data[0]))), ('values', '<f8', (20,))]

arr = np.array(data, dtype=data_type)
print(arr)

h5f = h5py.File("lol.h5", 'w')
dset = h5f.create_dataset("data", arr, dtype=data_type)
h5f.close()

This code crashes with the following error:

Traceback (most recent call last):
  File "test.py", line 13, in <module>
    dset = h5f.create_dataset("data", arr, dtype=data_type)
  File "/opt/anaconda3/lib/python3.7/site-packages/h5py/_hl/group.py", line 116, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/opt/anaconda3/lib/python3.7/site-packages/h5py/_hl/dataset.py", line 75, in make_new_dset
    shape = tuple(shape)
TypeError: iteration over a 0-d array

How can I overcome this issue?

You need to use the f.create_dataset('foo1', data=arr) syntax. Without a keyword, the second argument is assumed to be the shape, so always use data= when providing the actual array. – hpaulj
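A short sketch of why the original call fails and what the comment's fix looks like. np.array(data, dtype=data_type) with a single tuple produces a 0-d structured array (shape ()), so when it is passed positionally as the shape, the tuple(shape) call inside h5py cannot iterate over it. Passing the same array with data= lets h5py take the shape and dtype from the array, which here yields a scalar dataset. Writing to a temporary directory and the file name cats_scalar.h5 are assumptions for illustration only.

```python
import os
import tempfile

import h5py
import numpy as np

data = ("Many cats".encode(), np.linspace(0, 1, 20))
data_type = [('index', 'S' + str(len(data[0]))), ('values', '<f8', (20,))]

# A single tuple gives a 0-d structured array
arr = np.array(data, dtype=data_type)
print(arr.shape)  # ()

# Iterating a 0-d array raises TypeError, which is what tuple(shape) hits in h5py
try:
    tuple(arr)
except TypeError as exc:
    print("TypeError:", exc)

# Hypothetical output location in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "cats_scalar.h5")

with h5py.File(path, "w") as h5f:
    # data= passes the array itself; shape and dtype are taken from it
    dset = h5f.create_dataset("data", data=arr)
    scalar_shape = dset.shape   # () means a scalar dataset with one compound record
    record = dset[()]           # read the single record back

print(scalar_shape)
print(record["index"])
```

Whether a scalar dataset or a 1-row dataset is preferable depends on how the original file stores its records; checking the shape of a dataset in that file would settle it.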

1 Answer


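I restructured and reordered your code to get it working with h5py. The array is now built as a 1-row structured array with np.zeros and filled field by field, and it is passed to create_dataset with the data= keyword (a positional second argument is interpreted as the shape). The code below writes a single row; you will have to adjust it to make the number of rows variable.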

import numpy as np
import h5py

data = ("Many cats".encode(), np.linspace(0, 1, 20))
data_type = [('index', 'S' + str(len(data[0]))), ('values', '<f8', (20,))]

# Build a 1-row structured array, then fill each field of that row
arr = np.zeros((1,), dtype=data_type)
arr[0]['index'] = data[0]
arr[0]['values'] = data[1]

# Pass the array with data=; the file is closed when the with block exits
with h5py.File("lol.h5", 'w') as h5f:
    dset = h5f.create_dataset("data", data=arr)
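The answer leaves the number of rows as an adjustment to make. One way to do that, sketched under assumptions: the input is a hypothetical list rows of (label, values) pairs, every values vector has 20 floats, and the 'index' string width is sized to the longest label. A list of tuples passed to np.array yields a 1-D structured array with one element per row, so no per-field filling is needed. The file name cats_rows.h5 in a temporary directory is also just for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical input: any number of (label, 20-float vector) pairs
rows = [
    ("Many cats".encode(), np.linspace(0, 1, 20)),
    ("Few dogs".encode(), np.linspace(1, 2, 20)),
    ("One bird".encode(), np.zeros(20)),
]

# Size the fixed-width byte string to the longest label
width = max(len(label) for label, _ in rows)
data_type = [('index', 'S' + str(width)), ('values', '<f8', (20,))]

# A list of tuples gives a 1-D structured array, one element per row
arr = np.array(rows, dtype=data_type)

path = os.path.join(tempfile.mkdtemp(), "cats_rows.h5")
with h5py.File(path, "w") as h5f:
    # maxshape=(None,) makes the dataset resizable so rows can be appended later
    h5f.create_dataset("data", data=arr, maxshape=(None,))

# Read everything back to check the round trip
with h5py.File(path, "r") as h5f:
    loaded = h5f["data"][:]

print(loaded.shape)  # (3,)
```

The maxshape argument is optional; leave it out if the dataset never needs to grow after it is written.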