I am trying to use h5py to store data as a list of tuples of (images, angles). Images are numpy arrays of size (240,320,3) of type uint8 from OpenCV while angles are just a number of type float16.

When using h5py, you need a predetermined shape in order to maintain usable read/write speed. h5py allocates the entire dataset up front with placeholder values, which you can later index and overwrite with whatever you like.

I would like to know how to set the shape of an inner numpy array when initializing the shape of a dataset for h5py. I believe the same solution would apply for numpy as well.

import h5py
import numpy as np

dset_length = 100

# fake data of same shape
images = np.ones((dset_length,240,320,3), dtype='uint8') * 255

# fake data of same shape
angles = np.ones(dset_length, dtype='float16') * 90
f = h5py.File('dataset.h5', 'a')
dset = f.create_dataset('dset1', shape=(dset_length,2))

for i in range(dset_length):
    # does not work since the shape of dset[0][0] is a number, 
    # and can't store an array datatype
    dset[i] = np.array((images[i],angles[i]))

Recreating the problem in numpy looks like this:

import numpy as np 

a = np.array([ 
           [np.array([0,0]), 0], 
           [np.array([0,0]), 0], 
           [np.array([0,0]), 0]
         ])

a.shape # (3, 2)

b = np.empty((3,2))

b.shape # (3, 2)

a[0][0] = np.array([1,1])

b[0][0] = np.array([1,1]) # ValueError: setting an array element with a sequence.
b = np.empty((3,2), dtype=object) will make it behave like a. But that's not really how you want it to behave anyway. - Eric
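
To illustrate the comment above, here is a minimal numpy-only sketch of the object-dtype variant. It assumes nothing beyond numpy itself. The outer array only stores references to Python objects, so its shape says nothing about the inner arrays, which is probably why it is not the layout you want for fixed-size images.

```python
import numpy as np

# Object array: every cell holds a reference to an arbitrary Python object.
b = np.empty((3, 2), dtype=object)

# Unlike the float array in the question, this assignment is accepted.
b[0][0] = np.array([1, 1])
b[0][1] = 0.5

outer_shape = b.shape        # shape of the container only
inner_shape = b[0][0].shape  # shape of the array stored in one cell
```

Each cell is a separate object, so the inner shape is not part of the array's dtype and cannot be declared up front.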

2 Answers

2
votes

The dtype that @Eric creates should work with both numpy and h5py. But I wonder if you really want or need that. An alternative is to have two arrays in numpy, images and angles, one being 4d uint8, the other float. In h5py you could create a group, and store these 2 arrays as datasets.

You could select the values for the i-th image with

 images[i,...], angles[i]     # or
 data[i]['image'], data[i]['angle']

for example:

import h5py
import numpy as np
dt = np.dtype([('angle', np.float16), ('image', np.uint8, (40,20,3))])
data = np.ones((3,), dt)

f = h5py.File('test.h5','w')
g = f.create_group('data')

dataset with the compound dtype:

g.create_dataset('data', (3,), dtype=dt)
g['data'][:] = data

or datasets with the two arrays

g.create_dataset('image', (3,40,20,3), dtype=np.uint8)
g.create_dataset('angle', (3,), dtype=np.float16)
g['image'][:] = data['image']
g['angle'][:] = data['angle']

fetch angle array from either dataset:

g['data']['angle'][:]
g['angle'][:]
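
Applied to the sizes in the question, the two-dataset layout might look like the sketch below. The record count `n`, the file name `sketch.h5`, and the in-memory `core` driver are my own choices for illustration, so the example does not leave a file on disk; swap them for your real values.

```python
import h5py
import numpy as np

n = 4  # hypothetical record count, kept small for the example

# Fake data with the shapes and dtypes from the question.
images = np.full((n, 240, 320, 3), 255, dtype=np.uint8)
angles = np.full(n, 90, dtype=np.float16)

# driver='core' with backing_store=False keeps the file in memory only.
with h5py.File('sketch.h5', 'w', driver='core', backing_store=False) as f:
    g = f.create_group('data')
    img_ds = g.create_dataset('image', shape=images.shape, dtype=np.uint8)
    ang_ds = g.create_dataset('angle', shape=angles.shape, dtype=np.float16)

    # Fill record by record, as in the loop from the question.
    for i in range(n):
        img_ds[i] = images[i]
        ang_ds[i] = angles[i]

    # Read one record back while the file is still open.
    image_2 = img_ds[2]
    angle_2 = ang_ds[2]
```

Here the same index `i` ties an image to its angle, so no tuple or nested array is needed.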
2
votes

In numpy, you can store that data with structured arrays:

dtype = np.dtype([('angle', np.float16), ('image', np.uint8, (240,320,3))])
data = np.empty(10, dtype=dtype)
data[0]['angle'] = ... # etc
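
A slightly fuller sketch of the same idea, filling the structured array from fake data shaped like the question's. The `images` and `angles` arrays here are placeholders I made up for the example.

```python
import numpy as np

dtype = np.dtype([('angle', np.float16), ('image', np.uint8, (240, 320, 3))])
data = np.empty(10, dtype=dtype)

# Placeholder inputs with the shapes from the question.
images = np.full((10, 240, 320, 3), 255, dtype=np.uint8)
angles = np.full(10, 90, dtype=np.float16)

# Selecting a field first gives a plain array view that can be assigned per record.
for i in range(len(data)):
    data['image'][i] = images[i]
    data['angle'][i] = angles[i]

all_images_shape = data['image'].shape    # field view across all records
one_image_shape = data[0]['image'].shape  # image of a single record
```

The inner shape is part of the dtype, so every record reserves space for a full (240, 320, 3) image.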