13
votes

How can I resize an HDF5 array using the h5py Python library?

I've tried using the .resize method on an array with chunks set to True. Alas, I'm still missing something.

In [1]: import h5py

In [2]: f = h5py.File('foo.hdf5', 'w')

In [3]: d = f.create_dataset('data', (3, 3), dtype='i8', chunks=True)

In [4]: d.resize((6, 3))
/home/mrocklin/Software/anaconda/lib/python2.7/site-packages/h5py/_hl/dataset.pyc in resize(self, size, axis)
--> 277         self.id.set_extent(size)
ValueError: unable to set extend dataset (Dataset: Unable to initialize object)

In [11]: h5py.__version__ 
Out[11]: '2.2.1'
Perhaps it is something to do with the datatype of the array... Maybe try a more standard datatype such as one shown in the documentation for initializing an array? – anon582847382
Just tried it with no dtype specified (I think it defaults to float). Same error. – MRocklin
Are you missing maxshape on create_dataset? – SlightlyCuban
@SlightlyCuban that solves it. Does maxshape allocate that much space on disk? Why not set it infinite? – MRocklin
@MRocklin what version of h5py are you using? I just tried this using 2.2.1 and didn't have a problem. – SlightlyCuban

2 Answers

14
votes

As mentioned by Oren, you need to pass maxshape when creating the dataset if you want to change the array size later. Setting a dimension to None in maxshape lets you resize that dimension later, up to 2**64 elements (HDF5's per-axis limit):

In [1]: import h5py

In [2]: f = h5py.File('foo.hdf5', 'w')

In [3]: d = f.create_dataset('data', (3, 3), maxshape=(None, 3), dtype='i8', chunks=True)

In [4]: d.resize((6, 3))

In [5]: h5py.__version__
Out[5]: '2.2.1'

See the docs for more.
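
For anyone who wants to run this outside an IPython session, here is a minimal sketch of the same idea as a script. The helper name grow_rows, the temporary file location, and the sample values are my own assumptions for illustration; the only h5py calls used are create_dataset, resize, and ordinary slicing.

```python
import os
import tempfile

import h5py
import numpy as np


def grow_rows(path):
    """Create a 3x3 dataset that is unlimited along axis 0, then grow it to 6x3.

    `grow_rows` is a hypothetical helper written for this example only.
    """
    with h5py.File(path, "w") as f:
        # maxshape=(None, 3): axis 0 may grow later, axis 1 is fixed at 3.
        d = f.create_dataset(
            "data", shape=(3, 3), maxshape=(None, 3), dtype="i8", chunks=True
        )
        d[...] = np.arange(9).reshape(3, 3)

        # Grow the dataset; the rows written above are kept.
        d.resize((6, 3))
        d[3:] = np.arange(9, 18).reshape(3, 3)

        # Read everything into memory before the file is closed.
        return d.shape, d.maxshape, d[...]


with tempfile.TemporaryDirectory() as tmp:
    shape, maxshape, values = grow_rows(os.path.join(tmp, "foo.hdf5"))

print(shape)     # shape after the resize
print(maxshape)  # None marks the unlimited axis
print(values)
```

The original three rows are still there after the resize, and the new rows can be filled in with a normal slice assignment.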

3
votes

You need to change this line:

d = f.create_dataset('data', (3, 3), dtype='i8', chunks=True)

To

d = f.create_dataset('data', (3, 3), maxshape=(?, ?), dtype='i8', chunks=True) 

d.resize((?, ?))

Replace each ? with the size you want. In maxshape, a dimension can also be set to None, which leaves that dimension unlimited.

Read here: http://docs.h5py.org/en/latest/high/dataset.html#resizable-datasets
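
To illustrate what a fixed (non-None) maxshape means in practice, here is a rough sketch. The limit of 10 rows, the helper name try_resize, and the temporary file are assumptions made up for this example. It grows the dataset along one axis using the axis argument of resize, then shows that going past maxshape is rejected. I catch a broad Exception because I have not checked which exception type every h5py version raises here.

```python
import os
import tempfile

import h5py


def try_resize(path):
    """Grow a dataset up to its maxshape, then try to go past it.

    `try_resize` is a hypothetical helper written for this example only.
    """
    with h5py.File(path, "w") as f:
        # Axis 0 may grow to at most 10 rows; axis 1 stays at 3.
        d = f.create_dataset(
            "data", (3, 3), maxshape=(10, 3), dtype="i8", chunks=True
        )

        # With axis given, size is the new length of that single axis.
        d.resize(10, axis=0)
        grown = d.shape

        try:
            d.resize((11, 3))  # one row more than maxshape allows
            exceeded = True
        except Exception:      # exact exception type not assumed here
            exceeded = False

        return grown, exceeded


with tempfile.TemporaryDirectory() as tmp:
    grown, exceeded = try_resize(os.path.join(tmp, "foo.hdf5"))

print(grown)
print(exceeded)
```

If you don't know the final size in advance, None for that dimension avoids having to pick a limit up front.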