2
votes

I'm using h5py with LZF compression to store NumPy arrays in HDF5 files.

It works well, and my compressed files are much more portable than the uncompressed ones. However, if I try to view the compressed files using applications like vitables and HDFView, I get the following errors:

"Error: problems reading records. The dataset seems to be compressed with the None library. Check that it is installed in your system, please" in vitables and

"ncsa.hdf.hdf5lib.exceptions.HDF5Exception: ncsa.hdf.hdf5lib.exceptions.HDF5LibraryException: Can't open directory or file" in HDFView.

I can browse the file structures OK in both applications, but opening an array produces an error. If I turn off compression, the problem goes away. As an example, after running the code below, opening array_1 gives me the error, but array_2 doesn't.

import numpy as np, h5py

h5_path = r'D:\test.h5'

f = h5py.File(h5_path, 'w')

# Create fake data
data = (np.random.random(int(1E6))*100).astype(int)  # size must be an integer

# Save with compression
dset1 = f.create_dataset(r'/path/to/arrays/array_1', data=data, 
                         compression='lzf')

# Save without compression
dset2 = f.create_dataset(r'/path/to/arrays/array_2', data=data)

# Set some object properties
dset1.attrs['Description'] = 'Compressed array.'
dset2.attrs['Description'] = 'Uncompressed array.'

f.close()

Is this behaviour expected, or am I doing something wrong?

If vitables and HDFView can't open compressed arrays, is there an alternative viewer that can?

Thanks very much!


2 Answers

6
votes

I had the exact same problem for datasets stored with LZF compression, and I ended up finding this post. With HDFView I managed to view datasets that were compressed using GZIP with a compression level of 9, i.e.:

dset = f.create_dataset('someData', data=data, compression="gzip", compression_opts=9)

But I still wanted to view LZF-compressed datasets. There is a GUI viewer for HDF5 files named HDF Compass (GitHub repo), developed by, among others, Andrew Collette, who is well known in the HDF world. When this question was asked, development of HDF Compass was just starting. Today I tested version 0.6.0 and managed to view an LZF-compressed file correctly.

PS: Just a warning, HDF Compass is a read-only tool, unlike HDFView. But still, it is very user friendly and really snappy.

2
votes

While h5py ships with the LZF filter, HDF5 itself is not generally distributed or compiled with LZF, so viewers built on a stock HDF5 library cannot decode LZF-compressed data. Instead, you can use gzip, which is included with all HDF5 versions and so can be opened on any system:

dset1 = f.create_dataset(r'/path/to/arrays/array_1', data=data, 
                         compression='gzip')

HDFView can open arrays compressed with gzip.
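If you are not sure which filter a dataset was written with, h5py exposes it through the dataset's compression and compression_opts properties. The following is a minimal sketch, assuming an in-memory file and made-up dataset names (lzf_array, gzip_array, plain_array) used only for illustration:

```python
import h5py
import numpy as np

data = np.arange(1000)

# In-memory file: with backing_store=False nothing is written to disk.
# The file name and dataset names are placeholders for this example.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("lzf_array", data=data, compression="lzf")
    f.create_dataset("gzip_array", data=data, compression="gzip")
    f.create_dataset("plain_array", data=data)

    # Map each dataset name to (filter name, filter options).
    report = {name: (dset.compression, dset.compression_opts)
              for name, dset in f.items()}

for name, settings in sorted(report.items()):
    print(name, settings)
```

Datasets reported as 'lzf' are the ones that tools without the LZF filter will likely fail to open.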

Additionally, if you use gzip, you can use compression_opts to set the compression level (an integer between 0 and 9):

dset1 = f.create_dataset(r'/path/to/arrays/array_1', data=data, 
                         compression='gzip', compression_opts=9)
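If you already have files written with LZF, one option is to copy them into new files that use gzip. Below is a rough sketch rather than a definitive tool: recompress_to_gzip is a hypothetical helper name, and it assumes every dataset is a non-scalar array that fits in memory (scalar and empty datasets do not accept compression options). Attributes on the root group are not copied.

```python
import h5py
import numpy as np


def recompress_to_gzip(src_path, dst_path, level=4):
    """Copy all groups and datasets from src_path to dst_path using gzip.

    Hypothetical helper: assumes non-scalar datasets that fit in memory.
    """
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:

        def copy_item(name, obj):
            if isinstance(obj, h5py.Dataset):
                # Read the whole array, then rewrite it with gzip.
                new = dst.create_dataset(name, data=obj[...],
                                         compression="gzip",
                                         compression_opts=level)
            else:
                new = dst.require_group(name)
            # Carry over attributes such as 'Description'.
            for key, value in obj.attrs.items():
                new.attrs[key] = value

        # visititems calls copy_item for every group and dataset below the root.
        src.visititems(copy_item)
```

For example, recompress_to_gzip(r'D:\test.h5', r'D:\test_gzip.h5', level=9) would write a copy whose arrays HDFView should be able to open; the second path is made up for illustration.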