9
votes

I have some data that I share between Python and Matlab. I used to do it by saving NumPy arrays in MATLAB-style .mat files but would like to switch to HDF5 datasets. However, I've noticed a funny feature: when I save a NumPy array in an HDF5 file (using h5py) and then read it in Matlab (using h5read), it ends up being transposed. Is there something I'm missing?

Python code:

import numpy as np
import h5py

mystuff = np.random.rand(10,30)

f = h5py.File('/home/user/test.h5', 'w')
f['mydataset'] = mystuff
f.close()

Matlab code:

mystuff = h5read('/home/user/test.h5', '/mydataset');
size(mystuff) % 30 by 10

3 Answers

7
votes

This is a quirk in Matlab's HDF5 reader routines. (I think the reasoning behind this behavior is: the data is stored in C order in the binary file, and Matlab arrays are in Fortran order, so the reader reports the dimensions reversed rather than reordering the data in memory.)

If you inspect the file created by Python with HDF5 tools, the dimensions are what they should be:

$ h5ls test.h5 
mydataset                Dataset {10, 30}
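The same check can be done from Python. This is a minimal sketch, assuming h5py and NumPy are installed; the temporary file path is arbitrary and only stands in for the file from the question.

```python
import os
import tempfile

import h5py
import numpy as np

mystuff = np.random.rand(10, 30)

# Write to a temporary file (the path is arbitrary for this sketch).
path = os.path.join(tempfile.mkdtemp(), "test.h5")
with h5py.File(path, "w") as f:
    f["mydataset"] = mystuff

# Read the shape back; h5py reports the dimensions as stored in the file.
with h5py.File(path, "r") as f:
    stored_shape = f["mydataset"].shape

print(stored_shape)  # (10, 30)
```

So the file itself holds a 10 by 30 dataset; the reversal only appears on the Matlab side.
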
7
votes

See the Matlab HDF5 documentation, which includes the statement:

Because HDF5 stores data in row-major order and the MATLAB array is organized in column-major order, you should reverse the ordering of the dimension extents ...
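To make the row-major versus column-major point concrete, here is a small NumPy sketch (not from the MATLAB documentation; the array values are made up). It takes the flat element sequence of a row-major 2 by 3 array and reinterprets it in column-major order with the extents reversed, which yields the transpose without moving any data.

```python
import numpy as np

# A 2-by-3 array laid out in row-major (C) order.
a = np.arange(6).reshape(2, 3)

# The flat sequence of elements as they sit in memory (row-major).
flat = a.ravel(order="C")

# Reinterpret the same sequence as a column-major (Fortran-order) array
# with the dimension extents reversed: 3 by 2 instead of 2 by 3.
b = np.reshape(flat, (3, 2), order="F")

print(b.shape)                 # (3, 2)
print(np.array_equal(b, a.T))  # True
```
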

Even today, long after MathWorks translated their code to C, the product's Fortran origins poke above the surface now and then.

2
votes

When reading the data in Matlab, the dimensions need to be permuted to recover the original layout. The permute function does this. The code below handles the general case with any number of dimensions:

rawdata = h5read(h5Filename,h5Dataset);
ndim = numel(size(rawdata));
data = permute(rawdata,[ndim:-1:1]);

For 2D data, it is enough to transpose the result of h5read:

data = h5read(h5Filename,h5Dataset)';
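An alternative is to do the reversal on the Python side before writing. The sketch below is only an illustration: save_for_matlab is a hypothetical helper name, the temporary path is arbitrary, and it assumes h5py and NumPy are installed. It stores the array with its axes reversed, so a reader that reverses the dimensions again (as h5read does in the question) should report the original 10 by 30.

```python
import os
import tempfile

import h5py
import numpy as np


def save_for_matlab(path, name, array):
    """Hypothetical helper: store `array` with its axes reversed."""
    # np.transpose with no axes argument reverses the order of all axes.
    reversed_axes = np.ascontiguousarray(np.transpose(array))
    with h5py.File(path, "w") as f:
        f.create_dataset(name, data=reversed_axes)


mystuff = np.random.rand(10, 30)
path = os.path.join(tempfile.mkdtemp(), "test_reversed.h5")
save_for_matlab(path, "mydataset", mystuff)

# The file now holds a 30-by-10 dataset.
with h5py.File(path, "r") as f:
    stored = f["mydataset"][...]

print(stored.shape)  # (30, 10)
```

The trade-off is that h5py, h5ls and other row-major tools will then see the dataset as 30 by 10, so this only makes sense if Matlab is the main consumer of the file.
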