I'm working on an open source project dealing with adding metadata to folders. The provided (Python) API lets you browse and access metadata like it was just another folder. Because it is just another folder.
\folder\.meta\folder\somedata.json
Then I came across HDF5 and its derivation Alembic.
Reading up on HDF5 in the book Python and HDF5 I was looking for benefits to using it compared to using files in folders, but most of what I came across spoke about the benefits of a hierarchical file-format in terms of its simplicity in adding data via its API:
>>> import h5py
>>> f = h5py.File("weather.hdf5")
>>> f["/15/temperature"] = 21
Or its ability to read only certain parts of it upon request (e.g. random access), and parallel execution of a single HDF5 file (e.g. for multiprocessing)
You could mount HDF5 files, https://github.com/zjttoefs/hdfuse5
It even boasts a strong yet simple foundation concept of Groups and Datasets which from wiki reads:
- Datasets, which are multidimensional arrays of a homogeneous type
- Groups, which are container structures which can hold datasets and other groups
Replace Dataset with File and Group with Folder and the whole feature-set sounds to me like what files in folders are already fully capable of doing.
For every benefit I came across, not one stood out as being exclusive to HDF5.
So my question is, if I were to give you one HDF5 file and one folder with files, both with identical content, in which scenario would HDF5 be better suited?
Edit:
Having gotten some responses about the portability of HDF5.
It sounds lovely and all, but I still haven't been given an example, a scenario, in which an HDF5 would out-do a folder with files. Why would someone consider using HDF5 when a folder is readable on any computer, any file-system, over a network, supports "parallel I/O", is readable by humans without an HDF5 interpreter.
I would go as far as to say, a folder with files is far more portable than any HDF5.
Edit 2:
Thucydides411 just gave an example of a scenario where portability matters. https://stackoverflow.com/a/28512028/478949
I think what I'm taking away from the answers in this thread is that HDF5 is well suited for when you need the organisational structure of files and folders, like in the example scenario above, with lots (millions) small (~1 byte) data structures; like individual numbers or strings. That it makes up for what file-systems lack by providing a "sub file-system" favouring the small and many as opposed to few and large.
In computer graphics, we use it to store geometric models and arbitrary data about individual vertices which seems to align quite well with it's use in the scientific community.