15
votes

My Python code receives a byte array containing the bytes of an HDF5 file.

I'd like to read this byte array into an in-memory h5py file object without first writing it to disk. This page says that I can open a memory-mapped file, but it would be a new, empty file. I want to go from byte array to in-memory HDF5 file, use it, and discard it, without writing to disk at any point.

Is it possible to do this with h5py? (or with hdf5 using C if that is the only way)

3
I'm trying to do the same thing. Could you show some code with the solution that worked? Thanks! - konus
I found the solution and posted it here: stackoverflow.com/questions/11588630/… - SCGH
Is this still unresolved? This answer explains how to read an h5 file from a byte array in memory, but how can I get such a byte array from an h5 file on the file system? I want to load the h5 file on a machine other than the one that has it on its file system, so I was thinking of reading it as a byte stream, sending the stream to the target machine, and loading the h5 file from that byte array there. Is that possible? Just asked a question. - anir

3 Answers

6
votes

You could use binary I/O (io.BytesIO) to wrap the bytes in a file-like object and open it with h5py:

import io
import h5py
f = io.BytesIO(YOUR_H5PY_STREAM)
h = h5py.File(f, 'r')
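For a complete round trip, here is a minimal sketch, assuming a reasonably recent h5py (file-like objects are supported from 2.9 on) and NumPy. The helper name open_hdf5_from_bytes and the dataset name "values" are made up for illustration; the sample bytes are built in memory only so that the example runs on its own.

```python
import io

import h5py
import numpy as np


def open_hdf5_from_bytes(data):
    """Open HDF5 content held in a bytes-like object, read-only, without touching disk.

    Hypothetical helper name, not part of h5py.
    """
    return h5py.File(io.BytesIO(data), "r")


# Build some sample HDF5 bytes in memory so the example is self-contained.
buffer = io.BytesIO()
with h5py.File(buffer, "w") as f:
    f["values"] = np.arange(5)
raw = buffer.getvalue()  # plain Python bytes, as if received from elsewhere

# Go from bytes to an in-memory h5py file, use it, and discard it.
with open_hdf5_from_bytes(raw) as h:
    values = h["values"][:].tolist()

print(values)
```

Nothing is written to disk here; once the with block exits and the bytes go out of scope, the file is gone.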
2
votes

You can use io.BytesIO or tempfile to create h5 objects, as shown in the official docs: http://docs.h5py.org/en/stable/high/file.html#python-file-like-objects.

The first argument to File may be a Python file-like object, such as an io.BytesIO or tempfile.TemporaryFile instance. This is a convenient way to create temporary HDF5 files, e.g. for testing or to send over the network.

tempfile.TemporaryFile

>>> import tempfile
>>> import h5py
>>> tf = tempfile.TemporaryFile()
>>> f = h5py.File(tf, 'w')

or io.BytesIO

"""Create an HDF5 file in memory and retrieve the raw bytes

This could be used, for instance, in a server producing small HDF5
files on demand.
"""
import io
import h5py

bio = io.BytesIO()
with h5py.File(bio, 'w') as f:  # pass 'w' explicitly: h5py 3 opens files read-only by default
    f['dataset'] = range(10)

data = bio.getvalue() # data is a regular Python bytes object.
print("Total size:", len(data))
print("First bytes:", data[:10])
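Going the other way (an existing .h5 file on disk to bytes that can be shipped to another machine) could look roughly like this. It is only a sketch: the file name example.h5 is hypothetical, the file is created in a temporary directory just so the example runs, and the actual network transfer is left out.

```python
import io
import tempfile
from pathlib import Path

import h5py

# "Sending" side: read an existing HDF5 file as raw bytes.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.h5"  # hypothetical file name
    with h5py.File(str(path), "w") as f:
        f["dataset"] = list(range(10))
    payload = path.read_bytes()  # these bytes could be sent over the network

# "Receiving" side: the original file no longer exists; open the bytes directly.
with h5py.File(io.BytesIO(payload), "r") as f:
    total = int(f["dataset"][:].sum())

print("Sum of dataset:", total)
```

The receiving machine only needs the bytes, so the temporary directory can be deleted before the file is opened from memory.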
0
votes

The following example uses PyTables (the tables package) instead of h5py; it can also read and manipulate HDF5 files.

import urllib.request
import tables
url = 'https://s3.amazonaws.com/<your bucket>/data.hdf5'
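response = urllib.request.urlopen(url)
# The file name is only a label here: the CORE driver loads the image into memory,
# and driver_core_backing_store=0 means nothing is saved to disk.
h5file = tables.open_file("data-sample.h5", driver="H5FD_CORE",
                          driver_core_image=response.read(),
                          driver_core_backing_store=0)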