[I apologize for the inept title; I could not come up with anything better. Suggestions for a better title are welcome.]
I want to implement an interface to HDF5 files that supports multiprocess-level concurrency through file-locking. The intended environment for this module is a Linux cluster accessing a shared disk over NFS. The goal is to enable the concurrent access (over NFS) to the same file by multiple parallel processes running on several different hosts.
I would like to be able to implement the locking functionality through a wrapper class for the h5py.File
class. (h5py
already offers support for thread-level concurrency, but the underlying HDF5 library is not thread-safe.)
It would be great if I could do something in the spirit of this:
class LockedH5File(object):
def __init__(self, path, ...):
...
with h5py.File(path, 'r+') as h5handle:
fcntl.flock(fcntl.LOCK_EX)
yield h5handle
# (method returns)
I realize that the above code is wrong, but I hope it conveys the main idea: namely, to have the expression LockedH5File('/path/to/file')
deliver an open handle to the client code, which can then perform various arbitrary read/write operations on it. When this handle goes out of scope, its destructor closes the handle, thereby releasing the lock.
The goal that motivates this arrangement is two-fold:
decouple the creation of the handle (by the library code) from the operations that are subsequently requested on the handle (by the client code), and
ensure that the handle is closed and the lock released, no matter what happens during the execution of the intervening code (e.g. exceptions, unhandled signals, Python internal errors).
How can I achieve this effect in Python?
Thanks!