I've been able to create a compound dataset consisting of an unsigned int and a variable-length string in my HDF5 file using h5py, but I can't write to it.

dt = h5py.special_dtype(vlen=str)
dset = fout.create_dataset(ver, (1,), dtype=np.dtype([("time", np.uint64),("value", dt)]))

I've written to other compound datasets fairly easily, by setting the specific column(s) of the compound dataset as equal to an existing numpy array.

Where I run into trouble is writing to a compound dataset that has a variable-length string field. NumPy has no native variable-length string type, so I can't create the NumPy array beforehand that would contain the value.

My next thought was to write the individual value to the column in question, and this works for the unsigned int. When I try to write a string to the variable-length string field in the compound dataset, though, I get:

    dset["value"] = str("blah")
  File "D:\Anaconda3\lib\site-packages\h5py\_hl\dataset.py", line 508, in __setitem__
    val = val.astype(numpy.dtype([(names[0], dtype)]))
ValueError: Setting void-array with object members using buffer.

Any guidance would be much appreciated.

1 Answer

Following on from my earlier answer to "Inexplicable behavior when using vlen with h5py", I ran this test (h5py version 2.2.1):

In [3]: import numpy as np
In [4]: import h5py
In [5]: dt = h5py.special_dtype(vlen=str)
In [6]: f=h5py.File('foo.hdf5')
In [8]: ds1 = f.create_dataset('JustStrings',(10,), dtype=dt)
In [10]: ds1[0]='string'
In [11]: ds1[1]='a longer string'
In [13]: ds1[2:5]='one_string two_strings three'.split()

In [14]: ds1
Out[14]: <HDF5 dataset "JustStrings": shape (10,), type "|O4">

In [15]: ds1.value
Out[15]: 
array(['string', 'a longer string', 'one_string', 'two_strings', 'three',
       '', '', '', '', ''], dtype=object)

And for a mixed dtype like yours:

In [16]: ds2 = f.create_dataset('IntandStrings',(10,),
   dtype=np.dtype([("number",int),('astring',dt)]))
In [17]: ds2[0]=(1,'astring')
In [18]: ds2[1]=(10,'a longer string')
In [19]: ds2[2:4]=[(10,'a longer much string'),(0,'')]
In [20]: ds2.value
Out[20]: 
array([(1, 'astring'), (10, 'a longer string'),
       (10, 'a longer much string'), (0, ''), (0, ''), (0, ''), (0, ''),
       (0, ''), (0, ''), (0, '')], 
      dtype=[('number', '<i4'), ('astring', 'O')])

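Trying to set one element of a single field does not change the dataset. Indexing with ds2['astring'] reads that field into an in-memory NumPy array, so the assignment only modifies that copy: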

ds2['astring'][4]='one two three four'

Instead I have to set the whole record:

ds2[4]=(123,'one two three four')
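
As a self-contained version of the whole-record approach, here is a minimal sketch using the compound layout from the question. The file path, the dataset name "ver", the sample rows and the as_text helper are all made up for illustration. The helper is there because newer h5py releases may return variable-length strings as bytes rather than str.

```python
import os
import tempfile

import h5py
import numpy as np


def as_text(value):
    """Hypothetical helper: decode values that h5py returns as bytes."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


# Same compound layout as in the question: an unsigned int plus a
# variable-length string.
vlen_str = h5py.special_dtype(vlen=str)
record_dtype = np.dtype([("time", np.uint64), ("value", vlen_str)])

# Hypothetical sample records, invented for this sketch.
rows = [(1, "blah"), (2, "a longer string"), (3, "")]

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "compound_vlen.h5")

with h5py.File(path, "w") as f:
    dset = f.create_dataset("ver", (len(rows),), dtype=record_dtype)

    # Write every record at once from a structured array of the same dtype.
    dset[...] = np.array(rows, dtype=record_dtype)

    # Overwrite a single record by assigning a complete tuple.
    dset[0] = (10, "replaced")

    # Read everything back while the file is still open.
    stored = dset[...]

times = stored["time"].tolist()
values = [as_text(v) for v in stored["value"]]
print(times, values)
```

The structured array carries h5py's variable-length string dtype in its "value" field, so no fixed string width has to be chosen in advance.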

Trying to set the whole string field raises the same ValueError as in the question:

ds2['astring']='astring'

I initialized this dataset with shape (10,), while yours is (1,), but I think it's the same problem.

I can, though, set the whole numeric field:

In [48]: ds2['number']=np.arange(10)
In [50]: ds2['number']
Out[50]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [51]: ds2.value
Out[51]: 
array([(0, 'astring'), (1, 'a longer string'), 
       (2, 'a longer much string'),
       (3, ''), (4, 'one two three four'), (5, ''), 
       (6, ''), (7, ''),
       (8, ''), (9, '')], 
      dtype=[('number', '<i4'), ('astring', 'O')])
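
If only the string field needs to change, one possible workaround is to read the records into memory, edit the field there, and write the whole records back. This is a sketch under assumptions, not something tested on h5py 2.2.1. The file path, the sample values and the as_text helper are hypothetical, and the approach assumes the dataset is small enough to hold in memory.

```python
import os
import tempfile

import h5py
import numpy as np


def as_text(value):
    """Hypothetical helper: decode values that h5py returns as bytes."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


vlen_str = h5py.special_dtype(vlen=str)
record_dtype = np.dtype([("number", np.int64), ("astring", vlen_str)])

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "update_field.h5")

with h5py.File(path, "w") as f:
    ds = f.create_dataset("IntandStrings", (3,), dtype=record_dtype)
    ds[...] = np.array([(0, "a"), (1, "b"), (2, "c")], dtype=record_dtype)

    # Read all records into memory; this is a copy, not a view of the file.
    records = ds[...]

    # Assigning into a field of the in-memory array is ordinary NumPy.
    records["astring"][1] = "one two three four"

    # Write the complete records back, which avoids the field-only write.
    ds[...] = records

    after = ds[...]

numbers = after["number"].tolist()
strings = [as_text(v) for v in after["astring"]]
print(numbers, strings)
```

The cost is that the numeric field is rewritten as well, but every write then goes through the whole-record path that works above.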