I wish to create an h5py "string" dataset (for example "A") using the data type "array of 8-bit integers (80)", as shown in HDFView. Each integer in this array of length 80 is in fact ord(x) of the corresponding character of the string, padded with zeros. So, for instance, Top is stored as 84 111 112 0 0 0 ..., for a total of 80 int8 values.
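To make the encoding concrete, here is a minimal pure-Python sketch of it. The helper names to_codes and from_codes and the constant WIDTH are made up for this illustration and are not part of h5py or numpy; the sketch assumes ASCII-only strings so that every code fits in an int8.

```python
WIDTH = 80  # fixed array length used in the question


def to_codes(text, width=WIDTH):
    """Return ord() of each character, padded with zeros up to `width`."""
    if len(text) > width:
        raise ValueError(f"{text!r} is longer than {width} characters")
    return [ord(ch) for ch in text] + [0] * (width - len(text))


def from_codes(codes):
    """Inverse of to_codes: rebuild the string and drop the zero padding."""
    return "".join(chr(c) for c in codes).rstrip("\x00")


codes = to_codes("Top")
print(codes[:6])          # [84, 111, 112, 0, 0, 0]
print(len(codes))         # 80
print(from_codes(codes))  # Top
```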
The desired dataset should look like this:
DATASET "NOM" {
DATATYPE H5T_ARRAY { [80] H5T_STD_I8LE }
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): [ 84, 111, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
}
}
However, I'm unable to create this dataset using h5py. Using a standard numpy array gives this:
DATASET "NOM" {
DATATYPE H5T_STD_I8LE
DATASPACE SIMPLE { ( 1, 80 ) / ( 1, 80 ) }
DATA {
(0,0): 84, 111, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
(0,15): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
(0,31): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
(0,47): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
(0,63): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
(0,79): 0
}
}
So what data and dtype are needed if my string is, say, "Top"?
.create_dataset("NOM", data=data, dtype=dtype)
According to https://github.com/h5py/h5py/issues/955, maybe I need to use a lower level interface...?
Thanks!
Solution
The problem is that if I build the numpy array data first and then write it with .create_dataset("NOM", data=data), numpy never keeps 80int8 as the element type. It folds the subarray into the array's shape, so the result is always a plain array of int8:
import numpy as np
dtype = np.dtype("80int8")
x = np.array(2, dtype=dtype)
# x.dtype = dtype('int8')
The solution is thus to declare the dataset with the desired dtype first, then fill in the data:
# gro is assumed to be an open h5py group and nom a list of strings (each at most 80 characters)
dataset = gro.create_dataset("NOM", (len(nom),), dtype="80int8")
for i in range(len(nom)):
    nom_80 = nom[i] + "\x00" * (80 - len(nom[i]))  # pad nom[i] to 80 characters
    dataset[i] = [ord(x) for x in nom_80]
# dataset.dtype = dtype(('i1', (80,)))
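For completeness, here is a self-contained round trip built on the same idea. It is only a sketch under a few assumptions: the file name nom_demo.h5 and the sample names are invented, the file is kept in memory through the core driver so nothing is written to disk, the strings are ASCII, and the dtype is spelled as the tuple (np.int8, (80,)), which should be equivalent to "80int8".

```python
import h5py
import numpy as np

WIDTH = 80
names = ["Top", "Bottom"]  # sample strings, ASCII only

# Array dtype: each dataset element is itself an array of 80 int8 values.
array_dtype = np.dtype((np.int8, (WIDTH,)))

# driver="core" with backing_store=False keeps the whole file in memory.
with h5py.File("nom_demo.h5", "w", driver="core", backing_store=False) as f:
    # Declare the dataset with the array dtype first, then fill it row by row.
    dset = f.create_dataset("NOM", (len(names),), dtype=array_dtype)
    for i, name in enumerate(names):
        padded = name.encode("ascii").ljust(WIDTH, b"\x00")
        dset[i] = list(padded)  # list of 80 integers in the range 0..127

    stored_dtype = dset.dtype
    stored_shape = dset.shape

    # Reading back yields an int8 array; decode each row into a string.
    raw = dset[...]
    restored = [row.tobytes().rstrip(b"\x00").decode("ascii") for row in raw]

print(stored_shape)
print(restored)
```

The dataset keeps one element per string, while the 80 integers live inside the element type rather than in a second dataspace dimension.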