1
votes

How to parse HDF files (.h5) using Apache Tika?

Apache Tika provides a parser for .h5 files, but I'm not able to parse the data with it.

Parser parser = new HDFParser();
Metadata metadata = new Metadata();
ContentHandler handler = new BodyContentHandler();
FileInputStream fileInputStream = new FileInputStream(path + h5File);

parser.parse(fileInputStream, handler, metadata, new ParseContext());

I can see the file's metadata, but I can't get its content through the handler.

If anyone has done this, please help me with it.

1
I have a feeling that the HDF parser is metadata-only, but that it should be pulling out most of the file as metadata. What are you expecting to see but aren't finding in metadata? - Gagravarr
I want to parse the content of that file. - ketankk
But what content do you want that isn't in the metadata? - Gagravarr
I have converted a video to .h5. I need the 3-D matrix data (the RGB matrix), which I can see in HDFView. - ketankk

1 Answer

2
votes

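Simply put, you can't, because of the nature of the HDF file format: as far as I can tell, Tika's HDF parser exposes what it reads as metadata rather than as text content, so the handler stays empty.

To retrieve the information you want, use metadata.get(fieldName), where fieldName is the name of the metadata field as a string.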
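If you don't know the field names in advance, you can list them first. Here is a minimal sketch of that idea. The class name HdfMetadataDump and the helper toMap are made up for illustration, and I'm assuming HDFParser lives in org.apache.tika.parser.hdf, as it does in the Tika versions I have used. Metadata.names() and Metadata.get(String) are the Tika calls that do the work.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.hdf.HDFParser;
import org.apache.tika.sax.BodyContentHandler;

public class HdfMetadataDump {

    // Illustrative helper: copy every metadata field into a sorted map.
    // Metadata.get(name) returns only the first value of a multi-valued field.
    static Map<String, String> toMap(Metadata metadata) {
        Map<String, String> fields = new TreeMap<>();
        for (String name : metadata.names()) {
            fields.put(name, metadata.get(name));
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        Metadata metadata = new Metadata();
        // args[0] is the path to the .h5 file (an assumption of this sketch).
        try (InputStream in = new FileInputStream(args[0])) {
            new HDFParser().parse(in, new BodyContentHandler(), metadata, new ParseContext());
        }
        toMap(metadata).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Whatever shows up in that listing is all Tika will give you for the file. If the matrix data is not there, the parser is not going to produce it.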

Alternatively, you can try using this Java library directly: NetCDF (which Tika uses under the hood).
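A rough sketch of what that could look like with NetCDF-Java, assuming a 4.x/5.x release on the classpath and that the library can read your particular file (I have not tried it on an .h5 converted from video). The class name H5VariableDump and the helper describeShape are made up for illustration, and the variable name passed as the second argument is whatever HDFView shows for your dataset.

```java
import java.io.IOException;

import ucar.ma2.Array;
import ucar.nc2.NetcdfFile;
import ucar.nc2.Variable;

public class H5VariableDump {

    // Illustrative helper: render a shape such as {10, 480, 640} as "10 x 480 x 640".
    static String describeShape(int[] shape) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < shape.length; i++) {
            if (i > 0) {
                sb.append(" x ");
            }
            sb.append(shape[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the .h5 file; args[1] (optional): name of the variable to read.
        NetcdfFile ncFile = NetcdfFile.open(args[0]);
        try {
            // List every variable with its data type and dimensions.
            for (Variable variable : ncFile.getVariables()) {
                System.out.println(variable.getFullName() + " [" + variable.getDataType() + "] "
                        + describeShape(variable.getShape()));
            }
            if (args.length > 1) {
                Variable variable = ncFile.findVariable(args[1]);
                if (variable != null) {
                    // Reads the whole variable into memory; large videos may need section reads instead.
                    Array data = variable.read();
                    System.out.println("read " + data.getSize() + " values");
                }
            }
        } finally {
            ncFile.close();
        }
    }
}
```

Run it once with only the file path to see the variable names and shapes, then pass the name of the RGB dataset as the second argument to load it.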