I have previously parsed all kinds of files with Tika by calling tika.parseToString() without setting any custom configuration or metadata. Now I need to filter which files get parsed based on their MIME type.
I can find the MIME type with tika.detect(new BufferedInputStream(inputStream), new Metadata());, but when I call tika.parseToString() afterwards, Tika uses EmptyParser and the detected content type is "application/octet-stream". This is the default, meaning that Tika was unable to determine what type of file it is. I have tried setting the content type in Metadata before parsing the file, but this leads to org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException. From what I've read, this means that the file is malformed, but the same files get parsed successfully when I skip the MIME type check beforehand.
Does detect() do something to the InputStream that makes the parser unable to parse the files?
I'm using the same Tika instance for both checking the MIME type and parsing. The Tika version is 1.13.
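To make the stream question concrete, here is a minimal sketch that uses only java.io and does not call Tika at all. It rests on an assumption not stated above: that the raw inputStream is reused for parsing after a separate BufferedInputStream wrapper around it was handed to detect(). The class name StreamPositionDemo and the method probe are hypothetical, chosen only for illustration. The sketch peeks at a wrapped stream with mark/read/reset, roughly the way a detector might, and then compares what the wrapper and the raw stream return next.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; it uses only java.io and does not call Tika.
final class StreamPositionDemo {

    // Returns { next byte from the wrapper after reset(), next byte from the raw stream }.
    static int[] probe(byte[] data) {
        try {
            InputStream raw = new ByteArrayInputStream(data);
            BufferedInputStream wrapper = new BufferedInputStream(raw);

            // Peek at the start of the stream: mark, read one byte, reset.
            wrapper.mark(16);
            wrapper.read();
            wrapper.reset();

            // The wrapper is rewound to its mark, so it returns the first byte again.
            int fromWrapper = wrapper.read();
            // The raw stream is not rewound; the wrapper's buffering already consumed from it.
            int fromRaw = raw.read();
            return new int[] { fromWrapper, fromRaw };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = probe(new byte[] { 10, 20, 30, 40 });
        System.out.println("wrapper sees: " + result[0]);
        System.out.println("raw stream sees: " + result[1]);
    }
}
```

With this four-byte sample the wrapper returns 10 again after reset(), while the raw stream returns -1 (end of stream), because the wrapper pulled all the data into its own buffer. Whether this matches what happens in my Tika code depends on the assumption above about which stream object is passed to parseToString().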