I want to detect the type of a given file using Apache Tika, based not only on the file name but also on the file's content. I am using "tika.detect(stream)", which works fine for text, image, and XML files. Now I want to detect certificate files with Tika as well, but content-based detection is not working for them (X.509, .pem, .der, etc.).

Thanks in advance

1 Answer

Support for many of these has only just been added to Apache Tika! Details are in TIKA-3205.

Once released, if you upgrade to Apache Tika 1.25 or 2.0, you'll get detection for DER and PEM encoded Certificates, Private Keys and Public Keys. In earlier versions of Tika, only the PKCS#7 family of formats was detected.

Until 1.25 / 2.0 are released, a nightly build or a manual build from git after 2020-09-30 will include the extra detection.
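If neither a nightly build nor a manual build is an option, one possible stopgap is to sniff the content yourself before falling back to Tika. The sketch below is not how Tika does it and uses only the JDK: it treats anything starting with the "-----BEGIN " armor line as PEM, and anything that starts with an ASN.1 SEQUENCE tag and parses via the JDK's CertificateFactory as a DER-encoded X.509 certificate. The class name CertSniffer and the returned labels are made up for illustration, and DER-encoded keys are not covered.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;

/** Hypothetical helper (not part of Tika): sniffs certificate-like content from raw bytes. */
final class CertSniffer {

    // PEM-style files start with a textual armor line such as "-----BEGIN CERTIFICATE-----".
    private static final String PEM_PREFIX = "-----BEGIN ";

    /** Returns "pem", "x509-der", or "unknown" (labels invented for this sketch). */
    static String sniff(byte[] data) {
        // Only the first few bytes are needed for the PEM check; leading whitespace is tolerated.
        String head = new String(data, 0, Math.min(data.length, 64),
                StandardCharsets.US_ASCII).trim();
        if (head.startsWith(PEM_PREFIX)) {
            return "pem";
        }
        // DER is binary ASN.1; an X.509 certificate begins with a SEQUENCE tag (0x30).
        // The tag alone is a weak signal, so confirm by actually parsing the bytes.
        if (data.length > 0 && data[0] == 0x30 && parsesAsX509(data)) {
            return "x509-der";
        }
        return "unknown";
    }

    private static boolean parsesAsX509(byte[] data) {
        try {
            CertificateFactory.getInstance("X.509")
                    .generateCertificate(new ByteArrayInputStream(data));
            return true;
        } catch (CertificateException e) {
            return false;
        }
    }
}
```

You would call CertSniffer.sniff(bytes) first and only hand the stream to "tika.detect(stream)" when it returns "unknown". Note that the "pem" result covers any PEM block (certificates, keys, CSRs), so inspect the label after "-----BEGIN " if you need to tell them apart.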

If there are any more certificate-related formats that Apache Tika lacks, the best option is to raise an enhancement request in the Tika JIRA against the mime component and attach a few sample files.