How do I extract text from a .tex file using Apache Tika? An example file is at http://www.tug.org/texshowcase/EulerGibbsDuhem.tex
Tika correctly detects the content type as application/x-tex, but it does not extract any text from the file.
I tried the command
java -jar tika-app-0.9.jar -t EulerGibbsDuhem.tex
and also the following code snippet:
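import java.io.File;
import org.apache.tika.Tika;

// fileName is the path to the .tex file, e.g. "EulerGibbsDuhem.tex"
File file = new File(fileName);
Tika tika = new Tika();
String mimeType = tika.detect(file);            // detected as application/x-tex
String pageContent = tika.parseToString(file);  // no text is extracted here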