0
votes

Recently I have set up an image with SOLR. My goal is to index and extract files on a Windows and Linux server. It is possible for me to index and extract data from multiple file types. This is done by the SOLR CELL request handler. See the post.jar cmd below.

j ava -Dauto -Drecursive -jar post.jar Y:\ SimplePostTool version 1.5 Posting files to base url localhost:8983/solr/update.. Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pp tx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log Entering recursive mode, max depth=999, delay=0s 0 files indexed.

Is it possible to index and extract metadata/content from file types like .sh and .sql? If it is possible I would like to know how of course :)

2

2 Answers

0
votes

WHAT specifically do you want to extract from .sh files and .sql files that's different to any other generic file (name, location, date, etc).

Do you want to extract command names used in .sh? Do you want to extract table/field names from .sql? I don't think it is possible now, but if there is a parser for the file format, it can be connected to Tika as a module. And Tika is what Solr uses under the covers.

0
votes

I solved it today. I only needed to add de sh and sql to the mime-map of the SimplePostTool.Java.

mimeMap = new HashMap<>();
mimeMap.put("xml", "text/xml");
mimeMap.put("csv", "text/csv");
mimeMap.put("json", "application/json");
mimeMap.put("pdf", "application/pdf");
mimeMap.put("rtf", "text/rtf");
mimeMap.put("html", "text/html");
mimeMap.put("htm", "text/html");
mimeMap.put("doc", "application/msword");
mimeMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
mimeMap.put("ppt", "application/vnd.ms-powerpoint");
mimeMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
mimeMap.put("xls", "application/vnd.ms-excel");
mimeMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
mimeMap.put("txt", "text/plain");
mimeMap.put("log", "text/plain");
mimeMap.put("sh", "text/plain");
mimeMap.put("sql", "text/plain");

I also added the sh and sql to the following code:

private static final String DEFAULT_FILE_TYPES = "xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log";