I have a lot of PDF files stored in a database (MSSQL) that I need to search. They are stored as BLOBs. I need a walkthrough on how to search them using Solr. I have a DB, let's call it "fred". Inside fred is a table, we'll call it pdffiles. pdffiles has a column named pdfdata, of type BLOB. The PDFs are stored in this table, with the binary data stored in that column. What steps do I take to get Solr to extract this data and index it? I'm guessing it involves the TikaEntityProcessor, but having the PDFs stored in the database rather than as regular files adds a level of complexity. I have previously worked with Solr and have it running in production. Sample data-config and schema files would be very useful.

1 Answer

What steps do I take to get Solr to extract this data and index it?

  1. Create a new file called tika-data-config.xml that holds the database connection settings and the query that fetches the data.

  2. Open solrconfig.xml in a text editor and add the data-import handler configuration within the config tags.

[Image not preserved: solrconfig.xml snippet]
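
Since the original snippet is not available, here is a minimal sketch of what a data-import handler registration in solrconfig.xml typically looks like. It assumes a Solr version that still ships the Data Import Handler (it was deprecated in 8.6 and removed in 9.0) and the stock directory layout; the `dir` paths are assumptions to adjust for your install, and the `/dataimport` name matches the URL used in the full-import request.

```xml
<!-- Paths assume the stock Solr layout; adjust dir to match your install -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<!-- Tika and its dependencies, needed by TikaEntityProcessor -->
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />

<!-- Registers the handler at /dataimport and points it at the DIH config file -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">tika-data-config.xml</str>
  </lst>
</requestHandler>
```

The first regex is meant to pick up both the core data-import jar and the extras jar that contains TikaEntityProcessor. The JDBC driver jar for SQL Server also has to be on Solr's classpath, for example through another `lib` directive.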

  3. Reference the data-import handler libraries in solrconfig.xml so that Solr can load them.
  4. Make the JDBC driver jar for your database available to Solr.
  5. Declare your fields in schema.xml, choosing a fieldType for each one that suits your search requirements.
  6. Once the setup is ready, trigger indexing by requesting http://localhost:8983/solr/collection1/dataimport?command=full-import
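
For the BLOB case in the question, a tika-data-config.xml could look roughly like the sketch below. It has not been tested against a live database. It pairs a JDBC data source with a FieldStreamDataSource so that TikaEntityProcessor reads the PDF bytes from the pdfdata column instead of from a file. The host, port, user, password, the `id` key column, and the Solr field names `id` and `content` are assumptions; only fred, pdffiles, and pdfdata come from the question.

```xml
<dataConfig>
  <!-- JDBC source for the "fred" database; host, port, user, password are placeholders -->
  <dataSource name="db"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=fred"
              user="solr_reader"
              password="changeme" />
  <!-- Streams the bytes of a column from the parent entity to Tika -->
  <dataSource name="blob" type="FieldStreamDataSource" />
  <document>
    <!-- Outer entity: one row per PDF; the "id" column is assumed to exist -->
    <entity name="pdf" dataSource="db" query="SELECT id, pdfdata FROM pdffiles">
      <field column="id" name="id" />
      <!-- Inner entity: Tika parses the BLOB referenced by dataField -->
      <entity name="pdftext" processor="TikaEntityProcessor" dataSource="blob"
              dataField="pdf.pdfdata" format="text">
        <field column="text" name="content" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

The matching schema.xml entries, again with assumed field names and the `text_general` type from the default Solr schema, might be:

```xml
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="content" type="text_general" indexed="true" stored="true" />
```

Some published examples also set a placeholder `url` attribute on the Tika entity. If the import complains about a missing URL, that is the first thing to try.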

For more details, please refer to the Solr documentation: Configure DIH