I have a lot of PDF files stored in a database (MSSQL) that I need to search. They are stored as BLOBs. I need a walkthrough on how to search them using SOLR. I have a DB, let's call it "fred". Inside fred is a table, we'll call it pdffiles. pdffiles has a column named pdfdata, of type BLOB. The PDFs are stored in this table, with the binary data stored in that column. What steps do I take to get SOLR to extract this data and index it? I'm guessing it involves the TikaEntityProcessor, but having the PDFs stored in the database rather than as regular files adds a level of complexity. I have previously worked with SOLR and have it running in production. Sample dataconfig and schema files would be very useful.
1 Answer
What steps do I take to get SOLR to extract this data and index it?
Create a new file called tika-data-config.xml, which holds the database connection settings and the query that fetches the data. Then:
- In solrconfig.xml, inside the <config> tags, reference the libraries required by the Data Import Handler.
- Provide the JDBC driver JAR for your database.
- In schema.xml, declare the fields you want to index, choosing a fieldType for each that suits your search requirements.
- Once the setup is ready, ask Solr to start indexing by requesting
http://localhost:8983/solr/collection1/dataimport?command=full-import
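Since the PDFs live in a BLOB column rather than on disk, tika-data-config.xml could look roughly like the sketch below. It pairs a JDBC data source with a FieldStreamDataSource, so that TikaEntityProcessor reads the binary column of each row instead of a file. Several things here are assumptions rather than anything taken from the question: the id column, the host, port and credentials, and the Solr field names id and content.

```xml
<dataConfig>
  <!-- JDBC connection to the "fred" database; host, port, user and password are placeholders -->
  <dataSource name="db"
              type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=fred"
              user="solr_user"
              password="change_me" />

  <!-- Exposes a binary column of the parent row as a stream that Tika can read -->
  <dataSource name="blob" type="FieldStreamDataSource" />

  <document>
    <!-- "id" is an assumed key column; the question only names pdfdata -->
    <entity name="pdf" dataSource="db" query="SELECT id, pdfdata FROM pdffiles">
      <field column="id" name="id" />

      <!-- dataField is written as parentEntity.column; format="text" asks Tika for plain text -->
      <entity name="tika"
              processor="TikaEntityProcessor"
              dataSource="blob"
              dataField="pdf.pdfdata"
              format="text">
        <!-- "content" is an assumed Solr field name and must exist in schema.xml -->
        <field column="text" name="content" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

With this layout, schema.xml would need matching fields, for example an id field of type string and a content field of a tokenized text type such as text_general.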
For more details, please refer to the Solr documentation: Configure DIH
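As a rough illustration of the solrconfig.xml step above, the additions might look like the fragment below. It assumes a Solr release that still bundles the Data Import Handler. The dir paths are guesses that depend on your install layout, and the folder holding the Microsoft JDBC driver JAR is hypothetical.

```xml
<!-- Goes inside <config> in solrconfig.xml; adjust the dir values to your installation -->

<!-- Data Import Handler JARs (the regex also matches the extras JAR that contains TikaEntityProcessor) -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />

<!-- Tika and its dependencies, used for PDF text extraction -->
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />

<!-- Hypothetical folder where the SQL Server JDBC driver JAR has been copied -->
<lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib" regex=".*\.jar" />

<!-- Registers the /dataimport endpoint and points it at the data config file -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">tika-data-config.xml</str>
  </lst>
</requestHandler>
```

After reloading the core, the full-import URL should reach this handler. The same endpoint accepts command=status if you want to check progress.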