0
votes

I recently overheard a few coworkers talking about an article one of them had read involving the use of SOLR in conjunction with a database and an app to provide a "super-charged" text search engine for the app itself. From what I could make out, SOLR is a web service that exposes Lucene's text searching capabilities to a web-enabled app.

I wasn't able to find the article they were talking about, and a few relevant Google searches turn up only very abstract articles on text search engines built with SOLR.

What I'm wondering is: what's the relationship between all 3 components here?

Who calls who? Does Lucene somehow regularly extract and cache text data from the DB, and then the app queries SOLR for Lucene's text content? What's a typical software stack/setup for a Java-based, SOLR-powered text search engine? Thanks in advance!

2 Answers

1
votes

You're right in your basic outline here: SOLR is a webservice and syntax helper that sits on top of Lucene.

Essentially, SOLR is configured to index specific data based on a number of configuration options (including weighting, string manipulation, etc.). SOLR can either be pointed at a DB as its source of data to index, or individual documents (e.g., XML files) can be submitted via the web API for indexing.
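As a rough sketch of the second option (submitting a document through the web API), the snippet below builds the `<add><doc>` XML message that SOLR's update handler accepts. The class name, the field names `id` and `title`, and their values are made up for illustration; a real schema defines its own fields, and the resulting string would be POSTed to the core's `/update` endpoint.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SolrAddXml {
    // Escape the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build an <add> message containing one document with the given fields.
    static String addDocument(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(field.getKey())).append("\">")
               .append(escape(field.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        // Hypothetical document; field names depend on your own schema.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "42");
        doc.put("title", "Lucene & Solr");
        System.out.println(addDocument(doc));
    }
}
```

A commit is still needed after the POST before the document becomes searchable.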

A web application would typically make an HTTP(s) request to the SOLR API, and SOLR would return indexed data that matches the query. For all intents and purposes, the web app sees SOLR as an HTTP API; it doesn't need to be aware of Lucene in any way. So essentially, the data flow looks like:

Website --> SOLR API --> Lucene index (built from the DB or document collection)
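To make the first hop concrete, here is a minimal sketch of a Java app querying SOLR over HTTP with the JDK's built-in `HttpClient` (Java 11+). The host, the core name `articles`, and the query string are assumptions for illustration; `/select`, `q`, `rows`, and `wt` are the standard search handler and parameters.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class SolrSearchClient {
    // Build the URI for a search against the core's /select handler.
    static URI selectUri(String baseUrl, String core, String query, int rows) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/" + core + "/select?q=" + q
                + "&rows=" + rows + "&wt=json");
    }

    // Send the GET request and return the raw JSON response body.
    static String search(HttpClient client, URI uri)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Hypothetical local instance and core; only prints the URL, no network call.
        URI uri = selectUri("http://localhost:8983/solr", "articles",
                "title:lucene search", 10);
        System.out.println(uri);
    }
}
```

The app never touches Lucene or the DB on this path; it only parses whatever JSON comes back. In practice many Java projects use the SolrJ client library instead of raw HTTP.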

In terms of "when" SOLR looks at the DB to index new or updated data, this can be configured in a number of ways, but is most typically triggered by calling a specific function of the SOLR API that causes a reindex. This could occur manually, via a scheduled job, programmatically from the web app, etc.
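For the "scheduled job" case, one possible shape on the app side is a small scheduler that fires a reindex trigger at a fixed interval. This is only a sketch: `ReindexScheduler` and the `trigger` callback are hypothetical names, and the callback is where the app would make the actual HTTP call to whichever SOLR endpoint starts the reindex in your setup.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class ReindexScheduler {
    // Run the trigger once immediately, then repeatedly at the given period.
    static ScheduledExecutorService start(Runnable trigger, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                trigger.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and continue.
                System.err.println("Reindex trigger failed: " + e.getMessage());
            }
        }, 0, period, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder trigger; a real one would call the SOLR reindex endpoint.
        ScheduledExecutorService scheduler = start(
                () -> System.out.println("would ask SOLR to reindex here"),
                1, TimeUnit.SECONDS);
        Thread.sleep(2500);
        scheduler.shutdownNow();
    }
}
```

An external cron job hitting the same endpoint achieves the same thing without keeping a thread alive inside the app.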

0
votes

This is what I understood when I started implementing it for my project -

  • SOLR can be thought of as a middleman between your application server and the DB. SOLR ships with its own server (Jetty), which stays up and listens for requests coming from your app server.

  • Your application server calls SOLR, giving it the module name and the search pattern

  • SOLR is given some XML config files that tell it which tables of your schema have to be cached (or indexed) for the given module name

  • SOLR uses Lucene's text search capabilities to interpret the "search pattern" and fetch the desired results from the already cached/indexed data

  • SOLR indexing (full or partial) can be done manually (by executing commands through GET URLs) or at regular intervals using the SOLR config files
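The "commands through GET URLs" in the last point might look like the following, assuming the DB import is done with SOLR's DataImportHandler registered at `/dataimport` (it was bundled with older SOLR releases and removed from the core distribution in 9.0). The host and the core name `articles` are placeholders; `full-import` and `delta-import` are the handler's own command names.

```java
final class DataImportUrl {
    // The two DataImportHandler commands for full and partial indexing.
    enum Mode {
        FULL("full-import"),
        DELTA("delta-import");

        final String command;

        Mode(String command) {
            this.command = command;
        }
    }

    // Build the GET URL that asks the handler to start an import.
    static String build(String baseUrl, String core, Mode mode) {
        return baseUrl + "/" + core + "/dataimport?command=" + mode.command;
    }

    public static void main(String[] args) {
        // Hypothetical local instance and core name.
        System.out.println(build("http://localhost:8983/solr", "articles", Mode.FULL));
        System.out.println(build("http://localhost:8983/solr", "articles", Mode.DELTA));
    }
}
```

Opening either URL in a browser or with `curl` starts the import; a delta import only picks up rows the config identifies as changed since the last run.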

You can refer to the Apache SOLR site for more information.