
I have an application that holds a list of documents. These documents are indexed with Lucene, and I can search them by keyword. I loop over the TopDocs and read the ID field of each Lucene document, which corresponds to the ID column in my relational database. From these IDs I build a list, and then I run a database query that executes the following SELECT statement (JPA):

SELECT d FROM Document d WHERE d.id IN (##list of IDs retrieved from Lucene##)

This list of documents is then sent to the view (GUI).

However, some documents are private and must not appear in the list. Therefore, the SELECT query contains some extra conditions that perform security checks:

SELECT d FROM Document d WHERE d.id IN (##list of IDs retrieved from Lucene##)
AND d.rule1 = foo
AND d.rule2 = bar
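The round trip described above can be sketched without Lucene or a JPA provider on the classpath. In a real application the ID would be read from each hit (for example `searcher.doc(scoreDoc.doc).get("id")` in older Lucene versions) and the query would run through `em.createQuery(..., Document.class).setParameter(...)`; here both are replaced by in-memory stand-ins, and `DocumentRow`, `fetchAllowed` and the rule values are hypothetical. One detail worth handling: an `IN` clause does not return rows in the order of the ID list, so the Lucene relevance order has to be restored before the list goes to the view.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SecureSearchSketch {

    // Hypothetical stand-in for the JPA Document entity.
    record DocumentRow(long id, String title, String rule1, String rule2) {}

    // Stand-in for the JPQL query
    //   SELECT d FROM Document d WHERE d.id IN :ids AND d.rule1 = :foo AND d.rule2 = :bar
    // Like a real IN query, it returns matching rows in table order, not in the order of ids.
    static List<DocumentRow> fetchAllowed(List<DocumentRow> table, List<Long> ids,
                                          String foo, String bar) {
        return table.stream()
                .filter(d -> ids.contains(d.id()))
                .filter(d -> d.rule1().equals(foo) && d.rule2().equals(bar))
                .collect(Collectors.toList());
    }

    // Restores Lucene's relevance order; IDs whose rows were filtered out are skipped.
    static List<DocumentRow> inRankOrder(List<Long> rankedIds, List<DocumentRow> rows) {
        Map<Long, DocumentRow> byId = rows.stream()
                .collect(Collectors.toMap(DocumentRow::id, Function.identity()));
        List<DocumentRow> result = new ArrayList<>();
        for (Long id : rankedIds) {
            DocumentRow row = byId.get(id);
            if (row != null) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // IDs as read from the Lucene hits, best match first.
        List<Long> rankedIds = List.of(3L, 1L, 2L);
        List<DocumentRow> table = List.of(
                new DocumentRow(1, "Public report", "foo", "bar"),
                new DocumentRow(2, "Private memo", "foo", "private"),
                new DocumentRow(3, "Public manual", "foo", "bar"));
        List<DocumentRow> rows = fetchAllowed(table, rankedIds, "foo", "bar");
        List<DocumentRow> view = inRankOrder(rankedIds, rows);
        view.forEach(d -> System.out.println(d.id() + " " + d.title()));
    }
}
```

The private document (ID 2) is dropped by the security conditions, and the remaining rows come back in Lucene's order (3, then 1) rather than table order.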

But now I'm wondering: I use Lucene for its speed in searching documents, yet I still have to run the SELECT query, so I'm losing performance here :-( Does Lucene have a component that does this mapping for you? Or are there best practices for this issue? How do large projects map Lucene results to the relational database, given that the view needs the full documents to render the results?

Many thanks!

Jochen

What kind of performance hit are you worried about? Lucene is for indexing; ideally you'll have a database or file system of some kind underneath it. If the relational database underneath Lucene is the appropriate choice for the rest of your system, what you describe is the correct way to do things. – dfb
Well, I thought I could use Lucene so that I wouldn't need any MySQL query at all, and just fetch all of the Document's attributes/details from the Lucene index. But because of the extra checks, we need to perform an extra MySQL query. – Jochen Hebbrecht

3 Answers


Why not use Lucene to index the database table itself, including the columns the security rules check? That way you can do everything in a single Lucene query.
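A minimal sketch of that idea, assuming the security columns (`rule1`, `rule2`) and the data the view needs (here a hypothetical `title`) are copied into each index entry. The in-memory `IndexEntry` list stands in for the Lucene index; in real Lucene the search would typically be a `BooleanQuery` combining the keyword query with `TermQuery` clauses on the rule fields, and `title` would be a stored field. None of these names come from the question, and the sketch ignores relevance scoring.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DenormalizedIndexSketch {

    // Hypothetical index entry: searchable keywords plus the copied security columns
    // and the stored data the view needs, so no database query is required.
    record IndexEntry(long id, String title, Set<String> keywords, String rule1, String rule2) {}

    // One "query": the keyword must match AND both security rules must hold.
    static List<String> search(List<IndexEntry> index, String keyword, String foo, String bar) {
        return index.stream()
                .filter(e -> e.keywords().contains(keyword))
                .filter(e -> e.rule1().equals(foo) && e.rule2().equals(bar))
                .map(IndexEntry::title)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<IndexEntry> index = List.of(
                new IndexEntry(1, "Lucene guide", Set.of("lucene", "search"), "foo", "bar"),
                new IndexEntry(2, "Lucene secrets", Set.of("lucene"), "foo", "private"),
                new IndexEntry(3, "JPA guide", Set.of("jpa"), "foo", "bar"));
        System.out.println(search(index, "lucene", "foo", "bar"));
    }
}
```

The trade-off is that every change to a security column in the database must also trigger a reindex of the affected documents, otherwise the index and the database disagree about who may see what.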


If this is a big issue, it may be worth looking at ManifoldCF, which supports document-level security and might fit your needs.


Some suggestions:

  • In Lucene, you can use a Filter to narrow down the search results according to your rules. (In Lucene 6 and later, where the Filter class was removed, a BooleanQuery clause with Occur.FILTER plays the same role.)
  • Store the primary key or another unique key (an ID, a serial number, etc.) in Lucene. Your relational database can then do unique-key lookups, which are very fast.
  • Lucene can also act as storage for your documents. If that fits your case, retrieve each document's content directly from Lucene (as stored fields) and skip the relational database entirely.
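The Filter suggestion can be illustrated with a `java.util.BitSet`: conceptually, a filter marks which internal document numbers the current user may see, and only documents inside that set can appear in the results. This is an in-memory illustration, not the Lucene API; `permitted` and `applyFilter` are hypothetical helpers, and the sketch assumes the visibility flags have already been derived from the security rules. For simplicity it intersects an already-ranked hit list with the set, whereas Lucene applies the restriction during the search itself.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class PermittedDocsSketch {

    // Builds the set of internal doc numbers the current user may see
    // (hypothetical input: one boolean per indexed document).
    static BitSet permitted(boolean[] visibleToUser) {
        BitSet bits = new BitSet(visibleToUser.length);
        for (int doc = 0; doc < visibleToUser.length; doc++) {
            if (visibleToUser[doc]) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Keeps only the hits inside the permitted set, preserving relevance order.
    static List<Integer> applyFilter(int[] rankedHits, BitSet permitted) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : rankedHits) {
            if (permitted.get(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        BitSet allowed = permitted(new boolean[] {true, false, true, true});
        int[] hits = {3, 1, 0};  // best match first
        System.out.println(applyFilter(hits, allowed));  // [3, 0]
    }
}
```

Because the permitted set depends only on the user and not on the keywords, it can be built once per user and reused across many searches.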