0
votes

I wish to index data across few databases of our application in the lucene. how to structure the index? index per table such that the columns are the fields and data are the values? or index per database, the variable table columns with diff fields of lucene? if no then how to structure the index so that search and maintenance will not be complicated? assuming 100 tables per databases and 10K rows per table.

1

1 Answers

0
votes

It completely depends on the underlying data, and how you want to query it, and without knowing this it is impossible to provide a definitive answer.

If your database schema is normalised you my want to denormalise it somewhat to create a record, consisting of table data from more than one table, per document.

Another factor determining the fields you assign to the document will be how you want to query the data.

For example, given the following normalised schema:

TABLE:AUTHOR        COLS:AUTHOR_ID,NAME
TABLE:BOOKS         COLS:BOOK_ID,TITLE,CONTENT
TABLE:AUTHOR_BOOKS  COLS:AUTHOR_ID,BOOK_ID

You could index a single document per author/book:

Document (field1:author, field2:title, field3:content)

This will allow you to search for book matches by either author, title or content.