
I'm trying indexing with Zend Lucene Search for the first time, and I'm wondering whether data from the database should also be indexed, from a performance standpoint. If so, in which cases?

My first goal was to index documents (PDF) so that their content can be searched.

In the communications module of the application we are developing, users can search for communications by entering keywords. The app searches the subject and content of communications stored in the database and now, with the index, it also searches the content of the documents attached to the communications.

This means I have to search both the database AND the index.

So now I'm wondering whether I should also index the subject and content of the communication (e.g. as an UnIndexed Lucene field). Would that be faster? Keep in mind that the number of documents and communications will grow quickly, and the index along with them.

Does anyone have experience with this?


1 Answer


Yes, indexing all the content you want to be able to search would be a very good idea. Searching both the database and the index runs into a couple of problems.

First, performance will be worse. Having to run a search against two different sources would generally be expected to be significantly slower than having all the searchable content in one place.

Second, merging and ordering search results tends to be problematic. If you are making use of relevance score ordering (and if you are searching against full-text content, you probably should be), then merging the results from the two different sources becomes difficult. You'll likely end up with a less useful ordering, and another performance hit.
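To make the merging problem concrete, here is a minimal sketch in plain Python (not Zend code). The `Hit` class, the `merge_hits` helper and all the scores are hypothetical. It assumes the database side is a `LIKE`-style match with no relevance score, so each database match is given a made-up score of 1.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    comm_id: int   # hypothetical communication id
    score: float   # relevance score; its scale depends on the source
    source: str    # "db" or "index"


def merge_hits(db_hits, index_hits):
    """Naively merge two result lists, keeping the best score per communication."""
    best = {}
    for hit in list(db_hits) + list(index_hits):
        current = best.get(hit.comm_id)
        if current is None or hit.score > current.score:
            best[hit.comm_id] = hit
    # Sort by score, highest first (ties keep their original order).
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


# Assumption: a LIKE match carries no relevance, so every database hit
# gets the same arbitrary score.
db_hits = [Hit(1, 1.0, "db"), Hit(2, 1.0, "db")]

# Assumption: the index returns real relevance scores on its own scale.
index_hits = [Hit(2, 0.42, "index"), Hit(3, 0.87, "index")]

merged = merge_hits(db_hits, index_hits)
for hit in merged:
    print(hit.comm_id, hit.score, hit.source)
```

With these made-up numbers, every database match outranks communication 3, the strongest index match, purely because of the arbitrary 1.0. Picking a different constant only moves the problem around, since the two scores do not measure the same thing.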

Especially if you are only indexing (not storing) the content you are considering adding to the index, there is very little reason not to do so, to my mind. Being able to search for whatever you need in the index will be more powerful and faster.
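As a rough illustration of "indexed but not stored", here is a toy inverted index in plain Python. It is a sketch of the idea only, not how Zend's Lucene implementation works internally. `TinyIndex` and the sample texts are hypothetical. The index keeps only term-to-id mappings, and the full text stays in the database. In Zend's field types this corresponds to an `UnStored` field. An `UnIndexed` field is the opposite: stored with the hit but not searchable.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the ids of communications containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, comm_id, *texts):
        # Index the terms of every text (subject, body, attachment text).
        # Only the ids are kept; the texts themselves are not stored.
        for text in texts:
            for term in re.findall(r"\w+", text.lower()):
                self.postings[term].add(comm_id)

    def search(self, query):
        # Return the ids of communications containing all query terms.
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TinyIndex()
index.add(1, "Budget meeting", "Agenda attached", "quarterly budget figures")
index.add(2, "Holiday schedule", "See attachment", "office closed in August")

print(index.search("budget"))
print(index.search("office august"))
```

A single lookup covers subject, content and attachment text, and the returned ids can then be used to load the full rows from the database.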