2 votes

I have a document structure where each line of text in the document has some metadata associated with it. Search results must show the matching line together with that line's metadata.

Currently I am storing each such line as a separate Lucene Document and keeping the metadata in a stored, non-indexed field. That is, I create and add a Lucene Document for every line. My concern is that I may end up with too many Documents in the index.
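A minimal, dependency-free sketch of the one-document-per-line layout, to make the trade-off concrete. It does not use Lucene's API; in Lucene 4.0 and later the same idea roughly corresponds to one `Document` per line with a `TextField` for the line and a `StoredField` for the metadata. The class and method names (`PerLineIndex`, `addLine`, `search`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the one-document-per-line layout (not Lucene's API).
// Each line becomes its own record: the text is tokenized into an inverted index,
// while the metadata is only stored, so it comes back with hits but is not searchable.
class PerLineIndex {
    // A stored "document": one line of text plus its opaque metadata.
    record LineDoc(String text, String metadata) {}

    private final List<LineDoc> docs = new ArrayList<>();
    // term -> IDs of the line-documents containing it, in ascending order
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Indexes one line as its own document and returns its document ID.
    int addLine(String text, String metadata) {
        int id = docs.size();
        docs.add(new LineDoc(text, metadata));
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Single-term lookup: each hit carries the line and its stored metadata.
    List<LineDoc> search(String term) {
        List<LineDoc> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(term.toLowerCase(), Set.of())) {
            hits.add(docs.get(id));
        }
        return hits;
    }

    int docCount() {
        return docs.size();
    }

    public static void main(String[] args) {
        PerLineIndex index = new PerLineIndex();
        index.addLine("The quick brown fox", "page=1;author=alice");
        index.addLine("jumps over the lazy dog", "page=1;author=bob");
        index.addLine("A second brown fox", "page=2;author=alice");
        for (LineDoc hit : index.search("fox")) {
            System.out.println(hit.text() + " -> " + hit.metadata());
        }
    }
}
```

Searching for "fox" returns the first and third lines with their metadata, while searching for "alice" returns nothing, because the metadata is stored but never tokenized. The cost of this layout is that the document count grows with the number of lines, which is exactly the concern raised in the question.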

Is there a more elegant approach?

Thanks

Have you looked into payloads in Lucene? They let you store additional information along with each term occurrence. – John Glassmyer
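To illustrate the payload suggestion, here is a toy model (not Lucene's payload API) in which a whole document is indexed once and every term occurrence carries a small payload, here the number of the line it came from. In real Lucene, payloads are byte arrays attached to individual token positions during analysis (for example via a `PayloadAttribute` in a token filter); the exact indexing and query APIs vary between versions. `PayloadSketch`, `addDocument` and the string format of the results are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the payload idea: each whole document is indexed once, and every
// term occurrence carries a payload (its line number), so a hit identifies its line.
class PayloadSketch {
    // One term occurrence: document ID, token position, and its payload (line number).
    record Posting(int docId, int position, int lineNo) {}

    private final Map<String, List<Posting>> postings = new HashMap<>();
    // Per-document list of line metadata, stored once per document.
    private final List<List<String>> lineMetadata = new ArrayList<>();

    // Indexes all lines of one document; metadata.get(i) belongs to lines.get(i).
    int addDocument(List<String> lines, List<String> metadata) {
        int docId = lineMetadata.size();
        lineMetadata.add(List.copyOf(metadata));
        int position = 0;
        for (int lineNo = 0; lineNo < lines.size(); lineNo++) {
            for (String term : lines.get(lineNo).toLowerCase().split("\\W+")) {
                if (term.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(term, t -> new ArrayList<>())
                        .add(new Posting(docId, position++, lineNo));
            }
        }
        return docId;
    }

    // Returns "docId:lineNo -> metadata" for every occurrence of the term.
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (Posting p : postings.getOrDefault(term.toLowerCase(), List.of())) {
            String meta = lineMetadata.get(p.docId()).get(p.lineNo());
            hits.add(p.docId() + ":" + p.lineNo() + " -> " + meta);
        }
        return hits;
    }

    public static void main(String[] args) {
        PayloadSketch index = new PayloadSketch();
        index.addDocument(
                List.of("red apple", "green pear", "red pear"),
                List.of("meta-A", "meta-B", "meta-C"));
        System.out.println(index.search("pear"));
    }
}
```

Running `main` prints `[0:1 -> meta-B, 0:2 -> meta-C]`: one document in the index, yet each hit still points at the right line and its metadata.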

2 Answers

1 vote

How many is "too many"? Lucene has been known to handle hundreds of millions of records in a single index, so I doubt you will have a problem. That said, there's no substitute for testing and benchmarking yourself to see whether this approach meets your needs.
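Before benchmarking, a quick back-of-the-envelope check can show whether the per-line layout is even close to a hard limit. Lucene document IDs are Java `int`s, so a single index tops out a little below `Integer.MAX_VALUE` documents; the sketch uses `Integer.MAX_VALUE` as a rough ceiling. The corpus figures in `main` are hypothetical, and this says nothing about query speed, which still needs measuring.

```java
// Rough capacity check for the one-document-per-line layout.
class CapacityCheck {
    // Lucene doc IDs are ints; the real per-index limit is slightly below this value.
    static final long ROUGH_MAX_DOCS = Integer.MAX_VALUE;

    // Number of line-documents produced by a corpus.
    static long lineDocuments(long sourceDocuments, long avgLinesPerDocument) {
        return sourceDocuments * avgLinesPerDocument;
    }

    // Fraction of the rough ceiling that the given document count would use.
    static double fractionOfCeiling(long docs) {
        return (double) docs / ROUGH_MAX_DOCS;
    }

    static boolean fitsInOneIndex(long docs) {
        return docs < ROUGH_MAX_DOCS;
    }

    public static void main(String[] args) {
        // Hypothetical corpus: 1 million source documents averaging 200 lines each.
        long docs = lineDocuments(1_000_000L, 200L);
        System.out.printf("%,d line-documents, %.1f%% of the rough ceiling, fits: %b%n",
                docs, 100 * fractionOfCeiling(docs), fitsInOneIndex(docs));
    }
}
```

For that hypothetical corpus, 200 million line-documents use under a tenth of the ceiling; a corpus a hundred times larger would not fit in one index and would need to be split across several.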

1 vote

Personally, I'd index the documents as normal and work out the matching line number and its metadata afterwards.
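One way to "work it out later" is a post-processing pass: once a whole document matches, re-scan its stored text line by line and attach each matching line's metadata. This is a minimal sketch assuming the full text and a parallel per-line metadata list were stored with the document; that storage layout and the names `LineLocator` and `locate` are hypothetical, and the tokenization is a crude stand-in for whatever analyzer the index actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Post-processing for the "index whole documents" approach: re-scan a matching
// document's stored text to find the lines containing the term.
class LineLocator {
    record LineHit(int lineNo, String text, String metadata) {}

    // Assumes lineMetadata has exactly one entry per line of storedText.
    static List<LineHit> locate(String storedText, List<String> lineMetadata, String term) {
        String needle = term.toLowerCase();
        String[] lines = storedText.split("\n", -1);
        List<LineHit> hits = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            // Crude tokenization: lowercase, split on non-word characters.
            List<String> tokens = Arrays.asList(lines[i].toLowerCase().split("\\W+"));
            if (tokens.contains(needle)) {
                hits.add(new LineHit(i, lines[i], lineMetadata.get(i)));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Lucene indexes documents\nmetadata lives elsewhere\nLucene stores fields";
        List<String> meta = List.of("line-meta-0", "line-meta-1", "line-meta-2");
        for (LineHit hit : locate(text, meta, "lucene")) {
            System.out.println(hit.lineNo() + ": " + hit.text() + " [" + hit.metadata() + "]");
        }
    }
}
```

The extra scan only runs over documents that already matched, so its cost scales with the number of results shown rather than with the size of the index.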

There is no question that Lucene can cope with that many documents; however, splitting by line might degrade the search results somewhat. For example, you can search for multiple terms in close proximity to each other, but this obviously won't work when the terms are split across multiple documents (lines).
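A small model of the proximity problem: the same two terms sit next to each other across a line break, so a "within N positions" check succeeds on the whole document but fails when each line is its own document. The distance check here is a simplification and does not reproduce Lucene's exact `PhraseQuery` slop semantics; `ProximityDemo` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Shows why proximity matching breaks when text is split into one document per line.
class ProximityDemo {
    // Lowercase and split on non-word characters, dropping empty tokens.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // True if a and b both occur at most maxDistance positions apart.
    static boolean near(List<String> tokens, String a, String b, int maxDistance) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (tokens.get(j).equals(b) && Math.abs(i - j) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    // Treats every line as its own document and checks each one separately.
    static boolean nearInAnyLine(List<String> lines, String a, String b, int maxDistance) {
        for (String line : lines) {
            if (near(tokenize(line), a, b, maxDistance)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the meeting is about the annual", "budget for next year");
        String whole = String.join("\n", lines);
        System.out.println("whole document: " + near(tokenize(whole), "annual", "budget", 1));
        System.out.println("one doc per line: " + nearInAnyLine(lines, "annual", "budget", 1));
    }
}
```

Here "annual budget" is found in the whole document (`true`) but missed by the per-line index (`false`), which is the degradation described above.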