5
votes

I am indexing rows of data from a database in Lucene.Net. Each row corresponds to one Document.

I want to update my database with the DocId, so that I can use the DocId in the results to be able to retrieve rows quickly.

I currently first retrieve the PK from the result docs, which I think is slower than retrieving rows directly from the database using the DocId.

How can I find the DocId when adding a document to Lucene?


2 Answers

3
votes

Relying on Lucene's internal DocId is a bad policy, as even Lucene tries to avoid this. I suggest you create your own DocId. In a database, I would use an auto-increment field. If your application does not use a relational database, you can generate this kind of field programmatically. Beyond that, I suggest you read Search Engine versus DBMS. I believe that only fields that may be searched should be stored in Lucene; the rest of the row belongs in the database, so the sequence of events is:

  1. Using Lucene, search for some text and get your own DocId (the ID field you stored, not Lucene's internal one).
  2. Use the DocId to retrieve the full row from the database.
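
The two steps above can be sketched without any Lucene dependency. This is only an illustration: the class `SearchThenFetch` and its methods are hypothetical, and plain in-memory maps stand in for the Lucene index and the database table. The point is that the index holds only the searchable terms plus your own ID, and the full row is fetched by that ID.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Library-free sketch: maps stand in for the search index and the database.
class SearchThenFetch {
    // Stand-in for the index: search term -> application-assigned row IDs.
    private final Map<String, Set<Long>> index = new HashMap<>();
    // Stand-in for the database table: primary key -> full row.
    private final Map<Long, String> table = new HashMap<>();

    // Store the full row in the "database" and only its terms plus ID in the "index".
    void addRow(long id, String text) {
        table.put(id, text);
        for (String term : text.toLowerCase().split("\\s+")) {
            index.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(id);
        }
    }

    // Step 1: search the index and get IDs. Step 2: fetch full rows by ID.
    List<String> search(String term) {
        Set<Long> ids = index.getOrDefault(term.toLowerCase(), Collections.<Long>emptySet());
        List<String> rows = new ArrayList<>();
        for (long id : ids) {
            rows.add(table.get(id));
        }
        return rows;
    }
}
```

Because the ID is assigned by the application (or by the database's auto-increment column), it stays valid no matter how the index is rebuilt or merged.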
2
votes

As Yuval stated, leaking internal Lucene implementation details is bad, especially since Lucene doc IDs change when the index is mutated.

If looking up the primary key using doc.get("pk") is too slow for you, use a FieldCache to cache all the PKs in memory. Then the lookups will be plenty fast.
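
To show the idea behind such a cache, here is a rough, library-free sketch. It does not use Lucene's actual FieldCache API; `PkCache` and the `storedPkReader` callback are hypothetical names, with the callback standing in for the slow per-document stored-field read. Every PK is read once up front into an array indexed by doc ID, and later lookups are plain array reads.

```java
import java.util.function.IntFunction;

// Library-free sketch of a per-document PK cache (not Lucene's FieldCache itself).
class PkCache {
    // pkByDocId[docId] holds the primary key stored for that document.
    private final String[] pkByDocId;

    // maxDoc: number of documents; storedPkReader: stand-in for the slow stored-field read.
    PkCache(int maxDoc, IntFunction<String> storedPkReader) {
        pkByDocId = new String[maxDoc];
        for (int docId = 0; docId < maxDoc; docId++) {
            pkByDocId[docId] = storedPkReader.apply(docId); // one slow read per document
        }
    }

    // Constant-time lookup with no further stored-field reads.
    String pk(int docId) {
        return pkByDocId[docId];
    }
}
```

Since doc IDs change when the index is mutated, a cache like this has to be rebuilt whenever the index is reopened after changes.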