Suppose I have two sets of objects, "Questions" and "Users", and they are related: every question has an author, who is a user.

When building my Lucene index, I can index these two kinds of object separately, i.e. one document for each question and one document for each user.

Alternatively, I can create a data transfer object that flattens the properties of a question and its user into a single object, and index that data transfer object.

Let's say searches can be run on the question title and on the question's author, who is just a user.

Let's also say my system allows users to change their display name.

What is the best way to index these objects so that the latest changes are reflected in the Lucene index?

  1. Should I keep separate documents for users and questions, and have Lucene fetch the question/user details as required?
  2. Or should I go the data-transfer-object way, and when something changes, just delete the affected documents and re-index them?
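To make the two options concrete, here is a minimal plain-Python model of them. It is a sketch, not Lucene's API: the lists of dicts stand in for Lucene documents, and the field names (`author_id`, `author_name`, `display_name`) are hypothetical. With separate documents, a search by author name needs two lookups (matching users, then their questions); with the flattened object it needs one, but the author's name is copied into every question document.

```python
# Hypothetical source data: question q1 was written by user u1.
users = {"u1": {"id": "u1", "display_name": "Bob"}}
questions = {"q1": {"id": "q1", "title": "Indexing related objects", "author_id": "u1"}}


def separate_docs(questions, users):
    """Option 1: one document per user and one per question (the question keeps only the author id)."""
    docs = [{"type": "user", **u} for u in users.values()]
    docs += [{"type": "question", **q} for q in questions.values()]
    return docs


def flattened_docs(questions, users):
    """Option 2: one DTO-style document per question, with the author's name copied in."""
    return [
        {**q, "author_name": users[q["author_id"]]["display_name"]}
        for q in questions.values()
    ]


def search_separate(docs, author_name):
    # Two lookups: find the matching user ids, then the questions with those author ids.
    ids = {d["id"] for d in docs if d["type"] == "user" and d["display_name"] == author_name}
    return [d["id"] for d in docs if d["type"] == "question" and d["author_id"] in ids]


def search_flattened(docs, author_name):
    # One lookup: the author's name is already stored on the question document.
    return [d["id"] for d in docs if d["author_name"] == author_name]


print(search_separate(separate_docs(questions, users), "Bob"))   # ['q1']
print(search_flattened(flattened_docs(questions, users), "Bob"))  # ['q1']
```

The trade-off is the one the question describes: in the separate model a rename touches only the user document, whereas every flattened question document keeps the old `author_name` until it is re-indexed.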

2 Answers


I would run two indexes: one for users by question and one for questions by users. Depending on the searches you need to do, both can come in handy.

Re-reading your question: when a user updates their name, what you need to do is run a Lucene query to find the documents belonging to that user and update them. Depending on how you are doing your indexing, that change could be reflected very quickly.
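A minimal sketch of that update step, again modelling the index as a plain Python dict keyed by document id rather than using Lucene itself. It assumes the flattened layout with hypothetical `author_id` and `author_name` fields. In Lucene the equivalent would be a query on the author-id field followed by `IndexWriter.updateDocument(Term, doc)` for each hit with a fully rebuilt document, since indexed fields cannot be edited in place.

```python
# Hypothetical flattened index: document id -> stored fields.
index = {
    "q1": {"id": "q1", "title": "Indexing related objects", "author_id": "u1", "author_name": "Bob"},
    "q2": {"id": "q2", "title": "Near-real-time search", "author_id": "u1", "author_name": "Bob"},
    "q3": {"id": "q3", "title": "Choosing an analyzer", "author_id": "u2", "author_name": "Alice"},
}


def rename_author(index, author_id, new_name):
    """Find every document by author_id and replace it with a copy carrying new_name.

    Returns the ids of the documents that were rewritten.
    """
    # Step 1, the "query": all documents whose author_id matches.
    hits = [doc_id for doc_id, doc in index.items() if doc["author_id"] == author_id]
    # Step 2: replace each hit with a rebuilt document (whole-document update).
    for doc_id in hits:
        index[doc_id] = {**index[doc_id], "author_name": new_name}
    return hits


print(rename_author(index, "u1", "William"))  # ['q1', 'q2']
```

How soon searchers see the new name then depends on when you commit and reopen the index reader.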


The primary object id is associated with the item when it is first imported/indexed. If the user changes their name or other field values, then the next time that record is imported into Lucene a new record will be created, unless there is logic to check for and update the existing indexed record. Your name example is a perfect illustration of this.
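The difference between blindly re-importing a record and updating it by its primary id can be sketched in plain Python. The list-based "index" and the `add_document`/`update_document` helpers are hypothetical stand-ins that mimic the behaviour of Lucene's `IndexWriter.addDocument` (always appends) and `IndexWriter.updateDocument(Term, doc)` (deletes every document matching the term, then adds the new one). For the real `updateDocument` to match on the id, the id field has to be indexed untokenized, e.g. as a `StringField`.

```python
def add_document(index, doc):
    """Blind append: re-importing the same record leaves the old copy in place."""
    index.append(doc)


def update_document(index, doc):
    """Delete every document with the same primary id, then add the new version."""
    index[:] = [d for d in index if d["id"] != doc["id"]]
    index.append(doc)


blind, keyed = [], []
for name in ("Bob", "William"):  # the same question re-imported after a rename
    record = {"id": "q1", "author_name": name}
    add_document(blind, record)
    update_document(keyed, record)

print(len(blind))  # 2: both "Bob" and "William" now hang off q1
print(keyed)       # [{'id': 'q1', 'author_name': 'William'}]
```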

With the first approach, you will retain all users in the system. If a person entered their name as Bob and then changed it to William, the index will always end up with two names hanging off the question, because the indexer views them as different records.

If you want to minimize duplication, I would suggest deleting the index and re-indexing the data; this will ensure data integrity. Otherwise you may see duplicate records with different object IDs (i.e. newly indexed records).