16
votes

I'm playing around with a Solr-powered search for my webapp, and I figured it'd be best to use the DataImportHandler to handle syncing with the app via the database. I like the elegance of just checking the last_updated_date field. Good stuff. However, I don't know how to handle deleted documents with this approach. The way I see it, I've got two choices. I could either send an explicit delete message to Solr from the client when a document is deleted, or I could add a "deleted" flag and leave the row in the database, so that Solr notices the document has changed and is now "deleted." I could then add a query filter that excludes results with the deleted flag set, but it seems inefficient to keep all the deleted documents in the Lucene index. What do other folks do?

2 Answers

22
votes

These are your options:

  • Use DIH special commands $deleteDocById or $deleteDocByQuery (requires Solr 1.4+)
  • Use the clean parameter of DIH to delete the whole index before importing.
  • Use preImportDeleteQuery to define what's going to be cleaned up before importing. (requires Solr 1.4+)
  • Use database triggers instead of DIH to manage updating the index.
  • If you're using some sort of ORM, use its interception capabilities instead of DIH. For example, you can use Hibernate events to update the index on update, insert, or delete.
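
With the trigger or ORM-event route, the application ends up sending Solr the explicit delete message itself. Here is a minimal sketch of what that message could look like, assuming Solr's XML update handler is reachable at `http://localhost:8983/solr/update` and that the ids you pass are values of your schema's unique key. The URL and the helper names `build_delete_message` and `build_delete_request` are made up for illustration; adjust them for your setup.

```python
import urllib.request
from xml.sax.saxutils import escape

# Assumed location of the update handler; change it to match your core.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_delete_message(doc_ids):
    """Return a Solr XML delete message for the given unique-key values."""
    ids = "".join("<id>%s</id>" % escape(str(doc_id)) for doc_id in doc_ids)
    return "<delete>%s</delete>" % ids


def build_delete_request(doc_ids, url=SOLR_UPDATE_URL):
    """Build (but do not send) the HTTP POST that carries the delete message."""
    body = build_delete_message(doc_ids).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_delete_request([42, "doc&7"])
    # Show the payload that would be posted to Solr.
    print(request.data.decode("utf-8"))
    # To send it for real: urllib.request.urlopen(request)
```

The delete only becomes visible to searches after a commit, so you would follow it with a `<commit/>` message to the same handler (or rely on autocommit if you have it configured).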
2
votes

I like to have a "deleted" flag so I don't actually delete my data! Depends on how paranoid you are. I like Mauricio's suggestions...
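
If you do keep flagged documents in the index, the query side is just a filter query. A rough sketch, assuming the flag is indexed as a boolean field called `deleted` and the select handler lives at `http://localhost:8983/solr/select` (both names are assumptions, not something from your schema):

```python
from urllib.parse import urlencode

# Assumed select handler URL; change it to match your core.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(user_query, base_url=SOLR_SELECT_URL):
    """Build a select URL that hides documents flagged as deleted."""
    params = {
        "q": user_query,
        # Filter query: exclude every document whose (assumed) deleted field is true.
        "fq": "-deleted:true",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("solr dataimporthandler"))
```

Solr caches filter queries separately from the main query, so the filter itself tends to be cheap; the cost the question worries about is mostly the extra documents sitting in the index.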