How to avoid duplicate documents in Solr?

Question

I am trying to index HBase data using the MapReduceIndexerTool. Indexing succeeds with the command below:

hadoop jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-1.5-cdh5.3.0-job.jar  --hbase-indexer-zk localhost --hbase-table-name indexdemo-user --hbase-indexer-name mynewindexer  --hbase-indexer-file /home/cloudera/indexdemo-indexer.xml   --collection collection1 --go-live

The command above indexed the data successfully, and I can see it in the Solr web UI. But when I run the same command a second time, it indexes the same data again and the Solr UI shows two documents. My requirement is to have a single document (i.e. the second run should overwrite the first run's data), so that even if I run the same command multiple times, I end up with only one document containing the latest entries. (Note: when I index directly using the Java API, I get only one document even if I run the same program multiple times. I need the same behavior from the indexer tool.)

Any suggestions are appreciated. Thanks in advance.

MatsLindh · Accepted Answer · 2015-07-13T11:13:34

Define the field whose value uniquely identifies the document as the uniqueKey in the schema. As long as the uniqueKey value is identical across runs, the old documents will be replaced / updated.

However, if you're generating a new unique value each time you index (or haven't configured the uniqueKey), Solr has no way to tell which documents are really the same document.
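
As a rough illustration, the relevant part of schema.xml might look like the sketch below. The field name id and the type string are assumptions for the example, not taken from the question; use whichever field your indexer fills with a value that stays the same between runs (the HBase row key is a typical candidate).

```xml
<!-- Sketch only: "id" is a hypothetical field name; substitute the field
     that carries a stable identifier for each document. -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

<!-- Tells Solr which field identifies a document. A document added with
     an existing value in this field replaces the earlier one. -->
<uniqueKey>id</uniqueKey>
```

It is also worth checking that the value written to this field by the indexer configuration does not change from one run to the next, since a freshly generated value on each run would defeat the uniqueKey.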
