
I take data from a search box and insert it into MongoDB as a document using a regular insert query. For the word "cancer", the data is stored in the collection in the following format, with a unique "_id":

{
  "_id": {
    "$oid": "553862fa49aa20a608ee2b7b"
  },
  "0": "c",
  "1": "a",
  "2": "n",
  "3": "c",
  "4": "e",
  "5": "r"
}
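In this format the word only exists implicitly, spread over the keys "0", "1", "2", and so on. Two documents are duplicates when those keys hold the same letters. The sketch below rebuilds the word from such a document. It assumes the letters sit under consecutive numeric keys starting at "0", as in the example above. `wordFromDoc` is a hypothetical helper, not part of any MongoDB API.

```javascript
// Hypothetical helper: rebuild the original word from a document whose
// letters are stored under the keys "0", "1", "2", ... (assumed consecutive).
function wordFromDoc(doc) {
  const letters = [];
  // Walk the numeric keys in order and stop at the first missing position.
  for (let i = 0; String(i) in doc; i++) {
    letters.push(doc[String(i)]);
  }
  return letters.join("");
}

// The document from the question, as a plain object.
const doc = {
  _id: { $oid: "553862fa49aa20a608ee2b7b" },
  "0": "c", "1": "a", "2": "n", "3": "c", "4": "e", "5": "r",
};
console.log(wordFromDoc(doc)); // prints cancer
```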

Each document stores a single word in this format, and there are many such documents. I want to remove the duplicate documents (those that contain the same word) from the collection, but I can't figure out how. How can I do this?

No, Sourabh. I'm confused about why each letter of the word is being stored as a separate value. - Vamshi
Normally you would make the word itself the key, since it is unique. - Sammaye
I already have many duplicate documents with the same word. How can I remove them? - Vamshi
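The question doesn't show the insert code, so any explanation of the letter-per-key shape is a guess. In JavaScript, copying a string's own properties into an object (for example with object spread) produces exactly this shape. The sketch contrasts that shape with Sammaye's suggestion of making the word the key. The field name `word` in the second variant is hypothetical.

```javascript
const word = "cancer";

// Spreading a string into an object copies its indexed characters,
// giving one key per letter, just like the stored documents.
const charKeyed = { ...word };
console.log(JSON.stringify(charKeyed)); // prints {"0":"c","1":"a","2":"n","3":"c","4":"e","5":"r"}

// Sammaye's suggestion: make the word itself the _id. Inserting the same
// word twice then fails with a duplicate key error instead of adding a copy.
const wordAsId = { _id: word };

// Alternative (hypothetical field name "word"): keep the generated _id and
// store the word in a named field, protected by a unique index on "word".
const wordField = { word: word };
console.log(JSON.stringify(wordAsId), JSON.stringify(wordField));
```

In the shell, these shapes would correspond to `db.your_collection.insert({_id: "cancer"})` or to `db.your_collection.createIndex({word: 1}, {unique: true})`. Building a unique index fails while duplicates are still present, so either approach only prevents new duplicates once the existing ones are removed.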

1 Answer

An easy solution in the mongo shell:

use your_db
// Index every letter position starting at '0'; 20 is an assumed maximum word length
var keys = {};
for (var i = 0; i < 20; i++) { keys[String(i)] = 1; }
db.your_collection.createIndex(keys, {unique: true, dropDups: true, sparse: true, name: 'dropdups'})
db.your_collection.dropIndex('dropdups')

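Notes:

  • The index must cover position '0' as well. Otherwise words that differ only in their first letter (e.g. "cancer" and "dancer") are treated as duplicates and one of them is deleted.
  • A compound index can have at most 32 fields, which caps the word length this approach can handle.
  • dropDups was removed in MongoDB 3.0. On 3.0 and later the option is no longer honored, and building a unique index over duplicated data fails with a duplicate key error instead of deleting anything.
  • If the collection is large, expect this to take a long time.
  • Be careful: this deletes documents in place, so clone the collection first and try it on the copy.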
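On servers where `dropDups` is not available, one option is to decide which duplicates to delete yourself. The sketch below is plain JavaScript over an array of already-fetched documents. It groups them by the reconstructed word, keeps whichever copy comes first in the array, and returns the `_id` of every later copy. `duplicateIds` and `wordFromDoc` are hypothetical helpers, and plain objects stand in for real documents.

```javascript
// Rebuild a word from the "0", "1", ... keys (assumed consecutive from "0").
function wordFromDoc(doc) {
  let word = "";
  for (let i = 0; String(i) in doc; i++) {
    word += doc[String(i)];
  }
  return word;
}

// Hypothetical helper: return the _id of every document whose word already
// appeared earlier in the array, i.e. all copies except the first one.
function duplicateIds(docs) {
  const seen = new Set();
  const ids = [];
  for (const doc of docs) {
    const word = wordFromDoc(doc);
    if (seen.has(word)) {
      ids.push(doc._id);
    } else {
      seen.add(word);
    }
  }
  return ids;
}

// Example with plain objects standing in for fetched documents.
const docs = [
  { _id: 1, "0": "c", "1": "a", "2": "t" },
  { _id: 2, "0": "d", "1": "o", "2": "g" },
  { _id: 3, "0": "c", "1": "a", "2": "t" },
  { _id: 4, "0": "c", "1": "a", "2": "t", "3": "s" },
];
console.log(duplicateIds(docs)); // prints [ 3 ]
```

In a shell that supports this JavaScript syntax (such as mongosh), the input could come from `db.your_collection.find().toArray()`. The returned ids could then be passed to `db.your_collection.deleteMany({_id: {$in: ids}})`. This loads the whole collection into memory, so a very large collection would need to be processed in batches.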