I have a collection of documents in MongoDB (url: String, title: String, content: String). The url field is unique and contains values like server://aaa/bbb/1.html.
I would like to index the data with Lucene rather than Mongo (I can change the storage). I'm going to store url in Lucene's index. When a user searches by keywords, I'll run the query in Lucene, read the url field, and then fetch the document from Mongo by that url. This works well.
But I can't delete data from Lucene's index by url, because the url contains many characters that are not allowed. I use the following settings for the url field:
store = true
analyzed = false
indexed = true
(Should I index this field? What happens if I don't? Will Lucene do a full scan? The collection can contain millions of documents.)
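For concreteness, in recent Lucene versions the three settings above correspond roughly to a `StringField` created with `Field.Store.YES`: the value is indexed as a single token, not analyzed, and stored. Below is a minimal sketch, assuming Lucene 8 or newer on the classpath. The class name `UrlFieldSketch`, the method name, and the title/content values are made up for illustration. The delete uses an exact `Term`, which is matched literally and does not pass through query-parser syntax; whether that matches the failing delete described above is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical helper class, not taken from any real code base.
final class UrlFieldSketch {

    // Indexes one document keyed by url, deletes it again by exact term,
    // and returns the number of live documents left in the index.
    static int indexThenDelete(String url) {
        try {
            Directory dir = new ByteBuffersDirectory(); // in-memory index, for the sketch only
            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
            try (IndexWriter writer = new IndexWriter(dir, config)) {
                Document doc = new Document();
                // stored = true, analyzed = false, indexed = true
                doc.add(new StringField("url", url, Field.Store.YES));
                // Placeholder text fields that keyword searches would run against.
                doc.add(new TextField("title", "Example title", Field.Store.NO));
                doc.add(new TextField("content", "Example content", Field.Store.NO));
                writer.addDocument(doc);
                writer.commit();

                // Exact-term delete: the url is compared as-is, with no query parsing.
                writer.deleteDocuments(new Term("url", url));
                writer.commit();
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return reader.numDocs(); // counts live (non-deleted) documents
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `UrlFieldSketch.indexThenDelete("server://aaa/bbb/1.html")` should leave no live documents. The exact-term delete only works if the field is indexed, since it relies on a term lookup in the inverted index.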
If I want good performance, should I create a secondary index (Int or Long) and not search by url?
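To illustrate what such a secondary key could look like, here is a minimal sketch that derives a Long from the url by taking the first 8 bytes of its SHA-256 digest. This is only one possible scheme and an assumption on my part; the class name `UrlKey` and the method `keyFor` are hypothetical. A 64-bit key can in principle collide for two different urls, so the url itself would still need to be kept for verification.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: maps a url to a fixed-size numeric key.
final class UrlKey {

    // Returns the first 8 bytes of SHA-256(url) interpreted as a big-endian long.
    // The same url always yields the same key; different urls can collide, though rarely.
    static long keyFor(String url) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(url.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `UrlKey.keyFor("server://aaa/bbb/1.html")` gives a stable Long that contains no special characters, at the cost of an extra field to maintain next to url.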
I use the latest versions of the JVM, Lucene, Ubuntu and Mongo.