There are a few things you can do to speed up the search.
First, if you don't use scoring, disable norms; this makes the index smaller.
Since you only use StringField and LongField (as opposed to, say, a TextField with a keyword tokenizer), norms are disabled for these field types, so you've already got that one covered.
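Should you ever add an analyzed field yourself, you can still switch norms off explicitly. A minimal sketch, assuming Lucene 5.x's FieldType API; the "description" field and textContent variable are made up for illustration:
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;

// hypothetical: an analyzed field with norms switched off
final FieldType noNorms = new FieldType(TextField.TYPE_NOT_STORED);
noNorms.setOmitNorms(true);
noNorms.freeze();
document.add(new Field("description", textContent, noNorms));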
Second, structure and wrap your query so that you minimize the calculation of actual scores. That is, if you use BooleanQuery, use Occur.FILTER instead of Occur.MUST. Both have the same inclusion logic, but FILTER clauses don't score. For other queries, consider wrapping them in a ConstantScoreQuery. However, this might not be necessary at all (explanation follows).
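For illustration, a minimal sketch of both variants. The field names are made up, and this assumes the BooleanQuery.Builder API of recent Lucene 5.x versions; on older versions you would add the clauses to a BooleanQuery directly:
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// both clauses must match, but FILTER clauses don't contribute to the score
final Query filtered = new BooleanQuery.Builder()
        .add(new TermQuery(new Term("type", "order")), Occur.FILTER)
        .add(new TermQuery(new Term("status", "open")), Occur.FILTER)
        .build();

// a standalone query can be stripped of scoring by wrapping it;
// every matching document gets the same constant score
final Query unscored = new ConstantScoreQuery(new TermQuery(new Term("type", "order")));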
Third, use a custom Collector. The default search method is meant for small, ranked or sorted result sets, but your use case doesn't fit that pattern. Here is a sample implementation:
import org.apache.lucene.document.Document;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.SimpleCollector;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class AllDocumentsCollector extends SimpleCollector {

    private final List<Document> documents;
    private LeafReader currentReader;

    public AllDocumentsCollector(final int numDocs) {
        // pre-size the list to avoid re-allocations while collecting
        this.documents = new ArrayList<>(numDocs);
    }

    public List<Document> getDocuments() {
        return Collections.unmodifiableList(documents);
    }

    @Override
    protected void doSetNextReader(final LeafReaderContext context) {
        // called once per index segment; remember its reader to load documents
        currentReader = context.reader();
    }

    @Override
    public void collect(final int doc) throws IOException {
        // doc is the segment-local document id, valid for the current leaf reader
        documents.add(currentReader.document(doc));
    }

    @Override
    public boolean needsScores() {
        // no scores needed, which lets Lucene skip the scoring machinery
        return false;
    }
}
You would use it like this:
public List<Document> performLuceneSearch(final Query query) throws IOException {
    // the reader instance is reused as often as possible, and exchanged
    // when a write occurs using DirectoryReader.openIfChanged(...)
    final AllDocumentsCollector collector = new AllDocumentsCollector(this.reader.numDocs());
    this.searcher.search(query, collector);
    return collector.getDocuments();
}
The collector uses a list instead of a set. Document does not implement equals or hashCode, so you don't profit from a set and only pay for additional equality checks. The final order is the so-called index order: the first document will be the one that comes first in the index (roughly insertion order if you don't have custom merge strategies in place, but ultimately an arbitrary order that is not guaranteed to be stable or reliable). Also, the collector signals that no scores are needed, which gives you about the same benefits as option 2 above, so you can save yourself some trouble and just leave your queries as they are right now.
Depending on what you need the documents for, you can get an even greater speedup by using DocValues instead of stored fields. This only holds if you require just one or two of your fields, not all of them. The rule of thumb is: for few documents but many fields, use stored fields; for many documents but few fields, use DocValues. In any case you should experiment; 8 fields is not that much, and you might profit even when loading all of them. Here is how you would use DocValues in your indexing process:
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.util.BytesRef;

// index the string for searching, plus a DocValues field for fast retrieval
document.add(new StringField(fieldName, stringContent, Field.Store.NO));
document.add(new SortedDocValuesField(fieldName, new BytesRef(stringContent)));
// OR, for the long fields:
document.add(new LongField(fieldName, longValue, Field.Store.NO));
document.add(new NumericDocValuesField(fieldName, longValue));
The field name can be the same, and you can choose not to store your other fields if you can rely entirely on DocValues.
The collector then has to be changed; here is an example for one field:
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.search.SimpleCollector;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class AllDocumentsCollector extends SimpleCollector {

    private final List<String> documents;
    private final String fieldName;
    private SortedDocValues docValues;

    public AllDocumentsCollector(final String fieldName, final int numDocs) {
        this.fieldName = fieldName;
        this.documents = new ArrayList<>(numDocs);
    }

    public List<String> getDocuments() {
        return Collections.unmodifiableList(documents);
    }

    @Override
    protected void doSetNextReader(final LeafReaderContext context) throws IOException {
        // fetch the per-segment DocValues for the field
        docValues = context.reader().getSortedDocValues(fieldName);
    }

    @Override
    public void collect(final int doc) throws IOException {
        // read the value straight from the columnar DocValues storage,
        // without loading the whole stored document
        documents.add(docValues.get(doc).utf8ToString());
    }

    @Override
    public boolean needsScores() {
        return false;
    }
}
You would use getNumericDocValues for the long fields, respectively (see the sketch below). You have to repeat this (in the same collector, of course) for all the fields you have to load, and most importantly: measure whether it's better to load full documents from the stored fields instead of using DocValues.
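As a sketch of the numeric variant (the class name is made up; everything else follows the same pattern as the collector above):
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.SimpleCollector;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class AllLongValuesCollector extends SimpleCollector {

    private final List<Long> values;
    private final String fieldName;
    private NumericDocValues docValues;

    public AllLongValuesCollector(final String fieldName, final int numDocs) {
        this.fieldName = fieldName;
        this.values = new ArrayList<>(numDocs);
    }

    public List<Long> getValues() {
        return Collections.unmodifiableList(values);
    }

    @Override
    protected void doSetNextReader(final LeafReaderContext context) throws IOException {
        // fetch the per-segment numeric DocValues for the field
        docValues = context.reader().getNumericDocValues(fieldName);
    }

    @Override
    public void collect(final int doc) throws IOException {
        values.add(docValues.get(doc));
    }

    @Override
    public boolean needsScores() {
        return false;
    }
}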
One final note:
"I am doing locking on the application level, so Lucene won't have to worry about concurrent reads and writes."
The IndexSearcher and IndexWriter themselves are already thread-safe. If you lock solely for Lucene's sake, you can remove those locks and just share both instances amongst all your threads. And consider using oal.search.SearcherManager for reusing the IndexReader/IndexSearcher.
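A minimal sketch of how the pieces could fit together, assuming you already have an IndexWriter and reusing the AllDocumentsCollector from above (the SearchService class is made up for illustration):
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SearcherManager;

import java.io.IOException;
import java.util.List;

final class SearchService {

    private final SearcherManager manager;

    // true = apply all deletes before a new searcher becomes visible,
    // null = use the default SearcherFactory
    SearchService(final IndexWriter writer) throws IOException {
        this.manager = new SearcherManager(writer, true, null);
    }

    // call this after commits so new searches see the changes
    void refresh() throws IOException {
        manager.maybeRefresh();
    }

    List<Document> search(final Query query) throws IOException {
        final IndexSearcher searcher = manager.acquire();
        try {
            final AllDocumentsCollector collector =
                    new AllDocumentsCollector(searcher.getIndexReader().numDocs());
            searcher.search(query, collector);
            return collector.getDocuments();
        } finally {
            // always release the searcher so the manager can close old readers
            manager.release(searcher);
        }
    }
}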