We have a pretty basic Lucene setup. We recently noticed that some documents aren't being written to the index.
This is how we create the document:
private void addToDirectory(SpecialDomainObject specialDomainObject) throws IOException {
    Document document = new Document();
    document.add(new TextField("id", String.valueOf(specialDomainObject.getId()), Field.Store.YES));
    document.add(new TextField("name", specialDomainObject.getName(), Field.Store.YES));
    document.add(new TextField("tags", joinTags(specialDomainObject.getTags()), Field.Store.YES));
    document.add(new TextField("contents", getContents(specialDomainObject), Field.Store.YES));
    for (Language language : getAllAssociatedLanguages(specialDomainObject)) {
        document.add(new IntField("languageId", language.getId(), Field.Store.YES));
    }
    specialDomainObjectIndexWriter.updateDocument(new Term("id", document.getField("id").stringValue()), document);
    specialDomainObjectIndexWriter.commit();
}
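`IndexWriter.updateDocument(term, doc)` first deletes any document containing the given term and then adds the new document, so it behaves like an upsert keyed on the exact indexed `id` term. The sketch below is a minimal, stdlib-only model of that behaviour, assuming each id is indexed as one unaltered token (Lucene's `StringField` javadoc cites an id field as a typical use of an untokenized field; the code above uses `TextField`). `UpsertSketch`, `processed()` and `numDocs()` are hypothetical names, not Lucene API. It shows how the number of processed objects can exceed the number of documents when the same id arrives more than once.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class UpsertSketch {

    // Hypothetical in-memory model of IndexWriter.updateDocument(term, doc):
    // the existing document carrying the same "id" term is deleted and the new
    // one added, modelled here as a Map.put keyed on the exact id string.
    private final Map<String, String> docsById = new LinkedHashMap<>();
    private long processed = 0;

    public void updateDocument(String id, String name) {
        docsById.put(id, name); // replaces any earlier entry with the same id
        processed++;
    }

    public long processed() {
        return processed;
    }

    public int numDocs() {
        return docsById.size();
    }

    public static void main(String[] args) {
        UpsertSketch index = new UpsertSketch();
        // The same object (id "2") arrives twice, e.g. from overlapping pages.
        for (String id : Arrays.asList("1", "2", "2", "3")) {
            index.updateDocument(id, "object " + id);
        }
        System.out.println(index.processed() + " processed, " + index.numDocs() + " documents");
    }
}
```

Running `main` prints `4 processed, 3 documents`: a duplicate id never adds a second document, it only replaces the first one.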
This is how we create the analyzer and the index writer:
<bean id="luceneVersion" class="org.apache.lucene.util.Version" factory-method="valueOf">
    <constructor-arg value="LUCENE_46"/>
</bean>
<bean id="analyzer" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
    <constructor-arg ref="luceneVersion"/>
</bean>
<bean id="specialDomainObjectIndexWriter" class="org.apache.lucene.index.IndexWriter">
    <constructor-arg ref="specialDomainObjectDirectory" />
    <constructor-arg>
        <bean class="org.apache.lucene.index.IndexWriterConfig">
            <constructor-arg ref="luceneVersion"/>
            <constructor-arg ref="analyzer" />
            <property name="openMode" value="CREATE_OR_APPEND"/>
        </bean>
    </constructor-arg>
</bean>
Indexing is done with a scheduled task:
@Component
public class ScheduledSpecialDomainObjectIndexCreationTask implements ScheduledIndexCreationTask {

    private static final Logger logger = LoggerFactory.getLogger(ScheduledSpecialDomainObjectIndexCreationTask.class);

    @Autowired
    private IndexOperator specialDomainObjectIndexOperator;

    @Scheduled(fixedDelay = 3600 * 1000)
    @Override
    public void createIndex() {
        Date indexCreationStartDate = new Date();
        try {
            logger.info("Updating complete special domain object index...");
            specialDomainObjectIndexOperator.createIndex();
            if (logger.isDebugEnabled()) {
                Date indexCreationEndDate = new Date();
                logger.debug("Index creation duration: {} ms", indexCreationEndDate.getTime() - indexCreationStartDate.getTime());
            }
        } catch (IOException e) {
            logger.error("Could not update complete special domain object index.", e);
        }
    }
}
createIndex() is implemented as follows:
@Override
public void createIndex() throws IOException {
    logger.trace("Preparing for index generation...");
    IndexWriter indexWriter = getIndexWriter();
    Date start = new Date();
    logger.trace("Deleting all documents from index...");
    indexWriter.deleteAll();
    logger.trace("Starting index generation...");
    long numberOfProcessedObjects = fillIndex();
    logger.debug("Index written in {} milliseconds.", new Date().getTime() - start.getTime());
    logger.debug("Number of processed objects: {}", numberOfProcessedObjects);
    logger.debug("Number of documents in index: {}", indexWriter.numDocs());
    indexWriter.commit();
    indexWriter.forceMerge(1);
}
@Override
protected long fillIndex() throws IOException {
    Page<SpecialDomainObject> specialDomainObjectsPage = specialDomainObjectRepository.findAll(new PageRequest(0, MAXIMUM_PAGE_ELEMENTS));
    while (true) {
        addToDirectory(specialDomainObjectsPage);
        if (specialDomainObjectsPage.hasNextPage()) {
            specialDomainObjectsPage =
                    specialDomainObjectRepository.findAll(new PageRequest(specialDomainObjectsPage.getNumber() + 1, specialDomainObjectsPage.getSize()));
        } else {
            break;
        }
    }
    return specialDomainObjectsPage.getTotalElements();
}
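One thing worth ruling out in `fillIndex()`: each page is fetched by a separate query, and the `PageRequest` carries no sort. If the database is free to return rows in a different order per query, consecutive pages can overlap, so some rows are read twice and others never. Note also that the returned value is `getTotalElements()`, the repository's row count, not the number of objects actually visited, so the "processed objects" log line would not reveal this. The sketch below is a self-contained simulation of that hypothesis, not Spring Data or Lucene code: `UnorderedTable` is a hypothetical stand-in that rotates its rows differently on every query, and a `Set` plays the role of the id-keyed `updateDocument` call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PagingSketch {

    // Hypothetical stand-in for a table queried without ORDER BY: every query
    // rotates the rows by a different offset, mimicking a database that is free
    // to return rows in a different order each time.
    public static final class UnorderedTable {
        private final List<Integer> rows;
        private int queryCount = 0;

        public UnorderedTable(List<Integer> rows) {
            this.rows = rows;
        }

        public List<Integer> findPage(int pageNumber, int pageSize, boolean sortById) {
            List<Integer> view = new ArrayList<>(rows);
            if (sortById) {
                Collections.sort(view); // same total order on every query
            } else {
                Collections.rotate(view, queryCount * 3); // order changes per query
            }
            queryCount++;
            int from = Math.min(pageNumber * pageSize, view.size());
            int to = Math.min(from + pageSize, view.size());
            return new ArrayList<>(view.subList(from, to));
        }
    }

    // Walks all pages like fillIndex() and collects ids into a set, which plays
    // the role of updateDocument() keyed on "id": repeated ids collapse.
    public static Set<Integer> indexAll(UnorderedTable table, int total, int pageSize, boolean sortById) {
        Set<Integer> indexedIds = new LinkedHashSet<>();
        int pages = (total + pageSize - 1) / pageSize;
        for (int page = 0; page < pages; page++) {
            indexedIds.addAll(table.findPage(page, pageSize, sortById));
        }
        return indexedIds;
    }

    public static List<Integer> ids(int n) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int total = 20;
        int pageSize = 5;
        int unordered = indexAll(new UnorderedTable(ids(total)), total, pageSize, false).size();
        int ordered = indexAll(new UnorderedTable(ids(total)), total, pageSize, true).size();
        System.out.println("without sort: " + unordered + " of " + total + " ids indexed");
        System.out.println("with sort:    " + ordered + " of " + total + " ids indexed");
    }
}
```

In this simulation only 11 of the 20 ids are indexed without a sort, against all 20 with one. If this is the cause, giving every `PageRequest` a `Sort` on a unique key (for example the primary key) should make the pages stable; whether the real database actually reorders rows between queries is an assumption that would need checking.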
There are about 2000 specialDomainObject instances, and about 80 of them are missing from the index (we checked this with Luke).
Is there anything that could cause the missing documents?
… IndexWriter gracefully, isn't it? – Martín Schonaker