0
votes

How can I create a Lucene index that has only one segment, without using force merge? I have more than enough RAM, so I tried a 1.5 GB buffer size for a much smaller index of 64-128 MB, but I still end up with 5-10 segments at the end of indexing. What can I do about it?

public static final double DEFAULT_RAM_BUFFER_SIZE_MB_STORE = 1536.;

...

final File file = new File(pathIndex);
final Path path = file.toPath();
final Directory index = ControlObjectsLuceneIndex.createDirectory(path, file);
final IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
indexWriterConfig.setRAMBufferSizeMB(defaultRamBufferSizeMb);
indexWriterConfig.setSimilarity(_ekspertSimilarity);
indexWriterConfig.setUseCompoundFile(false);
return new IndexWriter(index, indexWriterConfig);
1
Why do you need 1 segment? Could you show the code that actually adds the docs? - Mysterion
I have some complicated batch searches for analyzing data that run considerably faster when all documents are in one segment. The code for adding docs creates a document with a lot of fields and then calls indexWriter.addDocument(document); - Danilo C.
You could still force merge to 1 segment, right? - Mysterion
Yes, I can. Force merge works, but it is quite time-consuming too. I want to use my RAM to buffer the data and write it out as a single segment, which is probably the fastest approach. - Danilo C.

1 Answer

1
votes

A flush is triggered when there are enough added documents since the last flush. Flushing is triggered either by RAM usage of the documents (see IndexWriterConfig.setRAMBufferSizeMB(double)) or the number of added documents (see IndexWriterConfig.setMaxBufferedDocs(int)).

This means that if you want to prevent flushing, you need to set a high limit on both of those values, so that both the number of added documents and the RAM usage stay below your limits.

Another approach is to pass IndexWriterConfig.DISABLE_AUTO_FLUSH to setMaxBufferedDocs or to setRAMBufferSizeMB, which prevents a flush from being triggered by the number of buffered documents or by RAM usage, respectively. Note, however, that you cannot set both values to DISABLE_AUTO_FLUSH, and you will most likely find it easier to estimate your number of documents than the amount of RAM they use.

Also, make sure that you use the IndexWriter from a single thread only (or with proper synchronization), since threads that index concurrently each buffer their documents into a separate segment.
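
As a rough illustration of the points above, here is a minimal sketch, assuming Lucene 7.x on the classpath. The class name SingleSegmentSketch, the temporary directory, the "body" field and the document count are all made up for the example. It keeps the 1.5 GB RAM buffer from the question, disables the document-count trigger, indexes from a single thread, commits only once at the end, and then prints how many segments the index contains so you can check the result yourself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical example class, not taken from the question's code base.
public class SingleSegmentSketch {

    public static void main(String[] args) throws IOException {
        // Assumption: a throwaway directory; replace with your own index path.
        Path path = Files.createTempDirectory("single-segment-index");

        try (Directory dir = FSDirectory.open(path)) {
            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());

            // RAM-based flush limit set far above the expected index size.
            config.setRAMBufferSizeMB(1536.0);
            // Never flush because of the number of buffered documents.
            // (Both triggers cannot be disabled at the same time.)
            config.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);
            config.setUseCompoundFile(false);

            try (IndexWriter writer = new IndexWriter(dir, config)) {
                // All documents are added from this one thread.
                for (int i = 0; i < 10_000; i++) {
                    Document doc = new Document();
                    doc.add(new TextField("body", "document number " + i, Field.Store.NO));
                    writer.addDocument(doc);
                }
                // Commit once, at the very end: every commit() flushes the
                // buffered documents, so intermediate commits add segments.
                writer.commit();
            }

            // Each leaf of the reader corresponds to one segment.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                System.out.println("segments: " + reader.leaves().size());
            }
        }
    }
}
```

If the printed count is still greater than one in your own setup, it may be worth checking for extra commit() calls, near-real-time readers opened during indexing, or more than one indexing thread, since each of these can also cause a flush.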

Source: https://lucene.apache.org/core/7_6_0/core/org/apache/lucene/index/IndexWriter.html