0
votes

I have a specific app that requires an index to consist of as few files as possible. Previously, with Lucene.NET 2.9.2, I was able to keep the entire index in 3 (or 4) files by using:

writer.SetUseCompoundFile(true);
writer.Optimize(1, true);

After upgrading to Lucene.NET 2.9.4, the same code produces an index consisting of 10 files (fdt, fdx, fnm, frq, nrm, prx, tii, tis, plus segments.gen and segments_c). How can I bring that down again?

The cause is probably deep in Lucene itself rather than specific to Lucene.NET. Still, something changed between versions, and I'd love to have control over this.

2
Do you have a CFS file in your index directory? I just tested, and compound files seem to work fine with the 2.9.4g version. – Jf Beaulac
No, I do not. I saw some web references to this CFS file, but I do not have one. I wonder if my version is too old? This is the current recommended stable release. I have just checked and the exact version is 2.9.4.1. – wpfwannabe
I just tested using the 2.9.4.1 version from NuGet and it works fine. Are you sure you are either calling Commit() or correctly closing the writer after your call to SetUseCompoundFile(true)? – Jf Beaulac
At which point does SetUseCompoundFile(true) need to be called? I am calling it after I create the writer. Then, when I want to close the writer, I call writer.Optimize(1, true) and immediately after that writer.Commit(). – wpfwannabe
It just needs to be called before you call Commit(). Personally, I set it immediately after I create an IndexWriter. Note that it only applies to newly emitted segments, though. – Jf Beaulac

2 Answers

4
votes

OK, I've finally found an answer. When inspecting the index directory during the lengthy indexing process, I observed that a CFS file comes and goes, but once the process is done there is no sign of one. I did some more research given some new keywords (thanks @jf-beaulac) and I've found this. They say that the default threshold for CFS is 10% of the entire index size. If any segment grows past that, no CFS is created for it, regardless of writer.SetUseCompoundFile(true) usage. After writer.Optimize(1, true) the single remaining segment is the entire index, so it is always past that threshold.

So, after some digging through Lucene.NET I have come up with the following necessary step:

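        indexWriter.SetUseCompoundFile(true);
        // The no-CFS ratio lives on the merge policy, not on the writer itself.
        var mergePolicy = indexWriter.GetMergePolicy();
        var logPolicy = mergePolicy as LogMergePolicy;
        if (logPolicy != null)
        {
            // 1.0 = 100%: segments of any size are still written as CFS.
            logPolicy.SetNoCFSRatio(1.0);
        }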

Setting the "no-cfs-ratio" to 100% keeps all segments within CFS and things finally work the way I want them to.
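To illustrate why the default ratio defeats Optimize(1, true), here is a minimal sketch of the rule as described above, assuming it is a plain size comparison of the merged segment against the whole index. `CfsRule` and `ShouldUseCompoundFile` are hypothetical names made up for this illustration and are not part of the Lucene.NET API; the real check inside the merge policy may differ in detail.

```csharp
using System;
using System.Linq;

// Illustrative only: a simplified stand-in for the size check,
// not Lucene.NET's own code.
public static class CfsRule
{
    // Returns true when a newly merged segment would be packed into a CFS file.
    //   useCompoundFile    - the flag set via SetUseCompoundFile
    //   noCfsRatio         - the value set via SetNoCFSRatio (0.1 is the assumed default)
    //   mergedSegmentBytes - size of the segment produced by the merge
    //   allSegmentBytes    - sizes of all segments in the index, merged one included
    public static bool ShouldUseCompoundFile(
        bool useCompoundFile,
        double noCfsRatio,
        long mergedSegmentBytes,
        long[] allSegmentBytes)
    {
        if (!useCompoundFile) return false;   // CFS switched off entirely
        if (noCfsRatio >= 1.0) return true;   // 100%: no segment is ever "too big"

        long totalBytes = allSegmentBytes.Sum();
        // A segment larger than the allowed share of the index stays non-compound.
        return mergedSegmentBytes <= noCfsRatio * totalBytes;
    }
}
```

Under these assumptions, an optimized index with a single 500-byte segment gives `ShouldUseCompoundFile(true, 0.1, 500, new long[] { 500 }) == false`, because 500 is more than 10% of 500, while the same call with a ratio of 1.0 returns true.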

So, @jf-beaulac thanks a lot for getting me going. I suppose your sample would fail too if you added some more documents. Still, I recognize your help and so I will accept your answer.

2
votes

I'll post the exact code snippet I used to test this; comparing it to your code may help you find what's wrong.

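FSDirectory dir = FSDirectory.GetDirectory("C:\\temp\\CFSTEST");
IndexWriter writer = new IndexWriter(dir, new CJKAnalyzer());
// Enable compound files right after creating the writer.
writer.SetUseCompoundFile(true);

Document document = new Document();

document.Add(new Field(
    "text",
    "プーケット",
    Field.Store.YES,
    Field.Index.ANALYZED));
writer.AddDocument(document);

// Reuse the same Document instance with a new value for the second document.
document.GetField("text").SetValue("another doc");
writer.AddDocument(document);

// Merge down to a single segment, then close the writer (which also commits).
writer.Optimize(1, true);
writer.Close();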