
I am trying to use the AzureDirectory library to store a Lucene.NET index in an Azure Storage account.

I am using the following versions:

  • Microsoft.WindowsAzure.Storage 4.3.0.0
  • Lucene.Net 3.0.3.0
  • Lucene.Net.Store.Azure 3.0.5553.21100

And calling the following method:

public void UpdateDocument(Term keyTerm, Document document, string indexName)
{    
    using (var analyser = new StandardAnalyzer(LuceneVersion))
    {
        using (var directory = new AzureDirectory(cloudStorage.GetStorageAccount(), indexName, new RAMDirectory()))
        {
            using (var indexWriter = new IndexWriter(directory, analyser, true, IndexWriter.MaxFieldLength.UNLIMITED))
            {
                indexWriter.UpdateDocument(keyTerm, document);
            }
        }
    }
}

When I call the method as few as 10 times (from a unit test), the overall time is around 30 seconds.

I have tried various changes to the index writer to see if any performance gains can be made, but so far nothing has helped. I tried changing the code to reuse the index writer and directory instances, but I end up with file locks. I also want to keep the indexing code abstracted away from the caller so that Lucene stays isolated. If I comment out indexWriter.UpdateDocument(keyTerm, document); then the method is responsive, which tells me this is where the slowness is.

Am I doing something wrong or missing something here?


1 Answer


The method above just needed to manage its resources better: opening the directory and index writer once per document was too costly. Opening them once per batch and committing a single time fixed the problem. My adjusted method works fine:

public void UpdateDocumentBatch(Term keyTerm, IEnumerable<Document> documents, string indexName)
{
    using (var analyser = new StandardAnalyzer(LuceneVersion))
    {
        using (var directory = new AzureDirectory(cloudStorage.GetStorageAccount(), indexName, new RAMDirectory()))
        {
            var createIndex = !IndexReader.IndexExists(directory);

            using (var indexWriter = new IndexWriter(directory, analyser, createIndex, IndexWriter.MaxFieldLength.UNLIMITED))
            {
                indexWriter.SetRAMBufferSizeMB(100);

                foreach (var document in documents)
                {
                    keyTerm.Text = document.GetField(keyTerm.Field).StringValue;

                    indexWriter.UpdateDocument(keyTerm, document);
                }

                indexWriter.Commit();
            }
        }
    }
}
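For reference, a caller might use the batch method like this. This is a minimal sketch; the `indexService` variable, the `products` collection, and the field names are illustrative assumptions, not from the question:

```csharp
// Hypothetical usage: build the batch up front, then index it in one pass.
var documents = new List<Document>();

foreach (var product in products)
{
    var doc = new Document();
    // The key field must match the Term passed below so UpdateDocument
    // can replace any existing document with the same id.
    doc.Add(new Field("id", product.Id, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("name", product.Name, Field.Store.YES, Field.Index.ANALYZED));
    documents.Add(doc);
}

// One directory open, one IndexWriter, one Commit for the whole batch.
indexService.UpdateDocumentBatch(new Term("id"), documents, "products");
```

The key design change is that the cost of opening the AzureDirectory and IndexWriter is now amortized across the whole batch, and only one Commit (and hence one upload of changed segments to blob storage) happens per call.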