
I am using Lucene.net to search about 50K entities, which are saved in a database. I have created an application which tries to index 100 entities at a time.

The code is very simple:

var entityList = GetEntityList(100);

foreach (var item in entityList) 
    Indexer.IndexEntity(item);

And this is the Indexer class:

public class Indexer {
    public void IndexEntity(Entity item)
    {
        IndexWriter writer;
        string path = ConfigurationManager.AppSettings["SearchIndexPath"];
        FSDirectory directory = FSDirectory.Open(new DirectoryInfo(path));
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
        if (Directory.GetFiles(path).Length > 0)
            writer = new IndexWriter(directory, analyzer, false, IndexWriter.MaxFieldLength.UNLIMITED);
        else
            writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        Document document = new Document();
        document.Add(new Field("id", item.Id.ToString(), Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
        document.Add(new Field("category", item.Category.Id.ToString(), Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
        document.Add(new Field("location", item.Location.Id.ToString(), Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
        document.Add(new Field("point", item.Point.ToString(), Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
        document.Add(new Field("picture", item.PictureUrl, Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
        document.Add(new Field("creationdate", item.CreationDate.ToString(), Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
        document.Add(new Field("title", item.Title, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
        document.Add(new Field("body", item.Body, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
        string str2 = string.Empty;
        foreach (Tag tag in item.Tags)
        {
            if (!string.IsNullOrEmpty(str2))
            {
                str2 = str2 + "-";
            }
            str2 = str2 + tag.DisplayName;
        }
        document.Add(new Field("tags", str2, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
        writer.AddDocument(document);
        writer.Optimize();
        writer.Close();
    }
}

Everything was fine and my search speed is now good enough, but the indexing speed has decreased. My application has indexed about 15K entities so far, and the index files are about 600MB. Now, when it indexes 100 new entities, it takes about 24 minutes!

What is the problem? Thanks in advance.


1 Answer


Two things stand out pretty clearly in your code:

  1. You're optimising the index after adding every document. With recent versions of Lucene there are very good reasons why you shouldn't optimise your index at all (per-segment caching), and even setting those reasons aside, optimising the index after adding every single document is wild overkill.
  2. You're continually opening, committing, and closing your index. Given your looping construct, why not open the index writer outside the loop, add all the entities, then close/commit once? If you need quicker index visibility, you could add a periodic commit inside the loop (every Nth document, using some simple modulus arithmetic, sounds fine to me).
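Putting both points together, a minimal sketch of the refactored indexing path might look like this. It assumes the same Lucene.Net 2.9 API and the same `Entity` type as the question; `BuildDocument` is a hypothetical helper that builds the document exactly as the question's field-adding code does:

```csharp
public class Indexer
{
    public void IndexEntities(IEnumerable<Entity> entities)
    {
        string path = ConfigurationManager.AppSettings["SearchIndexPath"];
        FSDirectory directory = FSDirectory.Open(new DirectoryInfo(path));
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);

        // Open the writer ONCE for the whole batch; create the index
        // only if the directory is still empty.
        bool create = Directory.GetFiles(path).Length == 0;
        IndexWriter writer = new IndexWriter(directory, analyzer, create,
            IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            int count = 0;
            foreach (Entity item in entities)
            {
                // BuildDocument: hypothetical helper containing the same
                // document.Add(...) calls as the question's code.
                writer.AddDocument(BuildDocument(item));

                // Optional: make new documents visible periodically
                // instead of closing the writer each time.
                if (++count % 1000 == 0)
                    writer.Commit();
            }
            // Note: no writer.Optimize() here — let Lucene merge
            // segments in the background on its own schedule.
        }
        finally
        {
            writer.Close(); // commits any remaining documents
        }
    }
}
```

The key difference from the original is that the writer's open/close cost and the full-index optimise are paid once per batch (or not at all) instead of once per document.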

With these two changes, I think you'll see dramatic speed-ups in your indexing jobs.