I use Lucene.Net to index documents. My main aim is to run a search and get back the line number and the line of text for each match in a document.

Here's the indexing code:

using (TextReader contentsReader = new StreamReader(fi.FullName))
{
    doc.Add(new StringField("FullFileName", fi.FullName, Field.Store.YES));
    doc.Add(new StringField("LastModifiedDate", modDate, Field.Store.YES));
    //doc.Add(new TextField("Contents", contentsReader.ReadToEnd(), Field.Store.YES));

    int lineCount = 1;
    string line = String.Empty;
    while ((line = contentsReader.ReadLine()) != null)
    {
        doc.Add(new Int32Field("LineNo", lineCount, Field.Store.YES));
        doc.Add(new TextField("Contents", line, Field.Store.YES));
        lineCount++;
    }

    Console.ForegroundColor = ConsoleColor.Blue;
    Console.WriteLine("adding " + fi.Name);
    Console.ResetColor();
    writer.AddDocument(doc);
}

As you can see, I add the file name and the modified date, then loop through all the lines in the file and add a LineNo field and a Contents TextField for each line.

This is how I search:

    Lucene.Net.Analysis.Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.LuceneVersion.LUCENE_48);
    QueryParser parser = new QueryParser(Lucene.Net.Util.LuceneVersion.LUCENE_48, "Contents", analyzer);
    Lucene.Net.Search.Query query = parser.Parse(searchString);

    Lucene.Net.Store.Directory directory = Lucene.Net.Store.FSDirectory.Open(new System.IO.DirectoryInfo(indexDir));
    Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Index.DirectoryReader.Open(directory));
    TopScoreDocCollector collector = TopScoreDocCollector.Create(100, true);
    searcher.Search(query, collector);
    ScoreDoc[] hits1 = collector.GetTopDocs().ScoreDocs;
    for (int i = 0; i < hits1.Length; i++)
    {
        int docId = hits1[i].Doc;
        float score = hits1[i].Score;

        Lucene.Net.Documents.Document doc = searcher.Doc(docId);

        string result = "FileName: " + doc.Get("FullFileName") + "\n"+
        " Line No: " + doc.Get("LineNo") + "\n"+
        " Contents: " + doc.Get("Contents");
    }

Yet my search returns 0 hits, whereas if I comment out the while loop and uncomment the commented-out line above it, I get results.

What could be the problem?

I just ran an example similar to yours and everything works fine. How do you create your query variable? How big are your files, in MB and in line counts? - Peska
I edited the code. Also, it's about 27 MB in total. - Eminem
At the moment I index around 2000 text files. - Eminem
@eminem I highly recommend you download Luke for Lucene 4.8 and look inside your index. Compare the index built with the while loop against the one built without it. The number of tokens should be identical; if it isn't, look at my answer below. - rojobo

1 Answer

It may be because of the change to the analyzer's reuse strategy in Lucene 4.0+. The reuse strategy caches the tokens in a dictionary, so when you add the text one line per iteration, chances are the index only stores some of the tokens, whereas passing the whole text at once processes everything. You may need to override the reuse strategy; I overrode it outright to make it behave the way it did in Lucene 3.0.5. Let me know if this helps.
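
Separately from the reuse-strategy theory, one layout that might fit the stated goal (a line number and a line of text per hit) is to index each line as its own document. Document.Get returns only the first stored value of a field, so a single document carrying many LineNo and Contents fields could not report the matching line anyway. The following is a minimal sketch rather than a confirmed fix for the 0-hits problem. It assumes Lucene.Net 4.8 with the Analysis.Common and QueryParser packages, reuses the field names from the question, and leaves out LastModifiedDate for brevity. The names LinePerDocumentSketch, IndexFile and Search are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Util;

// Hypothetical helper class, not part of Lucene.Net.
public static class LinePerDocumentSketch
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    // Adds one Lucene document per line, so each hit maps to exactly one line.
    public static void IndexFile(IndexWriter writer, FileInfo fi)
    {
        using (TextReader reader = new StreamReader(fi.FullName))
        {
            int lineNo = 1;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var doc = new Document();
                doc.Add(new StringField("FullFileName", fi.FullName, Field.Store.YES));
                doc.Add(new Int32Field("LineNo", lineNo, Field.Store.YES));
                doc.Add(new TextField("Contents", line, Field.Store.YES));
                writer.AddDocument(doc);
                lineNo++;
            }
        }
    }

    // Prints the file name, line number and line text of each matching line.
    public static void Search(Lucene.Net.Store.Directory directory, string searchString)
    {
        using (var analyzer = new StandardAnalyzer(MatchVersion))
        using (var reader = DirectoryReader.Open(directory))
        {
            var parser = new QueryParser(MatchVersion, "Contents", analyzer);
            Query query = parser.Parse(searchString);
            var searcher = new IndexSearcher(reader);

            foreach (ScoreDoc hit in searcher.Search(query, 100).ScoreDocs)
            {
                Document doc = searcher.Doc(hit.Doc);
                Console.WriteLine("FileName: " + doc.Get("FullFileName"));
                Console.WriteLine(" Line No: " + doc.Get("LineNo"));
                Console.WriteLine(" Contents: " + doc.Get("Contents"));
            }
        }
    }
}
```

The trade-off is a much larger document count (one per line instead of one per file), and a phrase that spans two lines will no longer match. The existing writer and index directory from the question can be passed in unchanged.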