I use Lucene.NET to index documents. My main aim is to search them and get back, for each match, the line number and the line of text within the document.
Here's the code that indexes a file:
    using (TextReader contentsReader = new StreamReader(fi.FullName))
    {
        doc.Add(new StringField("FullFileName", fi.FullName, Field.Store.YES));
        doc.Add(new StringField("LastModifiedDate", modDate, Field.Store.YES));
        //doc.Add(new TextField("Contents", contentsReader.ReadToEnd(), Field.Store.YES));
        int lineCount = 1;
        string line = String.Empty;
        while ((line = contentsReader.ReadLine()) != null)
        {
            doc.Add(new Int32Field("LineNo", lineCount, Field.Store.YES));
            doc.Add(new TextField("Contents", line, Field.Store.YES));
            lineCount++;
        }
        Console.ForegroundColor = ConsoleColor.Blue;
        Console.WriteLine("adding " + fi.Name);
        Console.ResetColor();
        writer.AddDocument(doc);
    }
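As you can see, I add the file name and the modified date, then loop through all the lines in the file and, for each line, add an Int32Field with the line number and a TextField with the text. Everything goes into the same document.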
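For comparison, here is a minimal sketch of an alternative layout that adds one Lucene document per line instead of one per file, so that each hit carries exactly one LineNo and one Contents value. The class and method names (LineIndexer, IndexFileByLine) are hypothetical, and writer, fi and modDate are assumed to be the same objects as in the snippet above. I have not established whether this has any bearing on the zero-hit result.

```csharp
using System.IO;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class LineIndexer
{
    // Hypothetical helper (not part of Lucene.NET): one document per line.
    // Assumes modDate is already formatted as a string, as in the original snippet.
    public static void IndexFileByLine(IndexWriter writer, FileInfo fi, string modDate)
    {
        using (TextReader reader = new StreamReader(fi.FullName))
        {
            int lineNo = 1;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // A fresh document for every line, repeating the file-level fields.
                var doc = new Document();
                doc.Add(new StringField("FullFileName", fi.FullName, Field.Store.YES));
                doc.Add(new StringField("LastModifiedDate", modDate, Field.Store.YES));
                doc.Add(new Int32Field("LineNo", lineNo, Field.Store.YES));
                doc.Add(new TextField("Contents", line, Field.Store.YES));
                writer.AddDocument(doc);
                lineNo++;
            }
        }
    }
}
```

With the single-document layout, doc.Get("LineNo") returns only the first value stored under that field name, so it cannot identify which line matched. With one document per line, doc.Get("LineNo") and doc.Get("Contents") both refer to the matched line, at the cost of a larger index.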
This is how I search:
    Lucene.Net.Analysis.Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.LuceneVersion.LUCENE_48);
    QueryParser parser = new QueryParser(Lucene.Net.Util.LuceneVersion.LUCENE_48, "Contents", analyzer);
    Lucene.Net.Search.Query query = parser.Parse(searchString);
    Lucene.Net.Store.Directory directory = Lucene.Net.Store.FSDirectory.Open(new System.IO.DirectoryInfo(indexDir));
    Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Index.DirectoryReader.Open(directory));
    TopScoreDocCollector collector = TopScoreDocCollector.Create(100, true);
    searcher.Search(query, collector);
    ScoreDoc[] hits1 = collector.GetTopDocs().ScoreDocs;
    for (int i = 0; i < hits1.Length; i++)
    {
        int docId = hits1[i].Doc;
        float score = hits1[i].Score;
        Lucene.Net.Documents.Document doc = searcher.Doc(docId);
        string result = "FileName: " + doc.Get("FullFileName") + "\n" +
                        " Line No: " + doc.Get("LineNo") + "\n" +
                        " Contents: " + doc.Get("Contents");
    }
Yet my search returns 0 hits, whereas if I simply comment out the while loop and uncomment the commented-out line above it (so the whole file is indexed as a single Contents field), I get results.
What could be the problem?
query variable? How big are your files, in MB and line counts? – Peska