2
votes

I want to search a Lucene.NET index by a stored URL field. My code is given below:

Field urlField = new Field("Url", url.ToLower(), Field.Store.YES, Field.Index.TOKENIZED);
document.Add(urlField);
indexWriter.AddDocument(document);

I am using the above code for writing into the index.

And the code below to search for the URL in the index:

Lucene.Net.Store.Directory _directory = FSDirectory.GetDirectory(Host, false);
IndexReader reader = IndexReader.Open(_directory);
KeywordAnalyzer _analyzer = new KeywordAnalyzer();
IndexSearcher indexSearcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser("Url", _analyzer);
Query query = parser.Parse("\"" + downloadDoc.Uri.ToString() + "\"");
TopDocs hits = indexSearcher.Search(query, null, 10);
if (hits.totalHits > 0)
{
    //statements....
}

But whenever I search for a URL, for example http://www.xyz.com/, I get no hits.

I have found a workaround, but it only works when the index contains a single document. With more documents, the code below does not return the correct result. Any ideas? Please help.

While writing the index, use KeywordAnalyzer()

KeywordAnalyzer _analyzer = new KeywordAnalyzer();    
indexWriter = new IndexWriter(_directory, _analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

Then while searching also, use KeywordAnalyzer()

IndexReader reader = IndexReader.Open(_directory);
KeywordAnalyzer _analyzer = new KeywordAnalyzer();
IndexSearcher indexSearcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser("Url", _analyzer);
Query query = parser.Parse("\"" + url.ToString() + "\"");                    
TopDocs hits = indexSearcher.Search(query, null, 1);

This is because the KeywordAnalyzer "Tokenizes" the entire stream as a single token.

Please help. It's urgent.

Cheers Sunil...

3
Lucene matches indexed values, not stored values. How are you indexing the field? - sisve
I want to check whether a URL is already indexed. If it is, I replace its stored "Content". The URL could be in any format: http://www.xyz.com/, http://www.xyz.com/page1/, etc. _analyzer is a StandardAnalyzer. - Sunil Raj
Edited my question. Please check now. - Sunil Raj
Try using the StandardAnalyzer first. Get rid of the backslashes in the query... - SharpBarb
I have already tried that. Thanks. - Sunil Raj

3 Answers

1
votes

This worked for me:

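 IndexReader reader = IndexReader.Open(_directory);
 IndexSearcher indexSearcher = new IndexSearcher(reader);
 // Exact match on the whole lowercased URL; a TermQuery is not run through an analyzer.
 TermQuery tq = new TermQuery(new Term("Url", downloadDoc.Uri.ToString().ToLower()));
 BooleanQuery bq = new BooleanQuery();
 bq.Add(tq, BooleanClause.Occur.SHOULD);
 TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
 // Run the query; the collector keeps the 10 best-scoring hits.
 indexSearcher.Search(bq, collector);
 ScoreDoc[] hits = collector.TopDocs().scoreDocs;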

Use StandardAnalyzer while writing into the index.

This answer helped me: Lucene search by URL
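
For a TermQuery to find the URL, the term stored in the index has to be the whole URL, character for character. Below is a minimal sketch of one way to guarantee that. It assumes the Lucene.NET 2.9-era API used in the question and indexes the field with Field.Index.NOT_ANALYZED, which is my assumption and not something stated above. The class name, the RAMDirectory, and the sample "Content" values are hypothetical and only there to make the sketch self-contained. It also shows IndexWriter.UpdateDocument, which could cover the "replace the content if the URL is already indexed" case from the comments.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical demo class; assumes a Lucene.NET 2.9.x assembly is referenced.
class UrlLookupSketch
{
    // Builds a document whose "Url" field is indexed as one unanalyzed term.
    static Document BuildDocument(string url, string content)
    {
        Document document = new Document();
        document.Add(new Field("Url", url, Field.Store.YES, Field.Index.NOT_ANALYZED));
        document.Add(new Field("Content", content, Field.Store.YES, Field.Index.NO));
        return document;
    }

    static void Main()
    {
        // Lowercase once and use the same string for indexing and searching.
        string url = "http://www.xyz.com/".ToLower();

        // In-memory index so the sketch does not touch the file system.
        RAMDirectory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory, new KeywordAnalyzer(), true,
                                             IndexWriter.MaxFieldLength.UNLIMITED);

        writer.AddDocument(BuildDocument(url, "first version"));
        writer.AddDocument(BuildDocument("http://www.xyz.com/page1/", "another page"));

        // Deletes any document whose "Url" term equals url, then adds the new one.
        writer.UpdateDocument(new Term("Url", url), BuildDocument(url, "second version"));
        writer.Close();

        IndexReader reader = IndexReader.Open(directory, true);
        IndexSearcher searcher = new IndexSearcher(reader);

        // No QueryParser involved, so characters such as ':' and '/' need no escaping.
        TermQuery query = new TermQuery(new Term("Url", url));
        TopDocs hits = searcher.Search(query, null, 10);

        Console.WriteLine("Hits: " + hits.totalHits);
        if (hits.totalHits > 0)
        {
            Document found = searcher.Doc(hits.scoreDocs[0].doc);
            Console.WriteLine("Content: " + found.Get("Content"));
        }

        searcher.Close();
        reader.Close();
    }
}
```

Skipping the QueryParser avoids having to quote or escape the URL, and the same lowercased string is used on both the write and the search side.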

0
votes

Try putting quotes around the query, e.g. like this:

"http://www.google.com/"

0
votes

Using the whitespace or keyword analyzer should work.

Would anyone actually search for "http://www.Google.com"? Seems more likely that a user would search for "Google" instead.

You can always return the entire URL if there is a partial match. I think the standard analyzer would be more appropriate for searching and retrieving a URL.