3
votes

I'm looking to build an auto-complete textbox over a large quantity of city names. Search functionality is as follows: I want a "Starts with" search over a multi-word phrase. For example, if user has typed in "chicago he", only locations such as "Chicago Heights" need to be returned.
I'm trying to use Lucene for this. I'm having issues understanding how this needs to be implemented.

I've tried what I think is the approach that should work:

I've indexed locations with KeywordAnalyzer (I've tried both TOKENIZED and UN_TOKENIZED):

doc.Add(new Field("Name", data.ToLower(), Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO));

And search for them via the following (I've also tried a variety of other queries/analyzers/etc):

var luceneQuery = new BooleanQuery();
var wildcardQuery = new WildcardQuery(new Term("Name", "chicago hei*"));
luceneQuery.Add(wildcardQuery, BooleanClause.Occur.MUST);

I'm not getting any results. Would appreciate any advice.

2

2 Answers

3
votes

To do that you need to index your field with the Field.Index.NOT_ANALYZED setting, which is the same as the UN_TOKENIZED you use, so it should work. Heres a working sample I quickly made up to test. Im using the latest version available on Nuget

IndexWriter iw = new IndexWriter(@"C:\temp\sotests", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29), true);

Document doc = new Document();
Field loc = new Field("location", "", Field.Store.YES, Field.Index.NOT_ANALYZED);
doc.Add(loc);

loc.SetValue("chicago heights");
iw.AddDocument(doc);

loc.SetValue("new-york");
iw.AddDocument(doc);

loc.SetValue("chicago low");
iw.AddDocument(doc);

loc.SetValue("montreal");
iw.AddDocument(doc);

loc.SetValue("paris");
iw.AddDocument(doc);

iw.Commit();


IndexSearcher ins = new IndexSearcher(iw.GetReader());

WildcardQuery query = new WildcardQuery(new Term("location", "chicago he*"));

var hits = ins.Search(query);

for (int i = 0; i < hits.Length(); i++)
    Console.WriteLine(hits.Doc(i).GetField("location").StringValue());

Console.WriteLine("---");

query = new WildcardQuery(new Term("location", "chic*"));
hits = ins.Search(query);

for (int i = 0; i < hits.Length(); i++)
    Console.WriteLine(hits.Doc(i).GetField("location").StringValue());

iw.Close();
Console.ReadLine();
0
votes

The only way to guarantee a "starts with" search is to put a delimiter at the beginning of the indexed string, so "diamond ring" is indexed like "lucenedelimiter diamond ring lucenedelimiter". This prevents a search turning up "the famous Diamond Ridge Resort" from turning up in a search for "diamond ri*".