
Assuming I have stored a short unanalysed field in Lucene, is there a way to search for documents where this field contains a specific substring?

For example, the field value "AA-883 98/67" should be matched by the following substrings: "883", "98/67", "AA-883", "883 98", etc.

I need to combine this with other filters when querying Lucene. This is for Lucene.NET 2.9.




You could use a WildcardQuery. However, if the term starts with a wildcard character (* or ?), the query will be extremely slow when the field has many distinct terms, because Lucene then has to enumerate and test every term in that field.
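To see why a leading wildcard hurts, here is a simplified model in C#. It is not Lucene code: a sorted string array stands in for the field's term dictionary, and the class and method names are hypothetical. As I understand it, WildcardQuery in Lucene.NET 2.9 seeks the term enumeration to the text before the first wildcard character. With a leading `*` that prefix is empty, so every term in the field has to be checked.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration, not Lucene code: a sorted array of terms
// stands in for the term dictionary of a single field.
public static class TermDictionaryDemo
{
    // Pattern with a literal prefix (e.g. "AA-883*"): binary-search to the
    // first term >= prefix, then read forward only while terms still start
    // with the prefix. "scanned" counts terms read in the forward scan only
    // (binary-search steps are not counted).
    public static List<string> PrefixMatches(string[] sortedTerms, string prefix, out int scanned)
    {
        var result = new List<string>();
        int i = Array.BinarySearch(sortedTerms, prefix, StringComparer.Ordinal);
        if (i < 0) i = ~i; // insertion point when the prefix itself is not a term
        scanned = 0;
        for (; i < sortedTerms.Length; i++)
        {
            scanned++;
            if (!sortedTerms[i].StartsWith(prefix, StringComparison.Ordinal)) break;
            result.Add(sortedTerms[i]);
        }
        return result;
    }

    // Leading-wildcard pattern (e.g. "*883*"): there is no prefix to seek to,
    // so every term in the dictionary has to be tested.
    public static List<string> ContainsMatches(string[] sortedTerms, string substring, out int scanned)
    {
        var result = new List<string>();
        scanned = 0;
        foreach (var term in sortedTerms)
        {
            scanned++;
            if (term.IndexOf(substring, StringComparison.Ordinal) >= 0) result.Add(term);
        }
        return result;
    }
}
```

The array must be sorted with `StringComparer.Ordinal` for the binary search to be valid. Calling both methods on the same array shows the difference: the prefix lookup stops as soon as terms no longer share the prefix, while the "contains" lookup always reads the whole array, so its cost grows with the number of distinct terms.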

Here is a quick example that demonstrates how to write the WildcardQuery. It uses deprecated APIs and should be modified to use the non-deprecated overloads, but it should give you the idea.

To combine it with other queries, you can use the BooleanQuery class, which joins several queries into one.

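```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

class Program
{
    static void Main()
    {
        // In-memory index; deprecated IndexWriter/StandardAnalyzer overloads are used for brevity.
        RAMDirectory dir = new RAMDirectory();
        IndexWriter iw = new IndexWriter(dir, new StandardAnalyzer());

        // NOT_ANALYZED indexes the whole value as a single term, so the
        // wildcard pattern is matched against the full string "AA-883 98/67".
        Document doc = new Document();
        doc.Add(new Field("test", "AA-883 98/67", Field.Store.YES, Field.Index.NOT_ANALYZED));
        iw.AddDocument(doc);
        iw.Commit();

        // Near-real-time reader obtained from the writer.
        IndexSearcher searcher = new IndexSearcher(iw.GetReader());

        // Surrounding the text with '*' expresses "contains". WildcardQuery is
        // not analyzed, so the match is case-sensitive.
        WildcardQuery query = new WildcardQuery(new Term("test", "*883*"));
        Hits hits = searcher.Search(query);
        Console.WriteLine(hits.Length());
        // prints 1

        query = new WildcardQuery(new Term("test", "*98/67*"));
        hits = searcher.Search(query);
        Console.WriteLine(hits.Length());
        // prints 1

        query = new WildcardQuery(new Term("test", "*AA-883*"));
        hits = searcher.Search(query);
        Console.WriteLine(hits.Length());
        // prints 1

        query = new WildcardQuery(new Term("test", "*883 98*"));
        hits = searcher.Search(query);
        Console.WriteLine(hits.Length());
        // prints 1

        Console.ReadLine();
        iw.Close();
        dir.Close();
    }
}
```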
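Combining the substring match with another condition through BooleanQuery might look roughly like the sketch below, assuming the Lucene.NET 2.9 API where `BooleanClause.Occur.MUST` is a static field. The `category` field, its value, and the `CombinedQueryExample.Build` helper are hypothetical stand-ins for whatever other filter you need.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper: "category" is a made-up field standing in for any
// other condition you want to apply alongside the substring match.
public static class CombinedQueryExample
{
    public static BooleanQuery Build(string substring, string category)
    {
        BooleanQuery combined = new BooleanQuery();

        // "Contains" match on the NOT_ANALYZED "test" field. Note that any
        // '*' or '?' inside substring is treated as a wildcard too.
        combined.Add(new WildcardQuery(new Term("test", "*" + substring + "*")),
                     BooleanClause.Occur.MUST);

        // Additional exact-match condition; MUST on both clauses means a
        // document has to satisfy both to be returned.
        combined.Add(new TermQuery(new Term("category", category)),
                     BooleanClause.Occur.MUST);

        return combined;
    }
}
```

The resulting query can be passed to `searcher.Search(...)` in the same way as the single WildcardQuery. Using SHOULD or MUST_NOT instead of MUST gives "or" and "not" semantics respectively.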