7
votes

I have implemented a RAMDirectory with StandardAnalyzer and am storing places data in a Lucene cache. I add data to Lucene like this:

final Document document = new Document();

final IndexableField id = new StringField("placeId", place.getPlaceId(), Field.Store.YES);
final IndexableField name = new TextField("name", place.getName().toLowerCase(), Field.Store.YES);
final IndexableField location = new LatLonPoint("location", place.getLatitude(), place.getLongitude());
final IndexableField city = new StringField("city", place.getCity(), Field.Store.YES);

document.add(id);
document.add(name);
document.add(location);
document.add(city);

I've implemented two ways to search the data. One finds nearby places within a defined radius, which works well; the other searches places by name. We also have to implement an autocomplete feature for search by name.

I've implemented search by name as follows:

QueryParser parser = new QueryParser("name", analyzer);
return parser.createPhraseQuery("name", searchStr, 2);

Now, let's say I have a place named "Tom clinic and pharmacy".

If I search using any of the following phrases, I get the result back:

  1. Tom
  2. Tom clinic
  3. Tom pharmacy

Which is great, but if a user types "Tom clini" or "Tom pharma", Lucene gives me no results back.

I have tried adding a "*" at the end of searchStr, and I have tried passing the phrase to a WildcardQuery (which works fine on a single word, but fails on multiple words).

I would also like to add a bit of fuzziness so that typos can be handled. I'm new to Lucene and not sure what to do from here, so any help is appreciated!

P.S. It's Lucene 7.3.

2
What analyzer are you using? – root
StandardAnalyzer – Ibraheem Faiq
Is your use case to do prefix searches? Or, e.g., in your case, if you search just for "pharmacy", do you still want to match the document that has "Tom clinic and pharmacy"? – root

2 Answers

0
votes

The best thing to do in these cases is to consult good resources, such as books on Lucene. In particular, you are probably interested in one or both of the following:

Fuzzy query

Lucene's fuzzy search implementation is based on Levenshtein distance. It compares two strings and counts the single-character changes needed to transform one string into the other. The resulting number indicates how close the two strings are. In a fuzzy search, a threshold number of edits determines whether the two strings match. To trigger a fuzzy match in QueryParser, use the tilde (~) character. QueryParser has a couple of settings for tuning this type of query. Here is a code snippet:

queryParser.setFuzzyMinSim(2f);
queryParser.setFuzzyPrefixLength(3);
Query query = queryParser.parse("hump~");

This example returns the first, second, and fourth sentences, because the fuzzy query matches hump to humpty: the two words differ by two characters. The minimum similarity is set to 2 in this example, and the parser treats a value of 1 or more as the maximum number of edits.
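The edit-distance idea described above can be sketched in plain Java. This is only an illustration of the metric, not Lucene's implementation (FuzzyQuery works on the indexed terms rather than comparing strings pairwise); the class and method names `EditDistance`, `levenshtein`, and `fuzzyMatch` are made up for this example.

```java
final class EditDistance {

    /** Number of single-character insertions, deletions, or substitutions to turn a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int substitution = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True when the two strings are within maxEdits of each other. */
    static boolean fuzzyMatch(String a, String b, int maxEdits) {
        return levenshtein(a, b) <= maxEdits;
    }
}
```

For instance, `EditDistance.levenshtein("hump", "humpty")` is 2, which is why a two-edit threshold lets hump match humpty, while "pharma" and "pharmacy" are also two edits apart.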

PhraseQuery and MultiPhraseQuery

A PhraseQuery matches a particular sequence of terms, while a MultiPhraseQuery lets you match multiple terms in the same position. For example, MultiPhraseQuery supports a phrase such as humpty (dumpty OR together), in which it matches humpty in position 0 and dumpty or together in position 1.

How to do it...

Here is a code snippet to demonstrate both Query types:

// In Lucene 7.x both query types are immutable and are created through
// builders; the mutable query.add(...) style no longer compiles.
PhraseQuery query = new PhraseQuery.Builder()
    .add(new Term("content", "humpty"))
    .add(new Term("content", "together"))
    .build();

MultiPhraseQuery query2 = new MultiPhraseQuery.Builder()
    // position 0: humpty
    .add(new Term("content", "humpty"))
    // position 1: dumpty OR together
    .add(new Term[] {
        new Term("content", "dumpty"),
        new Term("content", "together")
    })
    .build();

How it works…

The first Query, PhraseQuery, searches for the phrase humpty together. The second Query, MultiPhraseQuery, searches for the phrase humpty (dumpty OR together). The first Query would return sentence four from our setup, while the second Query would return sentences one, two, and four. Note that in MultiPhraseQuery, multiple terms in the same position are added as an array.

However, not many applications deal with Lucene directly; it's more common to use either Solr or Elasticsearch. Both use Lucene under the hood and wrap it in a friendlier interface, so they are probably worth a look.

0
votes

Use Fuzzy Queries
You can use fuzzy queries on the fields you want to search. Note that the field should be a TextField, because TextField values are analyzed (StringField values are not) and are meant for full-text search.

Read more here: FuzzyQuery


Use SpanNear Queries
A SpanNearQuery matches spans that are near one another. You can specify the slop (the maximum number of intervening unmatched positions) and whether the matches must be in order.

Read more here: SpanNearQuery
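As a rough sketch of how the two suggestions above might be combined for the autocomplete case in the question, the following builds a SpanNearQuery in which every completed word tolerates one typo and the last word is treated as a prefix. It assumes Lucene 7.x on the classpath and a "name" field indexed as in the question; the class `NameQueries`, the method `prefixPhrase`, and the choice of one edit per word are illustrative assumptions, not something taken from the original posts.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;

final class NameQueries {

    /**
     * Builds a query for partially typed input such as "Tom clini".
     * Terms are created by hand, so the input is lower-cased here to
     * line up with what StandardAnalyzer put into the index.
     */
    static Query prefixPhrase(String field, String input, int slop) {
        String[] words = input.trim().toLowerCase().split("\\s+");
        int last = words.length - 1;

        // A single word is just a prefix search, e.g. "to" -> name:to*
        if (words.length == 1) {
            return new PrefixQuery(new Term(field, words[0]));
        }

        SpanQuery[] clauses = new SpanQuery[words.length];
        for (int i = 0; i < last; i++) {
            // Completed words: allow one edit so small typos still match.
            clauses[i] = new SpanMultiTermQueryWrapper<>(
                    new FuzzyQuery(new Term(field, words[i]), 1));
        }
        // The word still being typed: match any term starting with it.
        clauses[last] = new SpanMultiTermQueryWrapper<>(
                new PrefixQuery(new Term(field, words[last])));

        // inOrder = false lets the words appear in either order within the slop.
        return new SpanNearQuery(clauses, slop, false);
    }
}
```

A call such as `NameQueries.prefixPhrase("name", "Tom clini", 2)` could then be passed to `IndexSearcher.search`. Very short prefixes can expand to a large number of terms, so it may be worth requiring a minimum input length before issuing the query.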