
I am using Lucene to search a contacts directory that holds general contact information for a database of people, such as first name, last name, phone number, address, etc. This question pertains specifically to searching by first and last name. Here is how I am indexing the names:

document.add(new Field("firstName", contact.getFirstName(), Field.Store.NO, Field.Index.NOT_ANALYZED));
document.add(new Field("lastName", contact.getLastName(), Field.Store.NO, Field.Index.NOT_ANALYZED));

I am searching the index like this:

IndexReader indexReader = IndexReader.open(FSDirectory.open(directory));
IndexSearcher indexSearcher = new IndexSearcher(indexReader);
int hitsPerPage = indexSearcher.maxDoc();
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
String[] fields = {"id", "firstName", "lastName", "phoneNumber", "email", "address", "website"};

BooleanQuery booleanQuery = new BooleanQuery();
String[] terms = queryString.split(" ");

for(String term : terms) {
    for(String field : fields) {
        booleanQuery.add(new FuzzyQuery(new Term(field, term)), BooleanClause.Occur.SHOULD);
    }
}

TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
indexSearcher.search(booleanQuery, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;

The reason I am using a BooleanQuery as opposed to a MultiFieldQueryParser is that it allows me to get results when a field does not match exactly. Basically, I split the query string on whitespace and then add a fuzzy term for each of those keywords on each field in the index. I'm new to Lucene, so I really have no idea if this is the optimal way to do this, but so far it's been working OK for me.
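To make the nested loop concrete, here is a minimal plain-Java sketch (no Lucene dependency) that lists the clauses the loop would generate. The class name `ClauseExpansion` and the `field:term~` notation are illustrative assumptions only; the real code adds `FuzzyQuery` objects rather than strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: mirrors the term-by-field loop, but collects strings
// instead of Lucene FuzzyQuery clauses so it can run without Lucene.
class ClauseExpansion {

    // Returns one "field:term~" entry per (term, field) pair.
    static List<String> expand(String queryString, String[] fields) {
        List<String> clauses = new ArrayList<>();
        for (String term : queryString.trim().split("\\s+")) {
            for (String field : fields) {
                clauses.add(field + ":" + term + "~");
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        String[] fields = {"id", "firstName", "lastName", "phoneNumber",
                           "email", "address", "website"};
        for (String clause : expand("John Doe", fields)) {
            System.out.println(clause);
        }
    }
}
```

With the seven fields from the question, a two-word query such as John Doe expands to 14 optional (SHOULD) clauses, so a document only needs to match one of them to appear in the results.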

The only hiccup I'm having is that when searching by full name, the results are not returned in the right order.

The index has 2 records: John Doe and John Smith.

When I search for John Doe my results will look like: 1) John Smith 2) John Doe

If I type John Smith it will reverse and display John Doe first. Why is it not returning the exact match as the first result?
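For reference, a toy model of the matching suggests the exact match should come first. The sketch below is plain Java, not Lucene: the similarity formula (1 minus edit distance divided by the shorter length) and the 0.5 cutoff are assumptions loosely modelled on FuzzyQuery's defaults, and real Lucene scoring also involves term frequency, idf and norms. It only counts how many term/field pairs would match for each of the two records.

```java
// Hypothetical toy model: counts fuzzy matches per contact, ignoring real Lucene scoring.
class FuzzyMatchSketch {

    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity measure: 1 - distance / length of the shorter string.
    static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0 : 0.0;
        }
        return 1.0 - (double) editDistance(a, b) / shorter;
    }

    // Counts the (query term, name field) pairs whose similarity exceeds the assumed 0.5 cutoff.
    static int matchingClauses(String query, String firstName, String lastName) {
        int count = 0;
        for (String term : query.trim().split("\\s+")) {
            if (similarity(term, firstName) > 0.5) count++;
            if (similarity(term, lastName) > 0.5) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("John Doe:   " + matchingClauses("John Doe", "John", "Doe"));   // 2
        System.out.println("John Smith: " + matchingClauses("John Doe", "John", "Smith")); // 1
    }
}
```

Under these assumptions, John Doe matches two clauses for the query John Doe and John Smith matches only one, which is why the reversed order looks suspicious.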

2
That does seem a strange result, based on what you've provided. I'd be interested to see more on how you are building the index. Is there a possibility that the wrong thing is getting indexed or stored somewhere there? - femtoRgon

2 Answers

0
votes

If you are going to search for all terms across all fields, why not index the entire text as part of another field? And then you can issue a query like

// The inner quotes are escaped with \" so that "John Doe" is parsed as a phrase.
// The phrase match is boosted (^3); the second clause matches either word on its own.
String searchCriteria = "all:\"John Doe\"^3 OR all:(John Doe)";
IndexSearcher indexSearcher = new IndexSearcher(indexDirectory);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
QueryParser parser = new QueryParser(Version.LUCENE_35, "all", analyzer);
Query query = parser.parse(searchCriteria);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
indexSearcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
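In practice the search text comes from the user rather than being hard-coded. A minimal sketch of building the same query string from input could look like this. The class name `PhraseBoostQuery` and its parameters are hypothetical, and it assumes the input contains only plain words; text with Lucene special characters would need escaping first, for example with `QueryParser.escape`.

```java
// Hypothetical helper: builds a boosted phrase clause plus a looser any-word clause.
class PhraseBoostQuery {

    // Produces: field:"<input>"^boost OR field:(<input>)
    // Assumes userInput holds plain words with no query-syntax characters.
    static String build(String field, String userInput, int boost) {
        String text = userInput.trim();
        return field + ":\"" + text + "\"^" + boost + " OR " + field + ":(" + text + ")";
    }

    public static void main(String[] args) {
        System.out.println(build("all", "John Doe", 3));
    }
}
```

The resulting string can then be handed to `parser.parse(...)` in place of the hard-coded `searchCriteria`.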

However, if you want to continue with your current design, you can try IndexSearcher.explain (http://lucene.apache.org/java/3_5_0/api/all/org/apache/lucene/search/IndexSearcher.html#explain(org.apache.lucene.search.Query, int)) to find out why one document is being scored higher than another.

0
votes

Using boolean queries and a for loop turned out to be a suitable way of searching the index in my situation. The results were being reversed by the way I was parsing and displaying them on the client side, so it was a completely unrelated issue.
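One way to guard against this kind of client-side reordering is to send the score along with each result and sort explicitly before display. The sketch below is only an illustration of that idea, not the code from my application; the `Hit` class, its fields and the scores are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical client-side model: each hit carries the score assigned by the search.
class ClientOrdering {

    static final class Hit {
        final String name;
        final float score;

        Hit(String name, float score) {
            this.name = name;
            this.score = score;
        }
    }

    // Returns a copy sorted by score, highest first, leaving the input list untouched.
    static List<Hit> byScoreDescending(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up scores, deliberately supplied in the wrong order.
        List<Hit> received = Arrays.asList(new Hit("John Smith", 0.4f), new Hit("John Doe", 1.2f));
        for (Hit hit : byScoreDescending(received)) {
            System.out.println(hit.name + " " + hit.score);
        }
    }
}
```

Sorting on an explicit score means the display order no longer depends on how the results happen to be parsed or iterated on the client.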