Background

I am working on an API that allows the user to pass in a list of details about a member (name, email addresses, ...). I want to use this information to match against account records in my Elasticsearch database and return a list of potential matches.

I thought this would be as simple as doing a bool query on the fields I want; however, I seem to be getting no hits.

I'm relatively new to Elasticsearch. My current _search request looks like this:

Example Query

POST /member/account/_search

{
    "query" : {
        "filtered" : {
            "filter" : {
                "bool" : {
                    "should" : [{
                        "term" : {
                             "email": "[email protected]"
                        }
                    },{
                        "term" : {
                             "email": "[email protected]"
                        }
                    },{
                        "term" : {
                             "email": "[email protected]"
                        }
                    }]
                }
            }
        }
    }
}

Question

How should I update this query to return records that match any of the email addresses?

Am I able to prioritise records that match email and another field? Example "family_name".

Will this be a problem if I need to do this against a few hundred email addresses?

1 Answer

Well, you need to make the change on the index side rather than on the query side.

By default, your email ID is broken into separate tokens while indexing (the exact split depends on the analyzer): [email protected] => [ jon , smith , gmail , com ]
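One way to check how a value is actually tokenized is the _analyze API. This is only a minimal sketch: it assumes Elasticsearch 2.0 or later (where _analyze accepts a JSON body) and uses a made-up address. The tokens in the response are what a term query would have to match exactly.

POST /_analyze

```json
{
  "analyzer": "standard",
  "text": "jon.smith@example.com"
}
```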

When you search with a term query, no analyzer is applied to the search text, so Elasticsearch looks for an exact match of the whole string [email protected] among the indexed tokens. As you can see, no such token exists, so nothing matches. A match query does analyze the search text, but then you end up with nearly every document as a hit, because any document that shares a token such as gmail or com qualifies.

Hence you need to change the mapping so that the email field is indexed as a single token rather than being tokenized. Using not_analyzed is the best solution here. When you define the email field as not_analyzed, the following happens while indexing: [email protected] => [ [email protected] ]

After changing the mapping and reindexing all your documents, you can run the above query as it is.
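The mapping change might look like the sketch below. It assumes the 1.x/2.x mapping syntax that goes with the filtered query in the question, and a new index called member_v2 (a hypothetical name), since the mapping of an existing field generally cannot be changed in place.

PUT /member_v2

```json
{
  "mappings": {
    "account": {
      "properties": {
        "email": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
```

Keep in mind that a not_analyzed field is matched exactly, including case, so it is worth lowercasing email addresses both before indexing and before searching.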

I would suggest using a terms query instead, as follows:

{
  "query": {
    "terms": {
      "email": [
        "[email protected]",
        "[email protected]",
        "[email protected]"
      ]
    }
  }
}

To answer the second part of your question: you are looking for boosting, and I would recommend reading up on the function score query.
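As a rough sketch of that idea, the request below keeps the terms query on email and multiplies the score of any hit whose family_name also matches. The addresses, the name and the weight of 2 are made-up values, and it assumes a version where function_score supports weight (1.4 or later, as far as I know).

POST /member/account/_search

```json
{
  "query": {
    "function_score": {
      "query": {
        "terms": {
          "email": ["first@example.com", "second@example.com"]
        }
      },
      "functions": [
        {
          "filter": { "term": { "family_name": "smith" } },
          "weight": 2
        }
      ]
    }
  }
}
```

The name is written in lowercase on the assumption that family_name is still analyzed with the default analyzer, which lowercases tokens, while the term filter itself is not analyzed.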