I thought the scenario must be quite common, but I was unable to find any clues on how to progress.
I've got an Elasticsearch index that contains a single type, Order. In turn, Order contains Customer information, such as firstName, lastName, middleName (and their concatenation, fullName), e.g.
"order": {
// other stuff
"customer": {
"firstName": ...,
"lastName": ...,
"middleName": ...,
"fullName": "FirstName MiddleName LastName"
}
}
The aim is to provide order search functionality, including search by customers' names. The input to Elasticsearch would always be a single query string containing whatever the user typed into a search box. The problem is that there is some dirty data (e.g. missing firstName, swapped first and last names, etc.) and I can't rely on users always entering names in a certain order.
I've tried achieving that with a query_string query like this:
"query_string": {
"query": "[User Input]*", // note asterisk here
"fields" : ["customer.firstName", "customer.lastName", "customer.middleName"],
"analyzer": "whitespace",
"use_dis_max": true,
"tie_breaker": 0.7,
"analyze_wildcard": true
}
It does a decent job of finding the results in some cases, but it's definitely not robust against dirty data or word order: e.g. it finds John Doe's order when searching for "John Do" (not a typo), but fails to do so when searching for "Doe John".
The desired query behavior would be something like match_phrase_prefix on multiple fields, with the whitespace analyzer processing the query and prefixing applied to each term coming out of the analyzer. As an example, John Doe would be turned into something like ["John*", "Doe*"], with each term applied to the firstName, lastName, and middleName fields, or to fullName, allowing individual terms to come in any order.
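To make the desired behavior concrete, here is a rough client-side sketch in Python of the kind of query I have in mind. It splits the input on whitespace and requires every term to prefix-match at least one of the name fields, in any order. The helper name build_name_query is made up for illustration, and combining bool/must with multi_match of type phrase_prefix is only my guess at how this could be expressed; I have not verified it against my index, and with the fields currently not analyzed the matching would presumably still be case-sensitive.

```python
from typing import Any, Dict, List

# Field names taken from the question; adjust to the actual mapping.
NAME_FIELDS = ["customer.firstName", "customer.lastName", "customer.middleName"]


def build_name_query(user_input: str, fields: List[str] = NAME_FIELDS) -> Dict[str, Any]:
    """Hypothetical helper: every whitespace-separated term must
    prefix-match at least one of the given fields, in any order."""
    terms = user_input.split()  # plain whitespace split, mimicking the whitespace analyzer
    if not terms:
        # Empty search box: fall back to matching everything (an assumption).
        return {"match_all": {}}
    per_term = [
        {
            "multi_match": {
                "query": term,
                "type": "phrase_prefix",  # treat the term as a prefix
                "fields": list(fields),
            }
        }
        for term in terms
    ]
    # "must" requires all per-term clauses to match, regardless of term order.
    return {"bool": {"must": per_term}}
```

With this, build_name_query("Doe John") and build_name_query("John Doe") would produce the same two clauses in a different order, which is the order-independence I'm after.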
I'm really new to Elasticsearch, so I might be missing something really simple, or I might just not be confident enough to write really complex queries.
Edit: index mappings: http://pastebin.com/fuLLgHjB. Target fields are not yet analyzed because: (1) I'm not the one who did the initial setup and (2) I'm really not sure which field analyzers I should set up, so that's part of the question.