There are multiple problems to consider here.
- Do not use query_string, unless you know exactly what you are doing. Pay special attention if the input is coming from the user. Prefer to use simple_query_string instead.
- I doubt that you want the name to be of type keyword. This type means that the string is not analyzed (not lowercased, not tokenized, etc.), so a search with anything other than the exact indexed value won't match. Take Doug Small as an example. You would think that since you search with the exact same input, at least this document would be returned, but that's not the case. The reason is that the query_string or simple_query_string input is parsed (and as a consequence tokenized). Unless you mark your input as a single term, it won't match. To do that you need to wrap the term in double quotes ("\"Doug Small\""). But if you do this, you will lose all other matches.
- I believe what you need is for name and type to be of type text. This means that the stored string will be analyzed (tokenized, lowercased, etc.) by the standard analyzer, which is the default if you don't specify another.
- You have the operator specified as AND for query_string. This means that all of the query terms must match on either name or type. But you state that you want all of those documents returned by your query, and only one document contains both Doug and Small. If that is what you need, the operator must change to OR (which is the default).
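To make the points above concrete, here is a minimal Python sketch of the matching logic. It is a toy model, not Elasticsearch: `analyze` and `name_matches` are hypothetical helpers, the analyzer is only a rough stand-in (lowercase, split on non-alphanumerics), and only the name field is modelled.

```python
import re

def analyze(text):
    """Toy analyzer: lowercase, then split on anything that is not a letter or digit."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]

def name_matches(stored, query, field_type, operator="OR"):
    """Return True if the parsed query matches the stored name under this toy model."""
    if field_type == "keyword":
        indexed = {stored}       # the whole string is kept as one exact term
        terms = query.split()    # the query input is still split into terms
    else:  # "text"
        indexed = set(analyze(stored))
        terms = analyze(query)
    hits = [term in indexed for term in terms]
    return all(hits) if operator == "AND" else any(hits)

names = ["Doug", "Doug Small", "Smal", "Peter"]

# keyword: "Doug Small" is stored as one term, so neither "Doug" nor "Small" hits it
keyword_or = [n for n in names if name_matches(n, "Doug Small", "keyword")]
# text + OR: any analyzed term may match, and case no longer matters
text_or = [n for n in names if name_matches(n, "doug small", "text")]
# text + AND: every term must be present
text_and = [n for n in names if name_matches(n, "doug small", "text", "AND")]

print(keyword_or)  # ['Doug']
print(text_or)     # ['Doug', 'Doug Small']
print(text_and)    # ['Doug Small']
```

In this model the keyword field fails to return its own document for the unquoted input, while the text field with OR returns every name that shares a term with the query.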
A complete example
PUT test
{
  "mappings": {
    "properties": {
      "uid": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      },
      "type": {
        "type": "text"
      }
    }
  }
}
POST test/_bulk
{ "index" : { "_id" : "1" } }
{ "name": "Doug", "type": "Large"}
{ "index" : { "_id" : "2" } }
{ "name": "Doug Small", "type":"Large"}
{ "index" : { "_id" : "3" } }
{ "name": "Smal", "type": "Medium"}
{ "index" : { "_id" : "4" } }
{ "name": "Peter", "type": "Small"}
GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "simple_query_string": {
            "fields": [
              "name",
              "type"
            ],
            "query": "*Doug Small*",
            "default_operator": "OR"
          }
        }
      ]
    }
  }
}
The above query now returns all three documents that have Doug or Small or both. Moreover, it is case insensitive (since the fields are now analyzed), so *doug small* will yield the same 3 results.
Since the fields are now analyzed, you don't need the wildcard symbol, because it only applies to the first token and the last. Meaning:
*Doug Small*: match anything that has <ANYTHING>Doug OR Small<ANYTHING>
*Doug Smith Small*: match anything that has <ANYTHING>Doug OR Smith OR Small<ANYTHING>
(OR is the default operator; if you keep AND, it changes accordingly.)
So let's remove the wildcard as well:
GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "simple_query_string": {
            "fields": [
              "name",
              "type"
            ],
            "query": "Doug Small",
            "default_operator": "OR"
          }
        }
      ]
    }
  }
}
This yields the exact same 3 results. You are still missing Smal. To include it as well, you need to add fuzzy matching:
GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "simple_query_string": {
            "fields": [
              "name",
              "type"
            ],
            "query": "Doug Small~",
            "default_operator": "OR"
          }
        }
      ]
    }
  }
}
Here Doug Small~ means: return everything that has Doug OR Small, where Small does not have to be an exact match.
You can have fuzzy matching for all your terms:
GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "simple_query_string": {
            "fields": [
              "name",
              "type"
            ],
            "query": "Dg~ Small~",
            "default_operator": "OR"
          }
        }
      ]
    }
  }
}
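If the search terms come from user input, the request body with a ~ on every term can be built programmatically. The following is only a sketch: `fuzzy_query_body` is a hypothetical helper, it splits on whitespace, and it does not escape other operators that simple_query_string may interpret. The returned dict mirrors the request above and would be passed to whatever client you use.

```python
def fuzzy_query_body(user_input, fields=("name", "type"), operator="OR"):
    """Build a search body in which every whitespace-separated term is fuzzy."""
    # Strip any trailing "~" first so that a term never ends up with "~~".
    terms = [term.rstrip("~") + "~" for term in user_input.split()]
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "simple_query_string": {
                            "fields": list(fields),
                            "query": " ".join(terms),
                            "default_operator": operator,
                        }
                    }
                ]
            }
        }
    }

body = fuzzy_query_body("Dg Small")
print(body["query"]["bool"]["must"][0]["simple_query_string"]["query"])  # Dg~ Small~
```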
The reason Dg matches Doug is the fuzziness level (turning Dg into Doug takes two insertions): https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#fuzziness
The documentation describes fuzziness as "the maximum allowed Levenshtein Edit Distance (or number of edits)".
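To see what "number of edits" means for the terms used here, the edit distance can be computed by hand. This is a minimal sketch of the plain Levenshtein distance (insertions, deletions, substitutions). `levenshtein` is a hypothetical helper written for illustration, and Elasticsearch's own fuzzy matching may count edits slightly differently, for example by treating a transposition as a single edit.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep when the characters are equal
            ))
        prev = curr
    return prev[-1]

print(levenshtein("dg", "doug"))     # 2
print(levenshtein("small", "smal"))  # 1
```

Dg is two edits away from Doug and Small is one edit away from Smal, which is why both fuzzy terms find their documents.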