I am brand new to ElasticSearch and am currently exploring its features. One that interests me is the Fuzzy Query, which I am testing and having trouble using. It is probably a basic question, so I hope someone who has already used this feature will quickly find the answer. :)
BTW, I have the feeling that this might not be specific to ElasticSearch but could come directly from Lucene.
Let's start with a new index named "firstindex", in which I store an object of type "node" whose "label" has the value "american football". This is the request I use:
bash-3.2$ curl -XPOST 'http://localhost:9200/firstindex/node/?pretty=true' -d '{
"node" : {
"label" : "american football"
}
}
'
This is the result I get.
{
"ok" : true,
"_index" : "firstindex",
"_type" : "node",
"_id" : "6TXNrLSESYepXPpFWjpl1A",
"_version" : 1
}
So far so good. Now I want to find this entry using a fuzzy query. This is the one I send:
bash-3.2$ curl -XGET 'http://localhost:9200/firstindex/node/_search?pretty=true' -d '{
"query" : {
"fuzzy" : {
"label" : {
"value" : "american football",
"boost" : 1.0,
"min_similarity" : 0.0,
"prefix_length" : 0
}
}
}
}
'
And this is the result I get:
{
"took" : 15,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
As you can see, there is no hit. But when I shorten my query's value a bit, from "american football" to "american footb", like this:
bash-3.2$ curl -XGET 'http://localhost:9200/firstindex/node/_search?pretty=true' -d ' {
"query" : {
"fuzzy" : {
"label" : {
"value" : "american footb",
"boost" : 1.0,
"min_similarity" : 0.0,
"prefix_length" : 0
}
}
}
}
'
Then I get a correct hit on my entry. The result is:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.19178301,
"hits" : [ {
"_index" : "firstindex",
"_type" : "node",
"_id" : "6TXNrLSESYepXPpFWjpl1A",
"_score" : 0.19178301, "_source" : {
"node" : {
"label" : "american football"
}
}
} ]
}
}
So, I have several questions related to this test:
Why didn't I get any result when performing a query with a value exactly equal to my only entry, "american football"?
Is it related to the fact that I have a multi-word value?
Is there a way to get the "similarity" score in my query result, so I can better understand how to find the right threshold for my fuzzy queries?
There is a page dedicated to the Fuzzy Query on the ElasticSearch web site, but I am not sure it lists all the parameters I can use for the fuzzy query. Where could I find such an exhaustive list?
Same question for the other queries actually.
Is there a difference between a Fuzzy Query and a Query String Query using Lucene syntax to get fuzzy matching?
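For the multi-word question, here is a minimal sketch of one way to reason about it, not a description of what ElasticSearch actually does. It assumes (1) that the default analyzer indexes "american football" as the two terms "american" and "football", (2) that the fuzzy query compares the whole, unanalyzed value against each indexed term, and (3) that similarity is computed as 1 - distance / min(length of value, length of term), which is my understanding of older Lucene versions with a prefix_length of 0. The function names are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(value: str, term: str) -> float:
    # ASSUMPTION: this formula is my reading of older Lucene fuzzy matching
    # with prefix_length = 0; it is not taken from the ElasticSearch docs.
    return 1.0 - levenshtein(value, term) / min(len(value), len(term))


# ASSUMPTION: the analyzer split the stored label into these two terms.
indexed_terms = ["american", "football"]

for value in ("american football", "american footb"):
    for term in indexed_terms:
        print(f"{value!r} vs {term!r}: "
              f"distance={levenshtein(value, term)}, "
              f"similarity={similarity(value, term):.3f}")
```

If those assumptions hold, the only pair with a similarity above 0 is "american footb" against "american" (distance 6, similarity 0.25), which would be consistent with the hit and the miss shown above.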