I'm not getting the expected result when using a phrase in a query_string query for Elasticsearch.

Let's say I have a title, 'john wayne goes to manhattan'. I've indexed the title field with the 'standard' analyzer, and the following is my query. With or without the fuzzy indicator (~), it won't find anything unless 'john wayne' is spelled correctly. There are no results for 'john wane' or similar.

"query": {
  "query_string": {
    "fields": ["title^2"],
    "query": "\"john wayne\"~1",
    "default_operator": "AND", 
    "phrase_slop": 0, 
    "minimum_should_match": "100%"
  }
}

I've tried altering the number after the tilde to increase the fuzziness, but still no matches.

Any ideas?

1 Answer

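Doing a fuzzy search on a phrase is actually a "proximity" search. Instead of measuring the Levenshtein distance between letters, it measures the proximity between the terms in the query. In other words, the tilde after a quoted phrase controls how far apart the words may be, not how badly they may be misspelled.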

Your query should return results if it were:

"query" : "john wane~1" 

See here for more info on the difference: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_fuzziness
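To make the term-level side of that distinction concrete, here is a minimal Python sketch of edit distance. It is only an illustration and not Elasticsearch's implementation: the function name `levenshtein` is my own, and it uses plain Levenshtein distance, whereas Elasticsearch's fuzziness also treats a transposition of two adjacent characters as a single edit by default.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance (insert, delete, substitute), computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# 'wane' is one insertion away from 'wayne', so a term-level wane~1 can reach it.
print(levenshtein("wane", "wayne"))  # 1
print(levenshtein("john", "john"))   # 0
```

Because the misspelled term is within one edit of the indexed term, fuzziness on the single term is what helps here; loosening the phrase only changes how far apart the words may sit.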

Edit:

Here is a concrete example that recreates the problem.

Create some docs:

curl -XPUT "http://localhost:9200/test/test/1" -d'
{
    "message" : "My best friend is John Wayne, who is yours?"
}'

curl -XPUT "http://localhost:9200/test/test/2" -d'
{
    "message" : "My best friend is John Marion Wayne, who is yours?"
}'

curl -XPUT "http://localhost:9200/test/test/3" -d'
{
    "message" : "My best friend is John Marion Mitchell Wayne, who is yours?"
}'

A naive sample query, not using a phrase:

curl -XGET "http://localhost:9200/_search" -d'
{
    "query" : {
        "query_string": {
           "query": "john AND wane~1"
        }
    }
}'

Here is how to do the phrase query with span queries. Notice that the terms are lowercased, because span_term queries are not analyzed. You can also adjust the span slop to control how close to each other the terms must be.

curl -XGET "http://localhost:9200/_search" -d'
{
    "query" : {
        "span_near" : {
            "clauses" : [
                { "span_term" : { "message" : "john" } },
                { "span_term" : { "message" : "wayne" } }
            ],
            "slop" : 0,
            "in_order" : true
        }
    }
}'
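As a rough way to picture what the slop does to the three sample docs, the Python sketch below counts how many tokens sit between 'john' and a later 'wayne'. It is a simplified model under my own assumptions, not Elasticsearch code: `tokenize`, `min_gap`, and `matching_docs` are hypothetical helpers, the tokenizer only approximates the standard analyzer, and only the two-term, in-order case is modeled.

```python
import re

DOCS = {
    1: "My best friend is John Wayne, who is yours?",
    2: "My best friend is John Marion Wayne, who is yours?",
    3: "My best friend is John Marion Mitchell Wayne, who is yours?",
}


def tokenize(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def min_gap(tokens, first, second):
    """Smallest number of tokens between `first` and a later `second`, or None if absent."""
    best = None
    last_first = None
    for pos, tok in enumerate(tokens):
        if tok == second and last_first is not None:
            gap = pos - last_first - 1
            if best is None or gap < best:
                best = gap
        if tok == first:
            last_first = pos
    return best


def matching_docs(slop):
    """Ids of docs where 'john' is followed by 'wayne' with at most `slop` tokens between."""
    hits = []
    for doc_id, text in DOCS.items():
        gap = min_gap(tokenize(text), "john", "wayne")
        if gap is not None and gap <= slop:
            hits.append(doc_id)
    return hits


for slop in (0, 1, 2):
    print(slop, matching_docs(slop))
```

In this model a slop of 0 keeps only the doc where the two words are adjacent, and each extra unit of slop admits one more intervening word.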

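And now here is the real deal, exactly what you are looking for: each term is wrapped in a span_multi fuzzy clause, so a misspelling such as 'wane' still matches, while span_near keeps the terms in phrase order.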

curl -XGET "http://localhost:9200/_search" -d'
{
    "query" : {
        "span_near" : {
            "clauses" : [
                {
                    "span_multi" : {
                        "match" : {
                            "fuzzy" : {
                                "message" : {
                                    "value" : "john",
                                    "fuzziness" : "1"
                                }
                            }
                        }
                    }
                },
                {
                    "span_multi" : {
                        "match" : {
                            "fuzzy" : {
                                "message" : {
                                    "value" : "wane",
                                    "fuzziness" : "1"
                                }
                            }
                        }
                    }
                }
            ],
            "slop" : 0,
            "in_order" : true
        }
    }
}'
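Writing one span_multi clause per word by hand gets tedious for longer phrases, so here is a small Python sketch that builds the same request body from a phrase. The helper name `fuzzy_phrase_query` is hypothetical, and lowercasing plus splitting on whitespace is only my assumption of a stand-in for the analyzer, since the fuzzy clauses themselves are not analyzed.

```python
import json


def fuzzy_phrase_query(field, phrase, fuzziness="1", slop=0):
    """Build a span_near body with one fuzzy span_multi clause per word in `phrase`."""
    clauses = [
        {
            "span_multi": {
                "match": {
                    "fuzzy": {
                        field: {"value": term, "fuzziness": fuzziness}
                    }
                }
            }
        }
        # Lowercase and split on whitespace; a rough stand-in for analysis.
        for term in phrase.lower().split()
    ]
    return {
        "query": {
            "span_near": {
                "clauses": clauses,
                "slop": slop,
                "in_order": True,
            }
        }
    }


body = fuzzy_phrase_query("message", "John Wane")
print(json.dumps(body, indent=4))
```

The printed JSON can be passed as the -d payload of the same curl command used above.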