You're probably going to want to read up a little about how analysis works.
Also take a look at this description of phrase matching. The phrase doesn't have to match the field as a whole; it only has to occur somewhere in it, with the analyzed terms at adjacent positions and in the same order as in your query. Since there is a "hello" that comes immediately after the second "world" (the newlines between them are discarded during analysis), the document matches your query.
Also note that the standard analyzer is used here, both in indexing the document and in analyzing the query, since no other analyzers were specified. You can customize this behavior if you wish.
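To make that concrete, here is a minimal Python sketch of the idea. It is not Elasticsearch code: `analyze` is a hypothetical stand-in that only approximates the standard analyzer (lowercase, split on non-word characters), and `phrase_matches` is a hypothetical helper that models a phrase query with the default slop of 0.

```python
import re

def analyze(text):
    """Rough approximation of the standard analyzer (assumption):
    lowercase the text and split it on non-word characters."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def phrase_matches(doc_text, phrase):
    """Return True if the analyzed phrase terms occur at consecutive
    positions, in query order, anywhere in the analyzed document."""
    doc = analyze(doc_text)
    terms = analyze(phrase)
    n = len(terms)
    return n > 0 and any(doc[i:i + n] == terms for i in range(len(doc) - n + 1))

doc1 = "Hello World and \n\nbmw Master World\n\nHello"
doc2 = "Hello World and \n\nbmw Master World"

print(analyze(doc1))
print(phrase_matches(doc1, "World Hello"))  # True: "world" is directly followed by "hello"
print(phrase_matches(doc2, "World Hello"))  # False: no "hello" follows a "world"
```

The newlines never reach the comparison, which is why the trailing "World\n\nHello" in the first document counts as the phrase.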
As a quick example, I created a trivial index:
    PUT /test_index
then indexed your document (with newlines escaped):
    PUT /test_index/doc/1
    {
       "doc_text": "Hello World and \n\nbmw Master World\n\nHello"
    }
then indexed another one with the last "Hello" removed:
    PUT /test_index/doc/2
    {
       "doc_text": "Hello World and \n\nbmw Master World"
    }
Now if I run your query, only the first document is returned:
    POST /test_index/_search
    {
       "query": {
          "match_phrase": {
             "doc_text": "World Hello"
          }
       }
    }
...
    {
       "took": 2,
       "timed_out": false,
       "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
       },
       "hits": {
          "total": 1,
          "max_score": 0.4459011,
          "hits": [
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "1",
                "_score": 0.4459011,
                "_source": {
                   "doc_text": "Hello World and \n\nbmw Master World\n\nHello"
                }
             }
          ]
       }
    }
You can prove to yourself why this happens using term vectors. I won't go into it here, but here's some code you can use to investigate if you want to:
http://sense.qbox.io/gist/3ee955b8389d1b36ea56788654955c519e2bb429
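If you just want a feel for what the term positions look like before running anything against the cluster, the following Python sketch builds a rough term-to-positions map for both documents. It does not call the term vectors API; `term_positions` is a hypothetical helper, and the tokenization (lowercase, split on non-word characters) is only an approximation of the standard analyzer. Real term vectors also carry offsets and frequencies, which are left out here.

```python
import re
from collections import defaultdict

def term_positions(text):
    """Map each term to the list of token positions where it occurs.
    Tokenization is an assumed approximation of the standard analyzer."""
    positions = defaultdict(list)
    for pos, tok in enumerate(re.findall(r"\w+", text.lower())):
        positions[tok].append(pos)
    return dict(positions)

doc1 = "Hello World and \n\nbmw Master World\n\nHello"
doc2 = "Hello World and \n\nbmw Master World"

for name, text in (("doc 1", doc1), ("doc 2", doc2)):
    tp = term_positions(text)
    print(name, "world:", tp.get("world"), "hello:", tp.get("hello"))
```

In the first document "world" sits at position 5 and "hello" at position 6, so the phrase "World Hello" lines up. In the second document the only "hello" is at position 0, so no "hello" directly follows either "world".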