1
votes

When I query my data against "_all", Elasticsearch returns two documents (each containing only one field). But when I run the same query against the name of one of the fields in the returned documents instead of "_all", Elasticsearch returns nothing. This happens with a query_string query as well as the match query shown here. Any ideas why this is occurring and how to fix it?

This is the mapping:

{
  "analyzertestpatternsemi": {
    "mappings": {
      "content": {
        "properties": {
          "field": {
            "type": "string",
            "store": true,
            "term_vector": "with_positions_offsets",
            "index_analyzer": "analyzer_name"
          },
          "field2": {
            "type": "string",
            "store": true,
            "index_analyzer": "analyzer_name"
          }
        }
      }
    }
  }
}

These are the settings:

{
  "analyzertestpatternsemi": {
    "settings": {
      "index": {
        "uuid": "_W55phRKQ1GylWU5JleArg",
        "analysis": {
          "analyzer": {
            "whitespace": {
              "type": "custom",
              "fields": [
                "lowercase"
              ],
              "tokenizer": "whitespace"
            },
            "analyzer_name": {
              "preserve_original": true,
              "type": "pattern",
              "pattern": ";"
            }
          }
        },
        "number_of_replicas": "1",
        "number_of_shards": "5",
        "version": {
          "created": "1030299"
        }
      }
    }
  }
}

The Docs

{
  "_index": "analyzertestpatternsemi",
  "_type": "content",
  "_id": "3",
  "_version": 1,
  "found": true,
  "_source": {
    "field2": "Hello, I am Paul; George"
  }
}

and

{
  "_index": "analyzertestpatternsemi",
  "_type": "content",
  "_id": "2",
  "_version": 1,
  "found": true,
  "_source": {
    "field": "Hello, I am Paul; George"
  }
}

Getting the term vectors for the _id 2 document (the only one whose field has term vectors enabled) gives

george and hello, i am paul
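Those terms can be reproduced outside Elasticsearch. Here is a rough Python sketch (not Elasticsearch code) of what a pattern analyzer splitting on ";" does to the stored string; note that the second term keeps its leading space:

```python
import re

def pattern_analyze(text, pattern=";"):
    """Rough approximation of Elasticsearch's "pattern" analyzer:
    split on the pattern and lowercase each token (no trimming)."""
    return [t.lower() for t in re.split(pattern, text)]

print(pattern_analyze("Hello, I am Paul; George"))
# ['hello, i am paul', ' george']  <- note the leading space on ' george'
```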

The "_all" query

curl -XGET 'http://localhost:9200/analyzertestpatternsemi/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": {
              "query": "george",
              "type": "phrase"
            }
          }
        }
      ]
    }
  }
}'

The "_all" query results

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.4375,
    "hits": [
      {
        "_index": "analyzertestpatternsemi",
        "_type": "content",
        "_id": "2",
        "_score": 0.4375,
        "_source": {
          "field": "Hello, I am Paul; George"
        }
      },
      {
        "_index": "analyzertestpatternsemi",
        "_type": "content",
        "_id": "3",
        "_score": 0.13424811,
        "_source": {
          "field2": "Hello, I am Paul; George"
        }
      }
    ]
  }
}

Same query but searching in field: "field"

curl -XGET 'http://localhost:9200/analyzertestpatternsemi/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "field": {
              "query": "george",
              "type": "phrase"
            }
          }
        }
      ]
    }
  }
}'

"field" query results

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Same query but searching in field: "field2"

curl -XGET 'http://localhost:9200/analyzertestpatternsemi/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "field2": {
              "query": "george",
              "type": "phrase"
            }
          }
        }
      ]
    }
  }
}'

"field2" query results

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

3 Answers

0
votes

The issue is that your "pattern" tokenizer splits the text into "hello, i am paul" and " george" (notice the whitespace before "george"). To be able to match "george" you need to get rid of that whitespace.

Here's one approach - define your own custom analyzer with a pattern tokenizer and a custom list of filters, where "trim" is the needed addition for trimming the whitespace before and after each token:

{
  "mappings": {
    "content": {
      "properties": {
        "field": {
          "type": "string",
          "store": true,
          "term_vector": "with_positions_offsets",
          "index_analyzer": "analyzer_name"
        },
        "field2": {
          "type": "string",
          "store": true,
          "index_analyzer": "analyzer_name"
        }
      }
    }
  },
  "settings": {
    "index": {
      "uuid": "_W55phRKQ1GylWU5JleArg",
      "analysis": {
        "analyzer": {
          "whitespace": {
            "type": "custom",
            "filter": [
              "lowercase"
            ],
            "tokenizer": "whitespace"
          },
          "analyzer_name": {
            "type": "custom",
            "tokenizer": "my_pattern_tokenizer",
            "filter": ["lowercase","trim"]
          }
        },
        "tokenizer": {
          "my_pattern_tokenizer": {
            "type": "pattern",
            "pattern": ";"
          }
        }
      },
      "number_of_replicas": 1,
      "number_of_shards": 5,
      "version": {
        "created": "1030299"
      }
    }
  }
}
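The effect of that analyzer chain can be sketched in Python (an approximation, not Elasticsearch itself): the pattern tokenizer splits on ";", then the "lowercase" and "trim" filters run over each token.

```python
import re

def analyze(text, pattern=";"):
    tokens = re.split(pattern, text)      # pattern tokenizer
    tokens = [t.lower() for t in tokens]  # "lowercase" filter
    tokens = [t.strip() for t in tokens]  # "trim" filter
    return tokens

print(analyze("Hello, I am Paul; George"))
# ['hello, i am paul', 'george']  <- 'george' now matches the query
```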

0
votes

I used the multi_field type to analyze and store the field in multiple ways. The documentation for it can be found here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html One analyzer can give you the tokens you want for a certain type of query or for aggregation, and the other can serve a different type of query on the same data.

I am not sure why the error mentioned in the original question occurred, but what I was trying to achieve was an analyzer that creates tokens with ";" as the break between tokens. I wanted this so that I could do Top Hits aggregations based on the tokens (the groupings of terms separated by ";"). But I wanted to be able to search/query the data with individual words (like the standard analyzer) rather than having to query an entire token (grouping of terms). To achieve this I defined the "type" for "field" and "field2" as "multi_field" and then defined two sub-fields. One sub-field used the "standard" analyzer and the other used "analyzer_name" (the custom pattern analyzer). The sub-field with the standard analyzer is the one queries run against, and the other (with "analyzer_name") is used for aggregations.
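A sketch of such a mapping, following the multi_field syntax from the linked 0.90 docs (the sub-field name "grouped" is illustrative, not from the original setup):

```
"field": {
  "type": "multi_field",
  "fields": {
    "field":   { "type": "string", "analyzer": "standard" },
    "grouped": { "type": "string", "index_analyzer": "analyzer_name" }
  }
}
```

Queries then target "field" (standard analysis) while aggregations target "field.grouped".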

0
votes

The issue is actually with the query. The two tokens stored are "hello, i am paul" and "george".

Adding the "trim" filter to the "analyzer_name" analyzer solved the issue of the query "george" not returning anything, because without the "trim" filter the stored term was actually " george".

The issue (noted in the Nov 6 comment by James on the Nov 5 answer by Adrei Stefan) with match queries not returning the document when the query was "hello", "paul", "hello i am paul", "Hello I am Paul", or "Hello, I am Paul" is explained below.

The issue here is with the query: a match query analyzes its input with the "standard" analyzer (the default) unless told otherwise. This means the query "hello" searches for the token "hello", but the stored token is actually "hello, i am paul"; likewise the query "hello i am paul" searches for the tokens "hello", "i", "am", and "paul", none of which match the tokens stored in the fields.
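The mismatch can be sketched in Python (a stand-in for the analyzers, not Elasticsearch itself), comparing the index-side tokens with what a standard-style analyzer makes of the query string:

```python
import re

def standard_like(text):
    # Very rough stand-in for the "standard" analyzer:
    # lowercase, then split on non-word characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]

stored = ["hello, i am paul", "george"]    # index-side tokens (after trim)
query_tokens = standard_like("hello i am paul")
print(query_tokens)                        # ['hello', 'i', 'am', 'paul']
print(any(t in stored for t in query_tokens))  # False: nothing matches
```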

In this situation Elasticsearch will only return the document if the term it searches for is "george" or "hello, i am paul". The document is returned if you do a term query with either of these two tokens, or use them in a match query with the analyzer set to "keyword". You could also search "hello, i am paul", "george", "hello, i am paul; george", or any of those three with capital letters, if you set the analyzer to "analyzer_name".
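For example, a match query that skips query-side tokenization could look like this (a sketch following the same curl convention as above; "analyzer" is a standard match-query option):

```
curl -XGET 'http://localhost:9200/analyzertestpatternsemi/_search' -d '
{
  "query": {
    "match": {
      "field": {
        "query": "hello, i am paul",
        "analyzer": "keyword"
      }
    }
  }
}'
```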