2
votes

I have a property edApp.name that I query with match. I have confirmed that its mapping is "type": "string", so it should be analyzed.

When I query with match, I get a different number of hits each time.

I see the same behaviour whether querying all documents with /_search or a subset through a read alias.

Newer update: A dynamically mapped field seems to be the culprit. The field is generated.edApp.name and it gets dynamically mapped with "not_analyzed". As soon as a document with this field is indexed, the analyzer for edApp.name breaks and I start seeing the weird results with match queries.


document:

{
  "@context": "http://purl.imsglobal.org/ctx/caliper/v1/Context",
  "edApp": {
    "name": "ReadingRainbow"
  }
}

mapping:

"dynamic_templates": [
  {
    "string_theory": {
      "mapping": {
        "index": "not_analyzed",
        "type": "string",
        "doc_values": true
      },
      "match": "*",
      "match_mapping_type": "string"
    }
  },
  {
    "i_dont_know_you": {
      "mapping": {
        "enabled": false
      },
      "match_mapping_type": "object",
      "path_match": "*.extensions.*"
    }
  }
],
"properties": {
  "_all": {
    "enabled": false
  },
  "_timestamp": {
    "enabled": true
  },
  ...
  "edApp": {
    "properties": {
      "name": {
        "type": "string"
      }
    }
  }
}

query returning inconsistent results:

{
  "query": {
      "match": {
          "edApp.name": "ReadingRainbow"
      }
   }
}

hits.total values when running query multiple times: [44, 56, 57, 69]

term query returning inconsistent results:

{
    "query": {
        "bool": {
            "should": [
            {
                "term": {
                    "edApp.name": "ReadingWonders2.0"
                }
            }
            ]
        }
    }
}

hits.total values when running term query multiple times: [21, 33, 34, 46]

Other term query returning inconsistent results (note lower case):

{
    "query": {
        "bool": {
            "should": [
            {
                "term": {
                    "edApp.name": "readingwonders2.0"
                }
            }
            ]
        }
    }
}

hits.total values when running term query multiple times: [44, 56, 57, 69] NOTE: these are the same counts we saw with the match query!

query with both terms:

{
    "query": {
        "bool": {
            "should": [
            {
                "term": {
                    "edApp.name": "readingwonders2.0"
                }
            },
            {
                "term": {
                    "edApp.name": "ReadingWonders2.0"
                }
            }
            ]
        }
    }
}

hits.total values are consistent: 79 results

As you can see, the inconsistent hit counts from the lowercase and camel-case term searches add up to 79 documents when the two terms are combined. Could the analyzer be creating this inconsistency?
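One way to picture that hypothesis is the toy model below. It is only a sketch: `str.lower()` stands in for the standard analyzer, the 57/22 split is a made-up example rather than measured data, and the helper names (`index_value`, `term_hits`, `match_hits`) are hypothetical. It assumes a term query compares the query text to the stored term verbatim, while a match query analyzes the query text first.

```python
# Toy model only: real Elasticsearch analysis is far richer than str.lower().

def index_value(value, analyzed):
    """Return the single term stored for a string field.

    analyzed=True  -> lowercased (stand-in for the standard analyzer)
    analyzed=False -> stored verbatim, as with "index": "not_analyzed"
    """
    return value.lower() if analyzed else value


def term_hits(stored_terms, query_term):
    """A term query compares the query text to stored terms verbatim."""
    return sum(1 for term in stored_terms if term == query_term)


def match_hits(stored_terms, query_text):
    """A match query analyzes (here: lowercases) the query text, then looks it up."""
    return term_hits(stored_terms, query_text.lower())


# Hypothetical split: 79 documents with the same value, indexed two different ways.
value = "ReadingWonders2.0"
stored = ([index_value(value, analyzed=True)] * 57
          + [index_value(value, analyzed=False)] * 22)

lower = term_hits(stored, "readingwonders2.0")
camel = term_hits(stored, "ReadingWonders2.0")
print(lower, camel, lower + camel)
print(match_hits(stored, "ReadingWonders2.0"))
```

In this model the lowercase term query and the match query always return the same count, and the two term queries together always cover every document, which is the pattern in the numbers above. It does not explain why the split itself changes from one run to the next.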

I am using AWS Elasticsearch Service ES 1.5.2

1
You should probably show both of your queries and a sample document that you think should match. - Val
Updated with examples. - smashbourne
Post the complete mapping of that index, not only one field. - Andrei Stefan
Complete mapping is 4000 lines. Updated to include the dynamic templates I am using. But I don't think they affect anything, since the property is explicitly mapped. - smashbourne
Are you using any preference in the query or routing? - Andrei Stefan

1 Answer

1
votes

A dynamically mapped property called generated.edApp.name was conflicting with edApp.name.

edApp.name was explicitly mapped as "analyzed"

generated.edApp.name was dynamically mapped as "not_analyzed"

Once the dynamically mapped property exists, the match query for edApp.name breaks.

My workaround is to add a dynamic template, placed before the catch-all one, that maps any field sharing a name with my explicitly mapped analyzed strings as analyzed too:

"dynamic_templates": [
  {
    "analyzed_string_theory": {
      "mapping": {
        "index": "analyzed",
        "type": "string"
      },
      "match_pattern": "regex",
      "match": "^.*name|.*keywords|.*description$"
    }
  },
  {
    "string_theory": {
      "mapping": {
        "index": "not_analyzed",
        "type": "string"
      },
      "match_mapping_type": "string",
      "match": "*"
    }
  },
  ...
]
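To sanity-check which template a new field would pick up, here is a rough Python sketch of first-match-wins template resolution. It rests on several assumptions: that `match` is tested against the last segment of the field path, that a regex `match` must cover the whole name (Python's `re.fullmatch` is used as a stand-in for the Java regex engine), and that a template without `match_mapping_type` accepts any detected type. `TEMPLATES`, `leaf_name` and `resolve` are hypothetical names, not part of any Elasticsearch API.

```python
import fnmatch
import re

# Templates in the order shown above; only the parts relevant to matching.
TEMPLATES = [
    ("analyzed_string_theory", {"match_pattern": "regex",
                                "match": "^.*name|.*keywords|.*description$"}),
    ("string_theory", {"match": "*", "match_mapping_type": "string"}),
]


def leaf_name(path):
    """Last segment of a dotted field path, e.g. 'a.b.name' -> 'name'."""
    return path.rsplit(".", 1)[-1]


def resolve(path, detected_type="string"):
    """Return the name of the first template that matches, or None."""
    name = leaf_name(path)
    for template_name, rule in TEMPLATES:
        wanted_type = rule.get("match_mapping_type")
        if wanted_type is not None and wanted_type != detected_type:
            continue
        if rule.get("match_pattern") == "regex":
            # Assumption: the regex has to match the entire field name.
            matched = re.fullmatch(rule["match"], name) is not None
        else:
            # Simple wildcard matching for the default pattern style.
            matched = fnmatch.fnmatchcase(name, rule["match"])
        if matched:
            return template_name
    return None


for field in ("generated.edApp.name", "generated.edApp.version", "generated.names"):
    print(field, "->", resolve(field))
```

Order matters here: if string_theory came first, its "*" would claim every string field and the analyzed template would never apply.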