
I have a property, edApp.name, that I query with match. I have confirmed that its mapping is "type": "string", so it should be analyzed.

When I query with match, I get a different number of hits each time.

I see the same behaviour whether querying all documents with /_search or a subset through a read alias.

Newer update: A dynamically mapped field seems to be the culprit. The field is generated.edApp.name, and it gets dynamically mapped as "not_analyzed". As soon as a document with this field is indexed, the analyzer for edApp.name breaks and I start seeing inconsistent results from match queries.


document:

{
  "@context": "http://purl.imsglobal.org/ctx/caliper/v1/Context",
  "edApp": {
    "name": "ReadingRainbow"
  }
}

mapping:

"dynamic_templates": [
  {
    "string_theory": {
      "mapping": {
        "index": "not_analyzed",
        "type": "string",
        "doc_values": true
      },
      "match": "*",
      "match_mapping_type": "string"
    }
  },
  {
    "i_dont_know_you": {
      "mapping": {
        "enabled": false
      },
      "match_mapping_type": "object",
      "path_match": "*.extensions.*"
    }
  }
],
"properties": {
  "_all": {
    "enabled": false
  },
  "_timestamp": {
    "enabled": true
  },
...
  "edApp": {
    "properties": {
      "name": {
        "type": "string"
      }
    }
  }
}

query returning inconsistent results:

{
  "query": {
      "match": {
          "edApp.name": "ReadingRainbow"
      }
   }
}

hits.total values when running query multiple times: [44, 56, 57, 69]

term query returning inconsistent results:

{
    "query": {
        "bool": {
            "should": [
            {
                "term": {
                    "edApp.name": "ReadingWonders2.0"
                }
            }
            ]
        }
    }
}

hits.total values when running term query multiple times: [21, 33, 34, 46]

Other term query returning inconsistent results (note lower case):

{
    "query": {
        "bool": {
            "should": [
            {
                "term": {
                    "edApp.name": "readingwonders2.0"
                }
            }
            ]
        }
    }
}

hits.total values when running term query multiple times: [44, 56, 57, 69] NOTE: these are the same counts we saw with the match query!

query with both terms:

{
    "query": {
        "bool": {
            "should": [
            {
                "term": {
                    "edApp.name": "readingwonders2.0"
                }
            },
            {
                "term": {
                    "edApp.name": "ReadingWonders2.0"
                }
            }
            ]
        }
    }
}

hits.total values are consistent: 79 results

As you can see, the inconsistent hits from the lowercase and camel-case term searches add up to 79 documents. Could the analyzer be creating this inconsistency?

I am using AWS Elasticsearch Service (ES 1.5.2).

You should probably show both of your queries and a sample document that you think should match - Val
updated with examples - smashbourne
Post the complete mapping of that index, not only one field. - Andrei Stefan
The complete mapping is 4000 lines. Updated to include the dynamic templates I am using, but I don't think they affect anything since the property is explicitly mapped. - smashbourne
Are you using any preference in the query or routing? - Andrei Stefan

1 Answer


A dynamically mapped property called generated.edApp.name was conflicting with edApp.name.

edApp.name was explicitly mapped as "analyzed"

generated.edApp.name was dynamically mapped as "not_analyzed"

Once the dynamically mapped property exists, the match query for edApp.name breaks.
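
To illustrate why the lowercase and camel-case term queries from the question hit different documents, here is a toy Python model (not Elasticsearch itself). It assumes, without confirmation from the cluster, that some values were indexed as analyzed and others as not_analyzed; the helper names are hypothetical and the "analyzer" is a crude stand-in that only splits on whitespace and lowercases.

```python
# Toy model, not Elasticsearch: each "document" stores the terms that ended up in the index.

def analyzed_terms(value):
    """Rough stand-in for an analyzed string: split on whitespace and lowercase."""
    return {token.lower() for token in value.split()}


def not_analyzed_terms(value):
    """A not_analyzed string is indexed verbatim as a single term."""
    return {value}


def term_query(index, *terms):
    """Return ids of docs containing at least one exact term (like bool/should of term queries)."""
    return sorted(
        doc_id
        for doc_id, doc_terms in index.items()
        if any(term in doc_terms for term in terms)
    )


value = "ReadingWonders2.0"
# Assumption for illustration: docs 1 and 2 were indexed as analyzed, doc 3 as not_analyzed.
index = {
    1: analyzed_terms(value),
    2: analyzed_terms(value),
    3: not_analyzed_terms(value),
}

lower_hits = term_query(index, "readingwonders2.0")
camel_hits = term_query(index, "ReadingWonders2.0")
both_hits = term_query(index, "readingwonders2.0", "ReadingWonders2.0")
print(lower_hits, camel_hits, both_hits)
```

In this model the two single-term queries return disjoint sets, and only the query with both terms sees every document, which mirrors the stable total reported in the question.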

My workaround is to add a dynamic template that handles the fields sharing names with my explicitly mapped analyzed strings:

"dynamic_templates": [
  {
    "analyzed_string_theory": {
      "mapping": {
        "index": "analyzed",
        "type": "string"
      },
      "match_pattern": "regex",
      "match": "^.*name|.*keywords|.*description$"
    }
  },
  {
    "string_theory": {
      "mapping": {
        "index": "not_analyzed",
        "type": "string"
      },
      "match_mapping_type": "string",
      "match": "*"
    }
  },
  ... 
]
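
The order of the templates matters here, because the first template that matches a new field wins. As a rough sketch of that selection logic in Python (not Elasticsearch code): the function name pick_template is hypothetical, and the sketch assumes the regex has to match the whole field name and that "match" is tested against the last part of the field path.

```python
import fnmatch
import re

# The two templates from the workaround, reduced to the keys needed for matching.
TEMPLATES = [
    {
        "name": "analyzed_string_theory",
        "match_pattern": "regex",
        "match": r"^.*name|.*keywords|.*description$",
        "index": "analyzed",
    },
    {
        "name": "string_theory",
        "match_mapping_type": "string",
        "match": "*",
        "index": "not_analyzed",
    },
]


def pick_template(field_name, detected_type="string"):
    """Return the first template whose conditions accept the field, or None."""
    for template in TEMPLATES:
        wanted_type = template.get("match_mapping_type")
        if wanted_type is not None and wanted_type != detected_type:
            continue
        if template.get("match_pattern") == "regex":
            # Assumption: the regex must cover the entire field name.
            matched = re.fullmatch(template["match"], field_name) is not None
        else:
            # Simple wildcard matching, as with "match": "*".
            matched = fnmatch.fnmatchcase(field_name, template["match"])
        if matched:
            return template
    return None


# "generated.edApp.name" is tested by its last path element, "name".
for leaf in ("name", "keywords", "title"):
    chosen = pick_template(leaf)
    print(leaf, "->", chosen["name"], chosen["index"])
```

With the regex template listed first, a dynamically created field named name is mapped as analyzed, while an unrelated string field such as title still falls through to the catch-all not_analyzed template.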