I have an elastic document as below.

{
  "process_id" : "123",
  "user_info" : [{
     "first_name":"A",
     "last_name": "B"
   }]
}

{
  "process_id" : "123",
  "user_info" : [{
     "first_name":"C",
     "last_name": "B"
   },
   {
     "first_name":"A",
     "last_name": "D"
   }]
}

Scenario 1:

I have not mapped the "user_info" field as nested. When I search for "process_id" 123 with first_name A and last_name B, I get both documents in the result.

Scenario 2:

The search returns an error. It looks like I cannot combine a condition on a nested field with a condition on a parent-level field in the same search.

The query is as below:

{
  "query": {
    "query_string": {
      "query": "process_id:123",
      "nested": {
        "path": "user_info",
        "query": {
          "query_string": {
            "query": "(user_info.first_name:A AND user_info.last_name:B"
          }
        }
      }
    }
  }
}

The error response is as below.

{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[query_string] unknown token [START_OBJECT] after [nested]",
"line": 1
}
],
"type": "parsing_exception",
"reason": "[query_string] unknown token [START_OBJECT] after [nested]",
"line": 1,

},
"status": 400
}

The desired behavior: when I search for process_id 123, first_name A and last_name B, only the first document should be returned.

Note: The attribute names are kept generic on purpose so that the actual problem can be illustrated.

Please add the search you're trying to do and the error it throws. - Kevin Quinzel
@KevinQuinzel Updated the question with the query and the error response I received. - jagannathan rajagopalan

1 Answer

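I can see the following two issues in your Scenario 2:

  • to combine queries you need to use the bool query; query_string does not accept a "nested" object inside it, which is what the parsing_exception is reporting
  • for nested queries to work, the "parent" field (user_info) must be mapped as a nested field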

1. Declare field user_info as of type nested in your mappings:

PUT processes
{
  "mappings": {
    "properties": {
      "process_id": {
        "type": "keyword"
      },
      "user_info": {
        "type": "nested",
        "properties": {
          "first_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "last_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

Note: the only difference from a default mapping for user_info is the extra line "type": "nested".
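
To see why the nested type matters, here is a rough sketch in plain Python (not the Elasticsearch API) of the two matching behaviours. It assumes the usual explanation for Scenario 1: with the default object mapping, the values of each sub-field are pooled across all array elements, so the pairing of first_name and last_name is lost. The function names are made up for illustration.

```python
# Toy model of how an array of objects is matched. Plain Python only;
# matches_flattened and matches_nested are hypothetical helper names.

docs = {
    1: {"process_id": "123",
        "user_info": [{"first_name": "A", "last_name": "B"}]},
    2: {"process_id": "123",
        "user_info": [{"first_name": "C", "last_name": "B"},
                      {"first_name": "A", "last_name": "D"}]},
}


def matches_flattened(doc, first_name, last_name):
    """Default object mapping: sub-field values are pooled per field,
    so a first_name and a last_name from different users can combine."""
    first_names = {u["first_name"] for u in doc["user_info"]}
    last_names = {u["last_name"] for u in doc["user_info"]}
    return first_name in first_names and last_name in last_names


def matches_nested(doc, first_name, last_name):
    """Nested mapping: both conditions must hold within one array element."""
    return any(
        u["first_name"] == first_name and u["last_name"] == last_name
        for u in doc["user_info"]
    )


flattened_hits = [i for i, d in docs.items() if matches_flattened(d, "A", "B")]
nested_hits = [i for i, d in docs.items() if matches_nested(d, "A", "B")]

print(flattened_hits)  # [1, 2]  (Scenario 1: both documents)
print(nested_hits)     # [1]     (desired result: only the first document)
```

Document 2 matches in the flattened model because "A" comes from one user and "B" from another; the nested model rejects it because no single user has both.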

2. Index the 2 sample documents you provided

POST processes/_bulk
{"index":{"_id":1}}
{"process_id": "123", "user_info": [{"first_name": "A", "last_name": "B"}]}
{"index":{"_id":2}}
{"process_id": "123", "user_info": [{"first_name": "C", "last_name": "B"},{"first_name": "A", "last_name": "D"}]}

3. Query for a combination of first_name and last_name using a nested-query

GET processes/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "process_id": {
              "value": "123"
            }
          }
        },
        {
          "nested": {
            "path": "user_info",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "user_info.first_name": "A"
                    }
                  },
                  {
                    "match": {
                      "user_info.last_name": "B"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

Note: this search request returns all documents that match the process_id (123) and in which a single user_info object matches both user_info.first_name and user_info.last_name (the two conditions are not matched across different users). With the setup above, the query matches only document 1. If you also want Elasticsearch to tell you which user_info object caused the match (in case a process has multiple user_info objects), add the following clause to the nested query, next to "path" and "query": "inner_hits": {}.
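
As a sketch of where that clause goes, the snippet below builds the request body from step 3 as a Python dict with "inner_hits" added, and prints it as JSON. It uses only the standard json module; the variable name search_body is just for illustration, and nothing here talks to a cluster.

```python
import json

# Same bool + nested query as in step 3, with "inner_hits" added as a
# sibling of "path" and "query" inside the nested clause.
search_body = {
    "query": {
        "bool": {
            "must": [
                {"term": {"process_id": {"value": "123"}}},
                {
                    "nested": {
                        "path": "user_info",
                        "query": {
                            "bool": {
                                "must": [
                                    {"match": {"user_info.first_name": "A"}},
                                    {"match": {"user_info.last_name": "B"}},
                                ]
                            }
                        },
                        "inner_hits": {},
                    }
                },
            ]
        }
    }
}

# Print the JSON body, ready to paste under "GET processes/_search".
print(json.dumps(search_body, indent=2))
```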

You may have wondered why I mapped process_id as a keyword field. IDs are usually looked up by exact value (term queries) rather than by range or full-text search, and keyword is the type suited to that kind of lookup.

Reference in the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html

Update Oct 13, 2019: Added version using queryString-query

GET processes/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "process_id:123"
          }
        },
        {
          "nested": {
            "path": "user_info",
            "query": {
              "query_string": {
                "query": "user_info.first_name:A AND user_info.last_name:B"
              }
            }
          }
        }
      ]
    }
  }
}