
Sample document

{
  "id" : "video1",
  "title" : "Gone with the wind",
  "timedTextLines" : [
    {
      "startTime" : "00:00:02",
      "endTime" : "00:00:05",
      "textLine" : "Frankly my dear I don't give a damn."
    },
    {
      "startTime" : "00:00:07",
      "endTime" : "00:00:09",
      "textLine" : " my amazing country."
    },
    {
      "startTime" : "00:00:17",
      "endTime" : "00:00:29",
      "textLine" : " amazing country."
    }
  ]
}

Index Definition

{
  "mappings": {
    "video_type": {
      "properties": {
        "timedTextLines": {
          "type": "nested" 
        }
      }
    }
  }
}

A query without source filtering in inner_hits works fine and returns:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.91737854,
    "hits": [
      {
        "_index": "video_index",
        "_type": "video_type",
        "_id": "1",
        "_score": 0.91737854,
        "_source": {

        },
        "inner_hits": {
          "timedTextLines": {
            "hits": {
              "total": 1,
              "max_score": 0.6296964,
              "hits": [
                {
                  "_nested": {
                    "field": "timedTextLines",
                    "offset": 0
                  },
                  "_score": 0.6296964,
                  "_source": {
                    "startTime": "00:00:02",
                    "endTime": "00:00:05",
                    "textLine": "Frankly my dear I don't give a damn."
                  },
                  "highlight": {
                    "timedTextLines.textLine": [
                      "Frankly my dear I don't give a <em>damn</em>."
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

The inner hits contain all the properties of the nested object, viz. startTime, endTime and textLine. How can I return just startTime and endTime in the response?

Failed query

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "gone"
          }
        },
        {
          "nested": {
            "path": "timedTextLines",
            "query": {
              "match": {
                "timedTextLines.textLine": "damn"
              }
            },
            "inner_hits": {
              "_source": ["startTime", "endTime"],
              "highlight": {
                "fields": {
                  "timedTextLines.textLine": {

                  }
                }
              }
            }
          }
        }
      ]
    }
  },
  "_source":"false"
}

Error:

HTTP/1.1 400 Bad Request
content-type: application/json; charset=UTF-8
content-length: 265

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"[inner_hits] _source doesn't support values of type: START_ARRAY"}],"type":"illegal_argument_exception","reason":"[inner_hits] _source doesn't support values of type: START_ARRAY"},"status":400}

Answer


The reason is that since ES 5.0, _source in inner_hits no longer supports the short form (such as an array of field names); only the full object form, with includes and excludes, is accepted (see this open issue).

Your query can be rewritten like this and it will work:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "gone"
          }
        },
        {
          "nested": {
            "path": "timedTextLines",
            "query": {
              "match": {
                "timedTextLines.textLine": "damn"
              }
            },
            "inner_hits": {
              "_source": {
                "includes": [
                  "timedTextLines.startTime",
                  "timedTextLines.endTime"
                ]
              },
              "highlight": {
                "fields": {
                  "timedTextLines.textLine": {

                  }
                }
              }
            }
          }
        }
      ]
    }
  },
  "_source":"false"
}
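If you build this request from code, you can sidestep the short-form mistake by always emitting the object form. The following is a minimal Python sketch that assembles the same body as a plain dict. The helper names inner_hits_source and build_query are hypothetical. Like the query above, the sketch assumes the included fields should be written with their full nested path (timedTextLines.startTime).

```python
import json


def inner_hits_source(path, fields):
    """Hypothetical helper: turn short field names into the object form
    {"includes": [...]}, prefixing each name with the nested path."""
    return {"includes": [f"{path}.{field}" for field in fields]}


def build_query(title, text, path="timedTextLines"):
    """Build the search body from the answer above as a plain dict."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"title": title}},
                    {
                        "nested": {
                            "path": path,
                            "query": {"match": {f"{path}.textLine": text}},
                            "inner_hits": {
                                # Object form; the array form is rejected with
                                # "doesn't support values of type: START_ARRAY".
                                "_source": inner_hits_source(path, ["startTime", "endTime"]),
                                "highlight": {"fields": {f"{path}.textLine": {}}},
                            },
                        }
                    },
                ]
            }
        },
        # Copied unchanged from the query in the answer.
        "_source": "false",
    }


if __name__ == "__main__":
    print(json.dumps(build_query("gone", "damn"), indent=2))
```

Printing the dict with json.dumps produces a body that is structurally the same as the rewritten query above. You can send it with whatever HTTP or Elasticsearch client you already use.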