1
votes

I have data in an Elasticsearch index that looks like this:

{
  "title": "cubilia",
  "people": [
    "Ling Deponte",
    "Dana Madin",
    "Shameka Woodard",
    "Bennie Craddock",
    "Sandie Bakker"
  ]
}

Is there a way to search for all the people whose name starts with "ling" (case-insensitively) and get the distinct terms back properly cased, i.e. "Ling Deponte" rather than "ling deponte"? I am fine with changing the mappings on the index in any way.

Edit: this does what I want, but it is a really bad query:

{
  "size": 0,
  "aggs": {
    "person": {
      "filter": {
        "bool": {
          "should": [
            {
              "regexp": {
                "people.raw": "(.* )?[lL][iI][nN][gG].*"
              }
            }
          ]
        }
      },
      "aggs": {
        "top-colors": {
          "terms": {
            "size": 10,
            "field": "people.raw",
            "include": {
              "pattern": ["(.* )?[lL][iI][nN][gG].*"]
            }
          }
        }
      }
    }
  }
}

people.raw is not_analyzed
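
The hand-written character classes in the pattern above can be generated from any prefix. This is a minimal sketch in Python, assuming a hypothetical helper named `case_insensitive_pattern`. It only builds the pattern string and checks it locally with Python's `re` module, which agrees with Lucene's regular expression syntax for this simple shape but is not the same engine:

```python
import re

def case_insensitive_pattern(prefix: str) -> str:
    """Build a pattern matching any value in which some word starts with
    `prefix`, ignoring case, e.g. 'ling' -> '(.* )?[lL][iI][nN][gG].*'."""
    parts = []
    for ch in prefix:
        if ch.isalpha():
            # One character class per letter covers both cases.
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        elif ch.isdigit() or ch == " ":
            parts.append(ch)
        else:
            # Backslash-escape anything that could be a regex operator.
            parts.append("\\" + ch)
    return "(.* )?" + "".join(parts) + ".*"

pattern = case_insensitive_pattern("ling")
print(pattern)

# Elasticsearch regexp patterns are anchored to the whole term, hence fullmatch.
# "Mei LING" is a made-up value added to show a match on a later word.
names = ["Ling Deponte", "Dana Madin", "Sandie Bakker", "Mei LING"]
print([n for n in names if re.fullmatch(pattern, n)])
```

The generated string can be dropped into both the `regexp` filter and the `include` pattern of the query above.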

2 Answers

2
votes

Yes, and you can do it without a regular expression by taking advantage of Elasticsearch's full text capabilities.

GET /test/_search
{
  "query": {
    "match_phrase": {
      "people": "Ling"
    }
  }
}

Note: This could also be match or match_phrase_prefix in this case. The match_phrase* queries imply an order of the values in the text. match simply looks for any of the values. Since you only have one value, it's pretty much irrelevant.

The problem is that you cannot limit the document responses to just that name because the search API returns documents. With that said, you can use nested documents and get the desired behavior via inner_hits.
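
A rough sketch of what the nested approach could look like, written as Python dictionaries that serialize to the request bodies. The `people.name` field and the shape of the mapping are assumptions for illustration, not something taken from the question. The `"string"` type matches the Elasticsearch versions that still use `not_analyzed`; newer versions use `"text"` instead.

```python
import json

# Hypothetical mapping: each person becomes a nested object with a `name` field.
mapping = {
    "properties": {
        "title": {"type": "string"},
        "people": {
            "type": "nested",
            "properties": {"name": {"type": "string"}},
        },
    }
}

# A document in that (assumed) shape.
document = {
    "title": "cubilia",
    "people": [{"name": "Ling Deponte"}, {"name": "Dana Madin"}],
}

# The nested query matches individual people; `inner_hits` asks for the
# matching nested objects to be returned alongside each document.
search_body = {
    "query": {
        "nested": {
            "path": "people",
            "query": {"match": {"people.name": "Ling"}},
            "inner_hits": {},
        }
    }
}

print(json.dumps(search_body, indent=2))
```

Because inner hits carry the stored source of each nested object, the names come back in their original casing.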

Avoid leading wildcards (and regular expressions that begin with .*) whenever possible, because they simply do not work at scale. To put it in SQL terms, it is like doing a full table scan: you effectively lose the benefit of the inverted index, because it has to be walked in its entirety instead of jumping straight to where the matching terms start.
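
To make the cost difference concrete, here is a toy model in Python. It treats the term dictionary as a sorted list, which is a simplification and not how Lucene actually stores terms. The term "lingard" is made up so that the prefix has more than one match.

```python
from bisect import bisect_left

# Toy stand-in for a term dictionary: the terms of a field, kept sorted.
terms = sorted(["bakker", "bennie", "craddock", "dana", "deponte",
                "ling", "lingard", "madin", "sandie", "shameka", "woodard"])

def prefix_lookup(sorted_terms, prefix):
    """Jump to the first candidate with binary search, then read forward
    only while terms still start with the prefix."""
    start = bisect_left(sorted_terms, prefix)
    matches, visited = [], 0
    for term in sorted_terms[start:]:
        visited += 1
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches, visited

def suffix_scan(sorted_terms, suffix):
    """A leading wildcard such as '*ard' gives no starting point,
    so every term has to be examined."""
    matches = [t for t in sorted_terms if t.endswith(suffix)]
    return matches, len(sorted_terms)

print(prefix_lookup(terms, "ling"))  # visits only the neighbourhood of "ling"
print(suffix_scan(terms, "ard"))     # visits every term
```

The second number in each result is how many terms were examined; for the leading wildcard it always equals the size of the dictionary.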

Combining the two should work pretty well, though. Here, I use the query to whittle the results down to the documents you are interested in, then use your inner aggregation to include only the matching values.

{
  "size": 0,
  "query": {
    "match_phrase": {
      "people": "Ling"
    }
  },
  "aggs": {
    "person": {
      "terms": {
        "size":10,
        "field": "people.raw",
        "include": {
          "pattern": ["(.* )?[lL][iI][nN][gG].*"]
        }
      }
    }
  }
}
0
votes

Hi, here is a query that may help with your request:

GET skills/skill/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "wildcard": {
                "skillNames.raw": "jav*"
              }
            }
          ]
        }
      }
    }
  }
}

My intention is to find documents whose value starts with "jav".