2
votes

In Elasticsearch, how do I search for an arbitrary substring, perhaps including spaces? (Searching for part of a word isn't quite enough; I want to search any substring of an entire field.)

I imagine it has to be in a keyword field, rather than a text field.

Suppose I have only a few thousand documents in my Elasticsearch index, and I try:

  "query": {
         "wildcard" : { "description" : "*plan*" }
  }

That works as expected--I get every item where "plan" is in the description, even ones like "supplantation".

Now, I'd like to do

  "query": {
         "wildcard" : { "description" : "*plan is*" }
  }   

...so that I might match documents with "Kaplan isn't" among many other possibilities.

It seems this isn't possible with wildcard, match prefix, or any other query type I might see. How do I simply search on any substring? (In SQL, I would just do description LIKE '%plan is%')

(I am aware any such query would be slow or perhaps even impossible for large data sets.)

2
You need to tokenize your description in order to search for separate words. Have a read of their documentation: elastic.co/guide/en/elasticsearch/reference/current/… - cheffe
If you really want to search for an arbitrary substring, you need to go for ngrams: elastic.co/guide/en/elasticsearch/guide/current/… - cheffe

2 Answers

1
votes

Have you tried the regexp query in Elasticsearch? It sounds like what you are looking for.

1
votes

I was hoping there might be something built into Elasticsearch for this, given that simple substring search seems like a very basic capability (it is implemented as strstr() in C, LIKE '%%' in SQL, Ctrl+F in most text editors, String.IndexOf in C#, etc.), but this seems not to be the case. The regexp query does the job. Note that the regexp query doesn't support case insensitivity, so I also needed to pair it with the custom analyzer below, which indexes the whole field as a single all-lowercase term. Then I convert my search string to lowercase as well.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": { 
          "type": "custom",
          "tokenizer": "keyword", 
          "filter": [ "lowercase" ] 
        }
      }
    }
  },
  "mappings": { 
     ...
     "description": {"type": "text", "analyzer": "lowercase_keyword"}
  }
}
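
To see why the keyword tokenizer matters here, the following is a rough Python simulation, not Elasticsearch itself. It assumes that a regexp query is matched against each indexed term as a whole, uses `re.fullmatch` as a stand-in for Lucene's anchored regular expressions, and approximates default text analysis with a lowercase-and-split-on-whitespace step. The function names are made up for illustration.

```python
import re

def standard_like_terms(text):
    # Rough stand-in for default text analysis: lowercase, split on whitespace.
    return text.lower().split()

def lowercase_keyword_terms(text):
    # Stand-in for the lowercase_keyword analyzer: the whole field becomes
    # one lowercased term.
    return [text.lower()]

def regexp_matches(terms, pattern):
    # The pattern has to match an entire term, hence fullmatch.
    return any(re.fullmatch(pattern, term) for term in terms)

doc = "Kaplan isn't ready"
pattern = ".*plan is.*"

# No single word contains "plan is", so the split version cannot match.
split_hit = regexp_matches(standard_like_terms(doc), pattern)
# The single whole-field term does contain it.
whole_hit = regexp_matches(lowercase_keyword_terms(doc), pattern)
print(split_hit, whole_hit)
```

Under these assumptions, a substring that spans a space can only match when the whole field is indexed as one term, which is what the analyzer above arranges.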

Example query:

  "query": {
         "regexp" : { "description" : ".*plan is.*" }
  }
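
Because the search string ends up inside a regular expression, characters such as `.` or `(` typed by a user would be read as operators rather than literal text. Here is a minimal sketch of building the request body from a raw search string. `build_substring_query` is a hypothetical helper, and the reserved-character list is my assumption based on the Lucene regular expression syntax, so check it against the documentation for your version.

```python
import json

# Characters that Lucene regular expressions may treat as operators
# (including the optional ones). Assumed list, not taken from the answer.
RESERVED = set('.?+*|{}[]()"\\#@&<>~')

def build_substring_query(field, needle):
    # Lowercase to match the all-lowercase index, then backslash-escape
    # anything that would otherwise act as a regex operator.
    escaped = "".join("\\" + ch if ch in RESERVED else ch for ch in needle.lower())
    # Surround with .* because the pattern must match the whole term.
    return {"query": {"regexp": {field: ".*" + escaped + ".*"}}}

body = build_substring_query("description", "Plan is (v1.0)")
print(json.dumps(body))
```

The resulting dictionary can be serialized and sent as the search request body with whatever HTTP or Elasticsearch client is in use.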

Thanks to Jai Sharma for leading me; I just wanted to provide more detail.