
I want to use Elasticsearch for multi-word searches, where every field of a document is searched using the analyzer assigned to that field.

Suppose I have the following mapping:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter":  [ "lowercase", "asciifolding" ]
        }
      }
    }
  },
  "mappings" : {
    "typeName" :{
      "date_detection": false,
      "properties" : {
        "stringfield" : {
          "type" : "string",
          "analyzer" : "folding"
        },
        "numberfield" : {
          "type" : "multi_field",
          "fields" : {
            "numberfield" : {"type" : "double"},
            "untouched" : {"type" : "string", "index" : "not_analyzed"}
          }
        },
        "datefield" : {
          "type" : "multi_field",
          "fields" : {
            "datefield" : {"type" : "date", "format": "dd/MM/yyyy||yyyy-MM-dd"},
            "untouched" : {"type" : "string", "index" : "not_analyzed"}
          }
        }
      }
    }
  }
}

As you can see, the fields have different types, but I know the structure in advance. What I want is to run a search with a single string that checks every field, each with its own analyzer.

For example, if the query string is:

John Smith 2014-10-02 300.00

I want to search for "John", "Smith", "2014-10-02" and "300.00" in all the fields and compute a relevance score. A document that matches in more fields should rank higher.
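One way to get this behaviour is to split the query string on the client and send each token only to the fields whose type it can be parsed as. The following is a minimal Python sketch under that assumption: the helper names classify and build_query are hypothetical, the date formats mirror the datefield mapping above, and the returned dict is a search request body you would pass to whatever client you use. Because every matching should clause adds to the score, documents that match more tokens rank higher.

```python
from datetime import datetime

# Mirrors the datefield format "dd/MM/yyyy||yyyy-MM-dd" from the mapping.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d")


def classify(token):
    """Return 'date', 'number' or 'string' for one query token (hypothetical helper)."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(token, fmt)
            return "date"
        except ValueError:
            pass
    try:
        float(token)
        return "number"
    except ValueError:
        return "string"


def build_query(text):
    """Build a bool query with one should clause per token, routed by its type."""
    should = []
    for token in text.split():
        kind = classify(token)
        if kind == "date":
            # Assumes Elasticsearch parses the string with the field's date format.
            should.append({"match": {"datefield": token}})
        elif kind == "number":
            should.append({"match": {"numberfield": float(token)}})
        else:
            # The folding analyzer of stringfield is applied to this token.
            should.append({"match": {"stringfield": token}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


body = build_query("John Smith 2014-10-02 300.00")
```

The type detection here is deliberately simple; a token such as "2014" is treated as a number, so you may want extra rules for your own data.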

So far I have been able to search all the fields by using multi_field, but then I could not match 300.00, because the value was stored as 300 in the string part of the multi_field. When I searched the "_all" field instead, the fields' own analyzers were not applied.

How should I change my mapping or my queries to support a multi-word search in which dates and numbers inside the query string are recognized? At the moment a search fails with an error, because the whole string cannot be parsed as a number or a date. And if I query the string sub-field of the multi_field instead, 300.00 does not match, because its stored string representation is 300.
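The parse error comes from sending the whole string to the numeric and date fields. A possible workaround, sketched below in Python, is to send each whitespace-separated token as its own multi_match clause with the lenient option, which asks Elasticsearch to ignore format errors (such as "John" sent to a double field) instead of failing the request. This is an assumption-laden sketch: build_lenient_query is a hypothetical helper, the field list is taken from the mapping above, and you should check that your Elasticsearch version honours lenient for date as well as numeric fields.

```python
# Fields from the mapping above; the "untouched" sub-fields are left out on purpose.
FIELDS = ["stringfield", "numberfield", "datefield"]


def build_lenient_query(text, fields=FIELDS):
    """One multi_match clause per token; lenient skips fields the token can't be parsed for."""
    clauses = [
        {"multi_match": {"query": token, "fields": list(fields), "lenient": True}}
        for token in text.split()
    ]
    # Each matching clause adds to the score, so more matching tokens rank higher.
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


body = build_lenient_query("John Smith 2014-10-02 300.00")
```

Compared with detecting types on the client, this keeps the parsing rules in Elasticsearch, at the cost of one clause per token hitting every field.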

(What I'd like is similar to a Google search, where dates, numbers and words are all recognized in a multi-word query.)

Any ideas?

Thanks!

1 Answer

Setting a whitespace-based analyzer as the search_analyzer of your fields splits the query into separate terms, and each term is matched against the index to find the best match. Using an ngram filter in the index_analyzer improves the results considerably. I use the following query:

{
  "query": {
    "multi_match": {
      "query": "sample query",
      "fuzziness": "AUTO",
      "fields": [
        "title",
        "subtitle"
      ]
    }
  }
}

And for mappings and settings:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "standard",
            "lowercase",
            "ngram"
          ]
        }
      },
      "filter": {
        "ngram": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "typeName": {
      "properties": {
        "title": {
          "type": "string",
          "search_analyzer": "whitespace",
          "index_analyzer": "autocomplete"
        },
        "subtitle": {
          "type": "string"
        }
      }
    }
  }
}
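To see why the ngram index analyzer helps, here is a rough Python emulation of the two analyzers in this mapping. It is only an approximation under stated assumptions: the autocomplete analyzer is modelled as whitespace split, lowercase, then every substring of 2 to 15 characters, and the built-in whitespace search analyzer as a plain split with no lowercasing. The function names are hypothetical and the order in which Elasticsearch emits grams may differ.

```python
def ngrams(token, min_gram=2, max_gram=15):
    """All substrings of token whose length is between min_gram and max_gram."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            grams.append(token[start:start + size])
    return grams


def index_terms(text):
    """Approximates the 'autocomplete' analyzer: whitespace, lowercase, ngram."""
    terms = set()
    for token in text.split():
        terms.update(ngrams(token.lower()))
    return terms


def search_terms(text):
    """Approximates the built-in 'whitespace' analyzer: split only, no lowercasing."""
    return text.split()


def matching_terms(indexed_text, query):
    """Query terms that would hit a gram indexed for indexed_text."""
    indexed = index_terms(indexed_text)
    return [term for term in search_terms(query) if term in indexed]


print(matching_terms("Sample Title", "samp titl"))
```

Partial words such as "samp" match because they are stored as grams at index time. Since the whitespace search analyzer does not lowercase, a query term like "Samp" would not hit the lowercased grams.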

See the following answer and article for more details.