I am trying to get features such as nGrams and synonyms working, but I am not having any luck.

I am following this blog post. I tried adapting its mappings and queries to my data, but searches only match exact terms. I also tried the exact data from the article, taken from this gist, with the same result.

Here is the mapping:

{
   "mappings": {
      "item": {
         "properties": {
            "productName": {
               "fields": {
                  "partial": {
                     "search_analyzer":"full_name",
                     "index_analyzer":"partial_name",
                     "type":"string"
                  },
                  "partial_back": {
                     "search_analyzer":"full_name",
                     "index_analyzer":"partial_name_back",
                     "type":"string"
                  },
                  "partial_middle": {
                     "search_analyzer":"full_name",
                     "index_analyzer":"partial_middle_name",
                     "type":"string"
                  },
                  "productName": {
                     "type":"string",
                     "analyzer":"full_name"
                  }
               },
               "type":"multi_field"
            },
            "productID": {
               "type":"string",
               "analyzer":"simple"
            },
            "warehouse": {
               "type":"string",
               "analyzer":"simple"
            },
            "vendor": {
               "type":"string",
               "analyzer":"simple"
            },
            "productDescription": {
               "type":"string",
               "analyzer":"full_name"
            },
            "categories": {
               "type":"string",
               "analyzer":"simple"
            },
            "stockLevel": {
               "type":"integer",
               "index":"not_analyzed"
            },
            "cost": {
               "type":"float",
               "index":"not_analyzed"
            }
         }
      },
      "settings": {
         "analysis": {
            "filter": {
               "name_ngrams": {
                  "side":"front",
                  "max_gram":50,
                  "min_gram":2,
                  "type":"edgeNGram"
               },
               "name_ngrams_back": {
                  "side":"back",
                  "max_gram":50,
                  "min_gram":2,
                  "type":"edgeNGram"
               },
               "name_middle_ngrams": {
                  "type":"nGram",
                  "max_gram":50,
                  "min_gram":2
               }
            },
            "analyzer": {
               "full_name": {
                  "filter":[
                     "standard",
                     "lowercase",
                     "asciifolding"
                  ],
                  "type":"custom",
                  "tokenizer":"standard"
               },
               "partial_name": {
                  "filter":[
                     "standard",
                     "lowercase",
                     "asciifolding",
                     "name_ngrams"
                  ],
                  "type":"custom",
                  "tokenizer":"standard"
               },
               "partial_name_back": {
                  "filter":[
                     "standard",
                     "lowercase",
                     "asciifolding",
                     "name_ngrams_back"
                  ],
                  "type":"custom",
                  "tokenizer":"standard"
               },
               "partial_middle_name": {
                  "filter":[
                     "standard",
                     "lowercase",
                     "asciifolding",
                     "name_middle_ngrams"
                  ],
                  "type":"custom",
                  "tokenizer":"standard"
               }
            }
         }
      }
   }
}

And the search query (I removed the filter to try to return more results):

{
   "size":20,
   "from":0,
   "sort":[
      "_score"
   ],
   "query": {
      "bool": {
         "should":[
            {
               "text": {
                  "productName": {
                     "boost":5,
                     "query":"test query",
                     "type":"phrase"
                  }
               }
            },
            {
               "text": {
                  "productName.partial": {
                     "boost":1,
                     "query":"test query"
                  }
               }
            },
            {
               "text": {
                  "productName.partial_middle": {
                     "boost":1,
                     "query":"test query"
                  }
               }
            },
            {
               "text": {
                  "productName.partial_back": {
                     "boost":1,
                     "query":"test query"
                  }
               }
            }
         ]
      }
   }
}

If I take the query above (from the gist) and remove the following clause from the bool query's should list

"text":{
    "productName":{
        "boost":5,
        "query":"test query",
        "type":"phrase"
    }
}

so that exact phrase matches are excluded, I get no results at all, no matter what search term I use.

I assume I am missing something glaringly obvious, and don't really know what other information is relevant, so please take it easy on me.

Can you post your mapping, please? Run curl http://domain.com/index/type/_mapping, just to check it's set up correctly. - ramseykhalaf
Also, text is deprecated, use match! - ramseykhalaf
@ramseykhalaf You just solved this. I had pulled the mapping before, saw JSON, and assumed it was correct, but on actually reading it I noticed it was all messed up. I will credit you in the answer I write up! - Rockstar04

1 Answer

It looks like I figured out the answer: my problem was blindly copying and pasting. The blog article I linked to seems to be out of date, and its JSON no longer works correctly, even though no errors came back when I sent the commands.

Here is the request body I used to create the index:

{
   "settings": {
      "analysis": {
         "filter": {
            "name_ngrams": {
               "side":"front",
               "max_gram":50,
               "min_gram":2,
               "type":"edgeNGram"
            },
            "name_ngrams_back": {
               "side":"back",
               "max_gram":50,
               "min_gram":2,
               "type":"edgeNGram"
            },
            "name_middle_ngrams": {
               "type":"nGram",
               "max_gram":50,
               "min_gram":2
            }
         },
         "analyzer": {
            "full_name": {
               "filter":[
                  "standard",
                  "lowercase",
                  "asciifolding"
               ],
               "type":"custom",
               "tokenizer":"standard"
            },
            "partial_name": {
               "filter":[
                  "standard",
                  "lowercase",
                  "asciifolding",
                  "name_ngrams"
               ],
               "type":"custom",
               "tokenizer":"standard"
            },
            "partial_name_back": {
               "filter":[
                  "standard",
                  "lowercase",
                  "asciifolding",
                  "name_ngrams_back"
               ],
               "type":"custom",
               "tokenizer":"standard"
            },
            "partial_middle_name": {
               "filter":[
                  "standard",
                  "lowercase",
                  "asciifolding",
                  "name_middle_ngrams"
               ],
               "type":"custom",
               "tokenizer":"standard"
            }
         }
      }
   },
   "mappings" : {
      "product": {
         "properties": {
            "productName": {
               "fields": {
                  "partial": {
                     "search_analyzer":"full_name",
                     "index_analyzer":"partial_name",
                     "type":"string"
                  },
                  "partial_back": {
                     "search_analyzer":"full_name",
                     "index_analyzer":"partial_name_back",
                     "type":"string"
                  },
                  "partial_middle": {
                     "search_analyzer":"full_name",
                     "index_analyzer":"partial_middle_name",
                     "type":"string"
                  },
                  "productName": {
                     "type":"string",
                     "analyzer":"full_name"
                  }
               },
               "type":"multi_field"
            },
            "productID": {
               "type":"string",
               "analyzer":"simple"
            },
            "warehouse": {
               "type":"string",
               "analyzer":"simple"
            },
            "vendor": {
               "type":"string",
               "analyzer":"simple"
            },
            "productDescription": {
               "type":"string",
               "analyzer":"full_name"
            },
            "categories": {
               "type":"string",
               "analyzer":"simple"
            },
            "stockLevel": {
               "type":"integer",
               "index":"not_analyzed"
            },
            "cost": {
               "type":"float",
               "index":"not_analyzed"
            }
         }
      }
   }
}

Here is the document I used to insert a test record (I sent it 3 times with slightly changed data):

{
    "productName": "Thingey",
    "productID": "asdfasef9816",
    "warehouse": "usa",
    "vendor": "Cool Things Inc",
    "productDescription": "This is a cool gizmo",
    "categories": "Cool Things",
    "stockLevel": 6,
    "cost": 15.31
}

And finally, the JSON for the search query:

{
   "size":20,
   "from":0,
   "sort":[
      "_score"
   ],
   "query": {
      "bool": {
         "should":[
            {
               "text": {
                  "productName.partial": {
                     "boost":1,
                     "query":"ing"
                  }
               }
            },
            {
               "text": {
                  "productName.partial_middle": {
                     "boost":1,
                     "query":"ing"
                  }
               }
            },
            {
               "text": {
                  "productName.partial_back": {
                     "boost":1,
                     "query":"ing"
                  }
               }
            }
         ]
      }
   }
}
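
To see why a query for "ing" can match the test record "Thingey" at all, it helps to list the tokens each filter would index. The following is only a rough plain-Python imitation of the three filters (min_gram 2, max_gram 50), not Elasticsearch code; the function names edge_ngrams and ngrams are made up for this sketch, and it assumes the token has already been lowercased to "thingey".

```python
def edge_ngrams(token, min_gram=2, max_gram=50, side="front"):
    """Prefixes (side="front") or suffixes (side="back") of the token."""
    upper = min(max_gram, len(token))
    if side == "front":
        return [token[:n] for n in range(min_gram, upper + 1)]
    return [token[-n:] for n in range(min_gram, upper + 1)]


def ngrams(token, min_gram=2, max_gram=50):
    """Every substring of the token with length min_gram..max_gram."""
    upper = min(max_gram, len(token))
    return [
        token[i:i + n]
        for n in range(min_gram, upper + 1)
        for i in range(len(token) - n + 1)
    ]


token = "thingey"  # "Thingey" after the lowercase filter (assumed)
front = edge_ngrams(token, side="front")   # imitates name_ngrams
back = edge_ngrams(token, side="back")     # imitates name_ngrams_back
middle = ngrams(token)                     # imitates name_middle_ngrams

print(front)   # ['th', 'thi', 'thin', 'thing', 'thinge', 'thingey']
print(back)    # ['ey', 'gey', 'ngey', 'ingey', 'hingey', 'thingey']
print("ing" in front, "ing" in back, "ing" in middle)  # False False True
```

If this imitation is right, only the productName.partial_middle clause can match "ing" for this record; the front and back fields would match terms such as "thi" or "gey" instead.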

The key change was moving the settings block out of the mappings PUT and into the index creation request. I also moved the initial mapping definition there, but it could just as well have been created with the regular /index/item/_mapping PUT.
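
Since nothing complained when the settings were in the wrong place, a small check on the request body before sending it might have caught this. Here is a minimal sketch in plain Python, not an Elasticsearch feature: undefined_analyzers and referenced_analyzers are hypothetical helper names, BUILT_IN is only a partial list of built-in analyzers that I am assuming, and the two bodies are trimmed-down versions of the ones in this thread.

```python
# Partial list of built-in analyzer names (an assumption, extend as needed).
BUILT_IN = {"standard", "simple", "whitespace", "stop", "keyword"}
ANALYZER_KEYS = ("analyzer", "index_analyzer", "search_analyzer")


def referenced_analyzers(node):
    """Recursively collect analyzer names used anywhere under a mapping."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key in ANALYZER_KEYS and isinstance(value, str):
                found.add(value)
            else:
                found |= referenced_analyzers(value)
    return found


def undefined_analyzers(body):
    """Analyzers the mappings use that top-level settings do not define."""
    defined = set(body.get("settings", {}).get("analysis", {}).get("analyzer", {}))
    used = referenced_analyzers(body.get("mappings", {}))
    return sorted(used - defined - BUILT_IN)


analysis = {"analyzer": {"full_name": {"type": "custom", "tokenizer": "standard"}}}
field = {"productName": {"type": "string", "analyzer": "full_name"}}

# Layout from the question: "settings" nested inside "mappings".
broken = {"mappings": {"item": {"properties": field},
                       "settings": {"analysis": analysis}}}
# Layout from this answer: "settings" at the top level.
fixed = {"settings": {"analysis": analysis},
         "mappings": {"product": {"properties": field}}}

print(undefined_analyzers(broken))  # ['full_name']
print(undefined_analyzers(fixed))   # []
```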

If any ElasticSearch pros want to expand on this for future readers of this issue, please do.
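
One follow-up on the comment under the question that text is deprecated in favor of match: the queries above still use text. A minimal sketch of renaming the clauses, assuming the body of each clause can stay unchanged and that no field in the index is itself named "text" (text_to_match is a hypothetical helper, and I have not verified the result against every Elasticsearch version):

```python
import json


def text_to_match(query):
    """Return a copy of the query with every "text" key renamed to "match"."""
    if isinstance(query, dict):
        return {("match" if key == "text" else key): text_to_match(value)
                for key, value in query.items()}
    if isinstance(query, list):
        return [text_to_match(item) for item in query]
    return query


search = {
    "query": {
        "bool": {
            "should": [
                {"text": {"productName.partial_middle": {"boost": 1, "query": "ing"}}}
            ]
        }
    }
}

converted = text_to_match(search)
print(json.dumps(converted, indent=3))
```

The original dictionary is left untouched, so both versions can be sent and their results compared.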