0
votes

I use dynamic mapping in elasticsearch to load my json file into elasticsearch, like this:

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

def extract():
    f = open('tmdb.json')
    if f:
        return json.loads(f.read())

movieDict = extract()

def index(movieDict={}):

    for id, body in movieDict.items():
        es.index(index='tmdb', id=id, doc_type='movie', body=body)

index(movieDict)

How can I update mapping for single field? I have field title to which I want to add different analyzer.

title_settings = {"properties" : { "title": {"type" : "text", "analyzer": "english"}}}
es.indices.put_mapping(index='tmdb', body=title_settings)

This fails.

I know that I cannot update already existing index, but what is proper way to reindex mapping generated from json file? My file has a lot of fields, creating mapping/settings manually would be very troublesome.

I am able to specify analyzer for an query, like this:

query = {"query": {
            "multi_match": {
                "query": userSearch, "analyzer":"english", "fields": ['title^10', 'overview']}}} 

How do I specify it for index or field?

I am also able to put analyzer to settings after closing and opening index

analysis = {'settings': {'analysis': {'analyzer': 'english'}}}
es.indices.close(index='tmdb')
es.indices.put_settings(index='tmdb', body=analysis)
es.indices.open(index='tmdb')

Copying exact settings for english analyzers doesn't do 'activate' it for my data.

https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis-lang-analyzer.html#english-analyzer

By 'activate' I mean, search is not returned in a form processed by english analyzer ie. there are still stopwords.

1

1 Answers

0
votes

Solved it with massive amount of googling....

  1. You cannot change analyzer on already indexed data. This includes opening/closing of index. You can specify new index, create new mapping and load your data (quickest way)

  2. Specifying analyzer for whole index isn't good solution, as 'english' analyzer is specific to 'text' fields. It's better to specify analyzer by field.

  3. If analyzers are specified by field you also need to specify type.

  4. You need to remember that analyzers are used at can be used at/or index and search time. Reference Specifying analyzers

Code:

def create_index(movieDict={}, mapping={}):
    es.indices.create(index='test_index', body=mapping)

    start  = time.time()
    for id, body in movieDict.items():
        es.index(index='test_index', id=id, doc_type='movie', body=body)
    print("--- %s seconds ---" % (time.time() - start))

Now, I've got mapping from dynamic mapping of my json file. I just saved it back to json file for ease of processing (editing). That's because I have over 40 fields to map, doing it by hand would be just tiresome.

mapping = es.indices.get_mapping(index='tmdb')

This is example of how title key should be specified to use english analyzer

'title': {'type': 'text', 'analyzer': 'english','fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}