1
votes

We're using Django MarkupField to store Markdown text and it works quite well.

However, when we try to index these fields in Wagtail we get serialization errors from Elasticsearch, like this:

File "/usr/local/lib/python3.5/dist-packages/wagtail/wagtailsearch/management/commands/update_index.py", line 120, in handle
  self.update_backend(backend_name, schema_only=options.get('schema_only', False))
File "/usr/local/lib/python3.5/dist-packages/wagtail/wagtailsearch/management/commands/update_index.py", line 87, in update_backend
  index.add_items(model, chunk)
File "/usr/local/lib/python3.5/dist-packages/wagtail/wagtailsearch/backends/elasticsearch.py", line 579, in add_items
  bulk(self.es, actions)
File "/usr/local/lib/python3.5/dist-packages/elasticsearch/helpers/__init__.py", line 195, in bulk
  for ok, item in streaming_bulk(client, actions, **kwargs):
File "/usr/local/lib/python3.5/dist-packages/elasticsearch/helpers/__init__.py", line 162, in streaming_bulk
  for bulk_actions in _chunk_actions(actions, chunk_size, max_chunk_bytes, client.transport.serializer):
File "/usr/local/lib/python3.5/dist-packages/elasticsearch/helpers/__init__.py", line 61, in _chunk_actions
  data = serializer.dumps(data)
File "/usr/local/lib/python3.5/dist-packages/elasticsearch/serializer.py", line 50, in dumps
  raise SerializationError(data, e)
elasticsearch.exceptions.SerializationError: ({'_partials': [<markupfield.fields.Markup object at 0x7faa6e238e80>, <markupfield.fields.Markup object at 0x7faa6dbc4da0>], 'pk': '1', 'research_interests': <markupfield.fields.Markup object at 0x7faa6e238e80>, 'bio': <markupfield.fields.Markup object at 0x7faa6dbc4da0>}, TypeError("Unable to serialize <markupfield.fields.Markup object at 0x7faa6e238e80> (type: <class 'markupfield.fields.Markup'>)",))

One workaround is to index callables that return field.raw, but then we'd have to write one such callable for every Markdown field on our models. I thought we could avoid this by adding a get_searchable_content(value) method to the field's value class (i.e., the django-markupfield Markup class that stands in for the MarkupField on model instances), but the serialization errors persist.

Does anyone have any tips for indexing custom Django fields in Wagtail + Elasticsearch?

2 Answers

2
votes

There are several ways to do it. The best would be to create your own field in elasticsearch-dsl (see (0) for an example) and use that for (de)serialization. Another option is to create your own JSONSerializer (1) subclass that can deal with markupfield.fields.Markup objects, and pass it in as serializer=MyJSONSerializer() in the Elasticsearch constructor.
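
A minimal sketch of the second option, under some assumptions: MarkupAwareSerializer and FakeMarkup are hypothetical names, and Markup objects are detected by duck typing on the .raw attribute mentioned in the question rather than by importing markupfield. The fallback base class exists only so the snippet runs where elasticsearch-py is not installed.

```python
import json

try:
    from elasticsearch.serializer import JSONSerializer
except ImportError:
    class JSONSerializer:
        """Stand-in used only when elasticsearch-py is not installed."""

        def default(self, data):
            raise TypeError(
                "Unable to serialize %r (type: %s)" % (data, type(data))
            )


class MarkupAwareSerializer(JSONSerializer):
    """Hypothetical serializer that emits the raw text of Markup-like objects."""

    def default(self, data):
        # Assumption: Markup objects expose their source text as `.raw`.
        raw = getattr(data, "raw", None)
        if isinstance(raw, str):
            return raw
        # Anything else falls through to the base class handling.
        return super().default(data)


class FakeMarkup:
    """Stand-in for markupfield.fields.Markup, for demonstration only."""

    def __init__(self, raw):
        self.raw = raw


# Exercise the hook through the stdlib encoder.
doc = {"pk": "1", "bio": FakeMarkup("**hello**")}
encoded = json.dumps(doc, default=MarkupAwareSerializer().default, sort_keys=True)
```

With a real client this would be passed as Elasticsearch(serializer=MarkupAwareSerializer()). How to hand that argument to the client Wagtail creates is not covered here.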

0 - https://github.com/elastic/elasticsearch-dsl-py/blob/master/test_elasticsearch_dsl/test_document.py#L49-L58
1 - https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/serializer.py#L24

1
votes

I was putting get_searchable_content in the wrong place: I thought it was needed on the Markup class, but it needs to be defined on the Django model Field class itself. Wagtail will then pull the appropriate value to be indexed in Elasticsearch (or any other search backend).

The most straightforward solution was to subclass MarkupField with a custom Field class and add a get_searchable_content(self, value) method that delegates to MarkupField.get_prep_value.
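
Roughly, the shape of that subclass is sketched below. The names SearchableContentMixin, SearchableMarkupField, StubMarkup and StubMarkupField are made up for illustration; the stubs only mimic what I assume MarkupField.get_prep_value does (return the raw text of a Markup value) so the snippet runs without Django installed. In a real project the base class would be markupfield.fields.MarkupField instead of the stub.

```python
class SearchableContentMixin:
    """Adds the hook Wagtail looks for on a model field."""

    def get_searchable_content(self, value):
        # Reuse the field's own conversion to a plain, storable value.
        return self.get_prep_value(value)


class StubMarkup:
    """Stand-in for markupfield.fields.Markup (assumption: has `.raw`)."""

    def __init__(self, raw):
        self.raw = raw


class StubMarkupField:
    """Stand-in for MarkupField; mimics its assumed get_prep_value."""

    def get_prep_value(self, value):
        if isinstance(value, StubMarkup):
            return value.raw
        return value


# In a real project: class SearchableMarkupField(SearchableContentMixin, MarkupField)
class SearchableMarkupField(SearchableContentMixin, StubMarkupField):
    pass


field = SearchableMarkupField()
indexed = field.get_searchable_content(StubMarkup("# Bio"))
```

Models then declare their Markdown fields with the subclass, and no per-field callables are needed.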