0
votes

I need to model a video catalogue in Elasticsearch, made up of movies, TV shows, episodes, TV channels and live programs. Some of these entities are related to each other, some are not. Their attributes differ considerably, although a few are common to all of them.

I may need to query across entities: imagine a customer searching for something that could be a movie, a TV channel or a live program. Is it better to have a single index containing a generic entity marked with a logical type attribute, or multiple indexes, one for each entity (movie, show, episode, channel, program)? In addition, some of these entities, such as movies, can have metadata attributes in multiple languages.

Coming from a relational database background, I would create a separate index for every entity, plus a language-variant index for every language. Any suggestions, or is there a better approach that gives good search performance and usability?


1 Answer

1
votes

Whether to use several indexes or not very much depends on the application, so I cannot provide a definite answer, rather a few thoughts.

In my experience, indexes are more a tool for maintenance and operations than for data modeling. It is, for example, much easier to delete a whole index than to delete all documents from one source out of a bigger index. And if you support totally separate search applications that never query across each other's data, different indexes are the way to go.

But when you want to query documents across data sources, as you do, it makes sense to keep them in one index, if only to get comparable ranking across all items. Make sure to re-use fields across your data that have similar meaning (title, year of production, artists, etc.). For fields unique to a source we usually use prefix-marked field names, e.g. movie_... for movie-only metadata.
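To make the single-index idea concrete, here is a minimal sketch in Python that only builds the JSON bodies you would send to Elasticsearch. All field names (`type`, `title`, `year`, `movie_runtime_minutes`, `channel_number`, `program_start`) and the helper `build_search` are made up for illustration; adapt them to your own catalogue.

```python
import json

# Hypothetical mapping for one shared index; every field name is illustrative.
CATALOGUE_MAPPING = {
    "mappings": {
        "properties": {
            # Logical entity type: "movie", "show", "episode", "channel", "program".
            "type": {"type": "keyword"},
            # Fields shared by all entities that have them.
            "title": {"type": "text"},
            "year": {"type": "integer"},
            # Prefix-marked fields that only exist for one kind of entity.
            "movie_runtime_minutes": {"type": "integer"},
            "channel_number": {"type": "integer"},
            "program_start": {"type": "date"},
        }
    }
}


def build_search(text, types=None):
    """Build a search body over the shared fields.

    If `types` is given, results are restricted to those entity types
    with a non-scoring filter; otherwise all entities compete in one ranking.
    """
    bool_query = {
        "must": [{"multi_match": {"query": text, "fields": ["title"]}}]
    }
    if types:
        bool_query["filter"] = [{"terms": {"type": list(types)}}]
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    # Search everything, then only channels and live programs.
    print(json.dumps(build_search("news"), indent=2))
    print(json.dumps(build_search("news", types=["channel", "program"]), indent=2))
```

The `type` filter is optional on purpose: the cross-entity search simply omits it, while an entity-specific screen in the application can add it without needing a separate index.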

As for the language, you need to use language-specific fields, like title_en, title_es, title_de. Ideally, at query time, you know your user's language (from the browser, because they selected it explicitly, ...) and then search in the language-specific fields where available. Be sure to use the language-specific analyzers for these fields, at both query and index time.
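A rough sketch of the language-specific fields, again only building request bodies in Python. It assumes three languages and uses Elasticsearch's built-in `english`, `spanish` and `german` analyzers; the helper names and the fallback to English are my own assumptions, not a fixed recipe.

```python
# Language code -> built-in Elasticsearch language analyzer.
LANGUAGE_ANALYZERS = {"en": "english", "es": "spanish", "de": "german"}


def build_language_mapping():
    """Create one analyzed title field per language: title_en, title_es, title_de."""
    properties = {
        f"title_{code}": {"type": "text", "analyzer": analyzer}
        for code, analyzer in LANGUAGE_ANALYZERS.items()
    }
    return {"mappings": {"properties": properties}}


def build_language_query(text, user_lang, fallback_lang="en"):
    """Search the title field for the user's language.

    Unknown languages fall back to `fallback_lang` (an assumption made here).
    A match query analyzes the query text with the analyzer configured on the
    field, so index-time and query-time analysis stay consistent by default.
    """
    lang = user_lang if user_lang in LANGUAGE_ANALYZERS else fallback_lang
    return {"query": {"match": {f"title_{lang}": {"query": text}}}}


if __name__ == "__main__":
    print(build_language_mapping())
    print(build_language_query("la casa de papel", "es"))
```

Because the analyzer is declared on the field in the mapping, you normally do not have to repeat it in the query.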

I see a search engine as something like the dual of a database: a database stores data but can also index it, while a search engine indexes data but can also store it. A database tends to normalize the schema to remove redundancy, whereas a search engine works best with denormalized data for query performance.