3
votes

Given these excerpt collections:

Translation Collection

[
  {
    "_id": "id01_name",
    "text": "Item's Name"
  },
  {
    "_id": "id01_desc",
    "text": "Item's lore description"
  },
  {
    "_id": "sk_id01",
    "text": "Item's skill description"
  }
]

Item Collection

[
  {
    "_id": "id01",
    "name": "id01_name",
    "lore_description": "id01_desc",
    "skill": {
      "description": "sk_id01"
    }
  }
]

Question:

Using only the MongoDB driver (no Mongo ODM, like Mongoose, Iridium, etc.), what is the best approach to internationalization (i18n) with fallback on MongoDB when the translations live in another collection?


MongoDB's aggregate Approach

Currently, I'm using aggregate with $lookup to do this kind of query.

db.artifact.aggregate([
  {
    $lookup: { // get the name's data from the English translation
      from: "text_en",
      localField: "name",
      foreignField: "_id",
      as: "name"
    }
  },
  {
    $unwind: "$name" // unwind because $lookup turned name into an array of { _id, text }
  },
  {
    $addFields: {
      name: "$name.text" // rewrite name (currently an object) into the translation string
    }
  },

(...etc)

The problem is that I need these three stages for every single key I want to translate. On a big document that adds up quickly, and each extra $lookup seems to increase the response time bit by bit.

This also has no fallback for the case where a key is not available in the requested language. For example, a $lookup for id01_name against the Spanish collection may find nothing, in which case the fallback should be to get the text from the English collection.

Here is a working example of the pipeline above: https://mongoplayground.net/p/umuPQYriFRe
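For what it's worth, the fallback could in principle stay inside the pipeline by looking each key up in both collections and preferring the foreign text. This is only a sketch (the text_es collection name is an assumption), and it doubles the number of $lookup stages per key:

```javascript
// Sketch only: "text_es" is an assumed Spanish translation collection.
// Look the key up in both languages, then prefer the foreign text;
// $arrayElemAt on an empty lookup result yields a missing value,
// which $ifNull replaces with the English text.
const pipeline = [
  { $lookup: { from: "text_es", localField: "name", foreignField: "_id", as: "name_es" } },
  { $lookup: { from: "text_en", localField: "name", foreignField: "_id", as: "name_en" } },
  {
    $addFields: {
      name: {
        $ifNull: [
          { $arrayElemAt: ["$name_es.text", 0] }, // requested language, may be missing
          { $arrayElemAt: ["$name_en.text", 0] }  // English fallback
        ]
      }
    }
  }
];
// db.collection("artifact").aggregate(pipeline)
```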


Manual aggregation Approach

I also considered doing it in phases, that is:

  • item.find() to get the item data;
  • translation_english.find({ _id: { $in: [ (...listofkeys) ] } }) to get all the necessary English keys;
  • translation_{otherlang}.find({ _id: { $in: [ (...listofkeys) ] } }) to get all the necessary keys in the other language;
  • manually transform the two translation arrays into two objects (with cursor.forEach()) and merge them with Object.assign({}, eng, otherlang);
  • hand-pick each key and assign it to the item object.

This method covers the fallback, but it's bloated and very verbose.
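The phases above can be sketched as follows; here in-memory arrays stand in for the driver's cursor results, and the Spanish texts are invented for illustration:

```javascript
// Sketch of the manual merge. The arrays below stand in for what
// translation_english.find(...) / translation_es.find(...) would return.
const enDocs = [
  { _id: "id01_name", text: "Item's Name" },
  { _id: "id01_desc", text: "Item's lore description" }
];
const esDocs = [ // the foreign language may be missing keys
  { _id: "id01_name", text: "Nombre del objeto" } // invented example text
];

// Turn each result array into a { key: text } map.
const toMap = docs =>
  docs.reduce((acc, d) => { acc[d._id] = d.text; return acc; }, {});

// English first, then the foreign language overrides where it exists,
// so any key missing in the foreign language falls back to English.
const merged = Object.assign({}, toMap(enDocs), toMap(esDocs));

// Hand-pick each key and assign it to the item object.
const item = { _id: "id01", name: "id01_name", lore_description: "id01_desc" };
item.name = merged[item.name];                       // → "Nombre del objeto"
item.lore_description = merged[item.lore_description]; // → "Item's lore description" (fallback)
```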


In a test of both attempts, the lookup version took 310 ms and the manual version 500 ms, and that was for a single lookup. But the manual approach has the fallback (i.e., it queries three collections: item, lang_en, and lang_{foreign}): if a key does not exist in the foreign language, it picks the English one (it's assumed English will never miss a key). The lookup version, by contrast, fails to return the document at all when the lookup on the foreign language finds nothing.


1 Answer

1
vote

You should take the manual approach.

Currently you're using two queries to find the translations, but it should only take one query if you merge the translations for all languages into a single document:

[
  {
    "_id": "id01_name",
    "en": "Item's Name",
    "fr": "fraf"
  },
  {
    "_id": "id01_desc",
    "en": "Item's lore description"
  },
  {
    "_id": "sk_id01",
    "en": "Item's skill description",
    "jp": "jpjpj"
  }
]

Some considerations to improve the query speed:

  • Paginate the returned item list. Even 100 items per page is cheap to query; retrieving the full collection is rarely needed anyway.
  • Index the translation collection on the single en field. You have to fall back to en anyway, so you just need to query where en = key.
  • Project the translation document with your language key, so only three fields are returned: en, __language__, and _id.
  • Let the client do the rendering; it can choose the language field it needs. If you really need server-side pre-processing, a for loop over a page of 100 items doesn't hurt anyway.
  • Consider caching the translations in Redis; they are rarely updated anyway, and you can take advantage of RAM speed.
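With the merged documents, one find with a projection covers both languages, and the fallback becomes a one-liner on the client. A minimal sketch, assuming a collection named translation and French as the requested language:

```javascript
// Sketch: fetch merged translation docs once and fall back client-side.
// The collection name "translation" and the language "fr" are assumptions.
const lang = "fr";
const projection = { _id: 1, en: 1, [lang]: 1 }; // en is always projected for the fallback
// const docs = await db.collection("translation")
//   .find({ _id: { $in: keys } }, { projection })
//   .toArray();

// Prefer the requested language, fall back to English.
const pick = doc => doc[lang] ?? doc.en;

pick({ _id: "id01_name", en: "Item's Name", fr: "fraf" }); // → "fraf"
pick({ _id: "id01_desc", en: "Item's lore description" }); // → "Item's lore description"
```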