37
votes

I'm developing multi-language support for our web app. We're using Django's helpers around the gettext library. Everything has been surprisingly easy, except for the question of how to handle sentences that include significant HTML markup. Here's a simple example:

Please <a href="/login/">log in</a> to continue.

Here are the approaches I can think of:

  1. Change the link to include the whole sentence. Regardless of whether the change is a good idea in this case, the problem with this solution is that the UI becomes dependent on the needs of i18n, when the two are ideally independent.

  2. Mark the whole string above for translation (formatting included). The translation strings would then also include the HTML directly. The problem with this is that changing the HTML formatting requires changing all the translations.

  3. Tightly couple multiple translations, then use string interpolation to combine them. For the example, the phrase "Please %s to continue" and "log in" could be marked separately for translation, then combined. The "log in" is localized, then wrapped in the HREF, then inserted into the translated phrase, which keeps the %s in translation to mark where the link should go. This approach complicates the code and breaks the independence of translation strings.

Are there any other options? How have others solved this problem?

6 Answers

19
votes

Solution 2 is what you want. Send them the whole sentence, with the HTML markup embedded.

Reasons:

  1. The predominant translation tool, Trados, can preserve the markup from inadvertent corruption by a translator.
  2. Trados can also auto-translate text that it has seen before, even if the content of the tags has changed (as long as the number of tags and their position in the sentence are the same). At the very least, the translator will give you a good discount.
  3. Styling is locale-specific. In some cases, bold will be inappropriate in Chinese or Japanese, and italics are less commonly used in East Asian languages, for example. The translator should have the freedom to either keep or remove the styles.
  4. Word order is language-specific. If you were to segment the above sentence into fragments, it might work for English and French, but in Chinese or Japanese the word order would not be correct when you concatenate. For this reason, it is best i18n practice to externalize entire sentences, not sentence fragments.
12
votes

2, with a potential twist.

You certainly could localize the whole string, like:

loginLink=Please <a href="/login">log in</a> to continue

However, depending on your tooling and your localization group, they might prefer for you to do something like:

// tokens in this string add html links
loginLink=Please {0}log in{1} to continue

That would be my preferred method. You could use a different substitution pattern if you have localization tooling that ignores certain characters. E.g.

loginLink=Please %startlink%log in%endlink% to continue

Then perform the substitution in your jsp, servlet, or equivalent for whatever language you're using ...
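As a rough illustration of that substitution step, here is a minimal Python sketch. The function name `render_link_sentence` is made up for this example, and it assumes the translated string contains no other literal braces (otherwise `str.format` would choke on them).

```python
from html import escape


def render_link_sentence(translated, url):
    """Swap the {0}/{1} tokens of a translated sentence for link tags."""
    # Escape the URL so it is safe inside a double-quoted attribute.
    start_tag = '<a href="{}">'.format(escape(url, quote=True))
    # str.format fills {0} with the opening tag and {1} with the closing tag.
    return translated.format(start_tag, "</a>")


print(render_link_sentence("Please {0}log in{1} to continue", "/login"))
# prints: Please <a href="/login">log in</a> to continue
```

The translator only ever sees `{0}` and `{1}`, so the markup can change without touching the translations.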

7
votes

Disclaimer: I am not experienced in internationalization of software myself.

  1. I don't think this would be good in any case - just introduces too much coupling …
  2. As long as you keep formatting sparse in the parts which need to be translated, this could be okay. Giving translators the possibility to give special words importance (by either making them a link or probably using <strong /> emphasis) sounds like a good idea. However, translations containing (X)HTML probably cannot easily be reused anywhere else.
  3. This sounds like unnecessary work to me …

If it were me, I think I would go with the second approach, but I would put the URI into a formatting parameter, so that this can be changed without having to change all those translations.

Please <a href="%s">log in</a> to continue.
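In Python that could look roughly like the sketch below. The `_` here is only a stand-in built on the standard library's `gettext.NullTranslations` (it returns the string unchanged); in a Django project you would presumably use Django's own translation helper instead. `login_prompt` is a hypothetical name.

```python
import gettext
from html import escape

# Stand-in translator: NullTranslations returns the msgid unchanged.
_ = gettext.NullTranslations().gettext


def login_prompt(login_url):
    """Translate the sentence first, then fill the URL into the href slot."""
    template = _('Please <a href="%s">log in</a> to continue.')
    # Escape the URL so quotes or ampersands cannot break the attribute.
    return template % escape(login_url, quote=True)


print(login_prompt("/login/"))
# prints: Please <a href="/login/">log in</a> to continue.
```

Note that a translator who types a literal `%` into the translated string would break the interpolation, so this still needs some review of incoming translations.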

You should keep in mind that you may need to teach your translators the basics of (X)HTML if you go with this approach, so that they do not break your markup and so that they know what to expect from the text they write. Then again, this additional knowledge might lead to better semantic markup because, as mentioned above, texts could be translated and annotated with (X)HTML to reflect the local writing style.

3
votes

Whatever you do, keep the whole sentence as one string. You need to understand the whole sentence to translate it correctly.

Not all words should be translated in all languages. For example, in Norwegian one doesn't use "please" (we can say "vær så snill", literally "be so kind", but when used as a command it sounds too forceful), so the correct Norwegian would be:

  • "Logg inn for å fortsette" lit.: "Log in to continue" or
  • "Fortsett ved å logge inn" lit.: "Continue by to log in" etc.

You must allow completely changing the order, e.g. in a fictional demo language:

  • "Für kontinuer Loggen bitte ins" (if it was real) lit.: "To continue log please in"

Some languages may even have a single word for (most of) this sentence too...

I'd recommend solution 1, or possibly "Please %{startlink}log in%{endlink} to continue". That way the translator can make the whole sentence a link if that's more natural, and the sentence can be completely restructured.
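Here is a small Python sketch of the named-token idea. It uses Python's `%(startlink)s` spelling rather than `%{startlink}`, the `TRANSLATIONS` dict is a made-up stand-in for a real catalog, and `tokens_ok` and `render` are hypothetical helper names. The Norwegian entry shows a translator turning the whole sentence into the link.

```python
from html import escape

# Hypothetical catalog; a real project would load this from its i18n files.
TRANSLATIONS = {
    "en": "Please %(startlink)slog in%(endlink)s to continue",
    "nb": "%(startlink)sLogg inn for å fortsette%(endlink)s",
}


def tokens_ok(text):
    """True if each token appears exactly once and startlink comes first."""
    start, end = "%(startlink)s", "%(endlink)s"
    return (text.count(start) == 1 and text.count(end) == 1
            and text.index(start) < text.index(end))


def render(lang, url):
    """Fill the link tokens of the chosen translation with real tags."""
    text = TRANSLATIONS[lang]
    if not tokens_ok(text):
        raise ValueError("link tokens missing or out of order for %r" % lang)
    tags = {
        "startlink": '<a href="%s">' % escape(url, quote=True),
        "endlink": "</a>",
    }
    return text % tags


print(render("en", "/login/"))
print(render("nb", "/login/"))
```

The token check is cheap insurance: a translation that drops or swaps a token is rejected instead of producing broken markup.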

1
votes

Interesting question, I'll be having this problem very soon. I think I'll go for 2, without any kind of tricky stuff. HTML markup is simple, URLs won't move anytime soon, and if anything is changed a new entry will be created in django.po, so we get a chance to review the translation (e.g. a script should check for empty translations after makemessages).

So, in the template:

{% load i18n %}
{% trans 'hello <a href="/">world</a>' %}

... then, after python manage.py makemessages I get in my django.po

#: templates/out.html:3
msgid "hello <a href=\"/\">world</a>"
msgstr ""

I then fill in the translation:

#: templates/out.html:3
msgid "hello <a href=\"/\">world</a>"
msgstr "bonjour <a href=\"/\">monde</a>"

... and in the simple yet frequent cases I'll encounter, it won't be worth any further trouble. The other solutions here seem quite smart, but I don't think the solution to markup problems is more markup. Plus, I want to avoid too much confusing stuff inside templates.

Your templates should be quite stable after a while, I guess, but I don't know what other trouble you expect. If the content changes over and over, perhaps that content's place is not inside the template but inside a model.

Edit: I just checked the documentation: if you ever need variables inside a translation, there is blocktrans.
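For the "script should check for empty translations" part, something along these lines might do. It is only a sketch: `find_untranslated` is a name I made up, it works on the text of a .po file, and it deliberately ignores plural forms, msgctxt and fuzzy flags, so treat it as a starting point rather than a complete .po parser.

```python
import re

# One double-quoted PO string; backslash escapes are kept as written.
_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def find_untranslated(po_text):
    """Return the msgids whose msgstr is empty in the text of a .po file."""
    missing = []
    # Entries in a .po file are separated by blank lines.
    for block in re.split(r"\n\s*\n", po_text):
        parts = {"msgid": "", "msgstr": ""}
        current = None
        for line in block.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip comments such as "#: templates/out.html:3"
            keyword, _, rest = line.partition(" ")
            if keyword in parts:
                current, line = keyword, rest.strip()
            elif not line.startswith('"'):
                current = None  # keyword this sketch does not handle
            match = _QUOTED.fullmatch(line)
            if current and match:
                parts[current] += match.group(1)
        # The header entry has an empty msgid, so it is never reported.
        if parts["msgid"] and not parts["msgstr"]:
            missing.append(parts["msgid"])
    return missing


SAMPLE_PO = r'''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

#: templates/out.html:3
msgid "hello <a href=\"/\">world</a>"
msgstr ""

#: templates/out.html:7
msgid "bye"
msgstr "au revoir"
'''

for msgid in find_untranslated(SAMPLE_PO):
    print("untranslated:", msgid)
```

Run against the real django.po after makemessages, this would list every new or changed string that still needs a translation.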

0
votes
  1. Makes no sense, how would you translate "log in"?
  2. I don't think many translators have experience with HTML (the regular non-HTML-aware translators would be cheaper)
  3. I would go with option 3, or use "Please %slog in%s to continue" and replace the %s with parts of the link.