14
votes

I'm developing a multilingual PHP web application, and I've got long(-ish) texts that I need to translate with gettext. These are email templates (usually short, but still several lines) and parts of view templates (longer descriptive blocks of text). These texts would include some simple HTML (things like bold/italic for emphasis, probably a link here or there). The templates are PHP scripts whose output is captured.

The problem is that gettext seems very clumsy for handling longer texts. Longer texts would generally have more changes over time than short texts — I can either change the msgid and make sure to update it in all translations (could be lots of work and very error-prone when the msgid is long), or I can keep the msgid unchanged and modify only the translations (which would leave misleading outdated texts in the templates). Also, I've seen advice against including HTML in gettext strings, but avoiding it would break a single natural piece of text into lots of chunks, which will be an even bigger nightmare to translate and reassemble, and I've also seen advice against unnecessary splitting of gettext strings into separate msgids.

The other approach I see is to ignore gettext altogether for these longer texts, and to separate those blocks in external subtemplates for each locale, and just include the one for the current locale. The disadvantage is that I'm separating the translation effort between gettext .po files and separate templates located in a completely different location.

Since this application will be used as a starting point for other applications in the future, I'm trying to come up with the best approach for the long term. I need some advice for best practices in such scenarios. How have you implemented similar cases? What turned out to work and what turned out a bad idea?

3 Answers

11
votes

Here's the workflow I used on a heavily trafficked site that had several dozen long-ish blocks of styled textual content, translated into six languages:

  1. Pick a text-based markup language (we used Markdown)
  2. For long strings, use fixed message IDs like "About_page_intro_markdown" that:
    • describe the intent of the text
    • make clear that the text will be interpreted as Markdown
  3. Have our app render "*_markdown" strings appropriately, making sure to allow only a few safe HTML tags
  4. Build a tool for translators that:
    • shows them their Markdown rendered in real time (sort of like the Markdown dingus)
    • makes it easy for them to see the now-authoritative base language translation of the text (since that's no longer in the msgid)
  5. Teach translators how to use the new workflow
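
Step 3 can be sketched roughly as follows. This is a minimal illustration, not the original implementation: `render_message`, `$translate` and `$markdownToHtml` are hypothetical names, and the last one stands in for whichever Markdown library you pick. Note that `strip_tags` only filters tag names and leaves attributes on the allowed tags untouched, so a dedicated HTML sanitizer is the safer choice in production.

```php
<?php
// Hypothetical helper: translate a fixed message ID and, when the ID ends in
// "_markdown", render the translation as Markdown and keep only a few tags.
function render_message(string $msgid, callable $translate, callable $markdownToHtml): string
{
    $text = $translate($msgid);

    // Ordinary strings are treated as plain text and escaped.
    if (substr($msgid, -9) !== '_markdown') {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }

    // Assumption: $markdownToHtml wraps a real Markdown library.
    $html = $markdownToHtml($text);

    // Drop every tag that is not on the allow-list (the text inside is kept).
    return strip_tags($html, '<p><br><strong><em><a>');
}
```

In an application, `$translate` would typically be `'gettext'`, so the same helper serves both short strings and the long `*_markdown` entries.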

Pros of this workflow:

  • Message IDs don't change all the time
  • Because translators are editing in a safe higher-level syntax, it's hard for them to mess up the HTML
  • Non-technical translators found it very easy to write in Markdown, vs. HTML

Cons of this workflow:

  • Having static unchanging message IDs means changes in the text need to be transmitted out of band (which we'd do anyway, as long text can raise questions about tone or emphasis)

I'm very happy with the way this workflow operated for our website, and would absolutely recommend it, and use it again. It took a couple of days to get started, but it was easy to build, train, and launch.

Hope this helps, and good luck with your project.

5
votes

I just had this particular problem, and I believe I solved it in an elegant way.

The problem: We wanted to use Gettext in PHP, with the primary-language strings as the translation keys. However, for large blocks of HTML (with h1, h2, p, a, etc.) I'd have to either:

  • Create a translation for each tag with content.

or

  • Put the entire block with tags in one translation.

Neither of those options appealed to me, so this is what I did:

  • Keep simple strings ("OK","Add","Confirm","My Awesome App") as regular Gettext .po entries, with the original text as the key
  • Write content (large text blocks) in markdown, and keep them in files. Example files would be /homepage/content.md (primary / source text), /homepage/content.da-DK.md, /homepage/content.de-DE.md

  • Write a class that fetches the content file for the current locale and parses it. I then used it like:

    <?=Template::getContent("homepage/content")?>
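
The locale lookup behind a call like this might work roughly as below. This is a sketch under assumptions, not the original class: `resolve_content_file` and its parameters are hypothetical, and falling back to the primary-language file when no translation exists is just one possible choice.

```php
<?php
// Hypothetical helper: find the Markdown file for a content name and locale,
// e.g. "homepage/content" + "da-DK" -> <base>/homepage/content.da-DK.md.
function resolve_content_file(string $baseDir, string $name, string $locale): ?string
{
    $candidates = [
        "$baseDir/$name.$locale.md", // translated file for the requested locale
        "$baseDir/$name.md",         // primary / source text as a fallback
    ];

    foreach ($candidates as $path) {
        if (is_file($path)) {
            return $path;
        }
    }

    return null; // nothing found; the caller decides how to report it
}
```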

However, what about dynamic large text? Simple. Use a templating engine. I decided on Smarty, and used it in my Template class.

I could now use templating logic... within Markdown! How awesome is that?!

Then came the tricky part...

For content to look good, at times you need to structure your HTML differently. Consider a campaign area with 3 "feature boxes" beneath it. The easy solution: Have a file for the campaign area, and one for each of the 3 boxes.

But I could do better than that.

I wrote a quick block parser, so I could write all the content in one file and then render each block separately.

Example file:

    [block campaign]
    Buy this now!
    =============

    Blaaaah... And a smarty tag: {$cool}
    [/block]

    [block feature 1]
    Feature 1
    ---------

    asdasd you get it..
    [/block]

    [block feature 2] ...

And this is how I would render them in the markup:

    <?php
    // At the top of the document...

    // Class handles locale. :)
    $template = Template::getContent("homepage/content", [
        "cool" => "Smarty variable! AWESOME!"
    ]);
    ?>

    ...

    <title><?=_("My Awesome App")?></title>

    ...

    <div class="hero">
       <!-- Template data already processed! :) -->
       <?=$template->renderBlock("campaign")?>
    </div>
    <div class="featurebox">
       <?=$template->renderBlock("feature 1")?>
    </div>
    <div class="featurebox">
       <?=$template->renderBlock("feature 2")?>
    </div>

I'm afraid I can't provide any source code, as this was for a company project, but I hope you get the idea.
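
For readers who want a starting point, a block parser for the example file format could look something like this. It is a minimal sketch, not the company code: `parse_blocks` is a hypothetical name, and it assumes each `[block name]` and `[/block]` marker sits on its own line and that blocks are never nested.

```php
<?php
// Hypothetical helper: split a content file into named blocks.
// Returns an array mapping block name => raw block text.
function parse_blocks(string $source): array
{
    $blocks = [];

    // "m" lets ^ match at every line start; "s" lets . span newlines.
    // \R matches any line-break sequence after the opening marker.
    $pattern = '/^\[block ([^\]]+)\]\R(.*?)^\[\/block\]/ms';
    preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

    foreach ($matches as $match) {
        // $match[1] is the block name, $match[2] the text between the markers.
        $blocks[trim($match[1])] = rtrim($match[2]);
    }

    return $blocks;
}
```

Each block's text would then go through the templating engine and the Markdown renderer before being echoed into the page.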

3
votes

gettext wasn't really designed for translating large pieces of text.

FWIW, I've included basic HTML (strong, a, etc.) in gettext strings, as I was confident our translators knew what they were doing (mostly right) and that the translations would be well tested.

I've tried the approach of breaking up the text into roughly one string per paragraph. Because it looks odd if there's one paragraph of English in the middle of otherwise translated text, whenever one of those strings changed we had to wait for translations before releasing a new version, which slowed us down. On the plus side, it's easy for translators to see which part of the text has changed. This approach worked well for the one application I've tried it with.

Splitting some text out into external locations also worked, but it caused management overhead: rather than just a .po file or two, there was a whole bunch of other text that had to be manually compared to the English version and updated accordingly. This is doable if you remember to provide notes to your translators explaining where and what the difference was in the English version.

I'm still not sold on either approach myself.