Replacements for gettext

Question

We use gettext for translations in our product, but have had a lot of problems with it:

Can't use a language unless the system supports it.

On Solaris 9 SPARC, if we set the environment to one of several English locales, the message still won't be translated if the machine doesn't have the corresponding locale installed. The translation file is present, but we can't access it.

Uses environment to work out language

This causes problems in servers that want to translate messages into different languages. In theory this could be an entirely thread-safe, parallelisable operation - but gettext means we have to have a global lock around translation.

Can't set a default language

By this I don't mean the text in the code. We use MsgIDs in the code, so what I want is to be able to specify a fall-back translation to use if the language defined by the current environment is unavailable. But gettext doesn't allow that - I have to try, then reset the environment before it will deign to look at a different translation. (Using MsgIDs wasn't my choice - I wanted to follow gettext standards and use English as the IDs, but I was overruled, and it would be a lot of work to change it now.)

Encodings that are returned vary between UTF-8 and current local encoding.

I don't mean the .po files - they are all in UTF-8 (annoying that msgfmt doesn't handle a BOM, but whatever). I mean the output of gettext, ngettext, etc., which is in UTF-8 (regardless of local/terminal encoding) on AIX and HPUX, but in the local encoding on Solaris/Linux/FreeBSD, although that might be due to iconv issues?

In any case it would be nice not to need special code for different platforms - I'll have to investigate whether bind_textdomain_codeset(domain,codepage); can help with this problem.
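For reference, a minimal sketch of how that call is typically wired up, assuming GNU gettext's libintl (as shipped with glibc). The helper names, the domain name and the catalog directory are hypothetical, and whether the native gettext on AIX, HPUX or Solaris behaves the same way is exactly the open question above.

```cpp
#include <libintl.h>   // bindtextdomain, bind_textdomain_codeset, dgettext
#include <clocale>     // std::setlocale
#include <string>

// Hypothetical helper: bind a domain to a catalog directory and ask for
// UTF-8 output for that domain, whatever charset the locale itself uses.
void init_translations(const char* domain, const char* catalog_dir) {
    std::setlocale(LC_ALL, "");                 // the language still comes from the environment
    bindtextdomain(domain, catalog_dir);
    bind_textdomain_codeset(domain, "UTF-8");   // output encoding for this domain only
}

// Look up a message ID in the given domain. gettext hands back the ID
// itself when no translation can be found.
std::string translate(const char* domain, const char* msgid) {
    return dgettext(domain, msgid);
}
```

This only addresses the output encoding; the language is still selected through the environment, so the locking problem described earlier remains.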

Does anyone know of any open-source translation libraries that provide a more useful interface?

Éric Malenfant · Accepted Answer · 2009-10-08T12:51:04

We are using ICU resource bundles and are pretty satisfied with them. The ICU interface is not "modern", but it is powerful, the underlying principles are sound, and resource packaging (with the genrb tool) is pretty flexible. Its message formatting capabilities are also good.

About your specific comments:

Can't use a language unless the system supports it.

I don't understand this one. This may be due to the fact that the only "experience" I have with gettext is having read its documentation.

Uses environment to work out language

The ICU interface takes a Locale as input, so you have complete control. It also has a concept of "default locale" if that is more convenient for you.

Can't set a default language

ICU has an elaborate fallback mechanism, involving a "default" bundle.
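To make the idea concrete, here is a small self-contained sketch of a truncation-style fallback chain with the locale passed in explicitly. It is an illustration of the principle only, not ICU's API or its exact algorithm (ICU's real lookup can also involve the default locale); the Catalogs type, the function names and the "root" catch-all name used here are assumptions made for the example.

```cpp
#include <map>
#include <string>
#include <vector>

// locale name -> (message ID -> text); a stand-in for compiled bundles.
using Catalogs = std::map<std::string, std::map<std::string, std::string>>;

// Build the lookup order by truncating at '_' and ending with a catch-all
// "root" entry, e.g. "fr_CA" gives {"fr_CA", "fr", "root"}.
std::vector<std::string> fallback_chain(std::string locale) {
    std::vector<std::string> chain;
    while (!locale.empty()) {
        chain.push_back(locale);
        const auto pos = locale.rfind('_');
        if (pos == std::string::npos) break;
        locale.erase(pos);
    }
    chain.push_back("root");
    return chain;
}

// The locale is an explicit argument, so nothing is read from the
// environment and no global state is modified during a lookup.
std::string lookup(const Catalogs& catalogs, const std::string& locale,
                   const std::string& msgid) {
    for (const auto& name : fallback_chain(locale)) {
        const auto cat = catalogs.find(name);
        if (cat == catalogs.end()) continue;
        const auto msg = cat->second.find(msgid);
        if (msg != cat->second.end()) return msg->second;
    }
    return msgid;  // nothing found anywhere: hand back the ID, as gettext does
}
```

With a "root" catalog holding the English text for every MsgID, a request for "fr_CA" gets the French string where one exists and the English one otherwise, without touching the process environment.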

Encodings that are returned vary between UTF-8 and current local encoding.

String resources in a ResourceBundle (other data types are also possible) are always represented as UnicodeString, which is internally encoded in UTF-16. Working in UTF-32 with UnicodeString is pretty easy, as its interface exposes several methods for manipulating it at the code point level. For other encodings, code conversion is possible.
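As a rough illustration of what code point level access over UTF-16 storage involves, here is a self-contained sketch using only the standard library. It is not ICU code; with ICU you would use UnicodeString's own methods (char32At and countChar32 are examples) instead of a hand-written decoder. The function name is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-16 code units into UTF-32 code points. A valid surrogate pair
// becomes one code point; an unpaired surrogate is replaced with U+FFFD.
std::u32string utf16_to_utf32(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t unit = in[i];
        const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        if (is_high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            const char32_t low = in[i + 1];
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            ++i;  // the low surrogate has been consumed as well
        } else if (unit >= 0xD800 && unit <= 0xDFFF) {
            out.push_back(0xFFFD);  // lone surrogate
        } else {
            out.push_back(unit);    // BMP code point, stored as a single unit
        }
    }
    return out;
}
```

The point is that one code point may occupy one or two UTF-16 units, which is why an interface that indexes by code point is convenient.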
