Unicode to ASCII: Standardized transcription?

Question

My other question brought up a related question:

Is there a standard table of Unicode to ASCII transcriptions? Think for instance of German ü mapping to ue.

User bobince mentioned in a comment that other languages use the same character in a different way, and I fear they may share not only the glyph but also the code point. Hence mapping e.g. "ü" to "u" (mapping by visual similarity) would also be acceptable. So would mapping ü to "u, as done by iconv (see for instance the link posted by Juancho).

The methods shown in the link posted by Juancho are technically working solutions. However, is there a formal standard for such a mapping, or at least a mapping used as a quasi-standard? Ideally it would also include, for instance, phonetics-based transcriptions for non-Latin characters. I remember that such transcriptions exist for Japanese kana and Greek characters, so that part shouldn't be a big problem either.

Jukka K. Korpela · Accepted Answer · 2013-06-20T14:07:34

There is no formal standard on such mappings. Mappings that deal with Latin letters in general (like ü, é and ß) and map them all to Ascii are not really transcriptions or transliterations but just, well, mappings, which might be called simplifications or Asciifications. They are performed for various purposes, often in an ad hoc way.

Mapping ü to ue is rather common in German and might be called an unofficial or de facto standard for German names when ü cannot be used. But other languages use other rules, and it would be odd to Asciify French or Spanish that way; instead, the diacritic would just be dropped, mapping ü to u.

People may map e.g. ü to u" when they are forced (or they believe they are forced) to use Ascii and yet want to convey the message that the u has a diaeresis on it.
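
To make the three conventions above concrete, here is a minimal Python sketch. It is not a standard and not what iconv does internally; the `GERMAN` table, the `asciify` function and its `overrides` and `marks` parameters are names made up for this illustration. It relies only on the standard `unicodedata` module: per-language overrides are applied first, then the text is decomposed and each combining mark is either dropped or replaced by an Ascii stand-in.

```python
import unicodedata

# Hypothetical override table for German; an assumption for this sketch,
# not taken from any formal standard.
GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue",
          "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}

def asciify(text, overrides=None, marks=None):
    """Map text to Ascii using optional language-specific rules.

    overrides: maps whole characters to replacements (e.g. ü -> ue).
    marks: maps combining marks to an Ascii stand-in (e.g. U+0308 -> '"').
    Combining marks not listed in `marks` are dropped.
    """
    # Compose first so that "u" + combining diaeresis matches the "ü" key.
    text = unicodedata.normalize("NFC", text)
    if overrides:
        text = "".join(overrides.get(ch, ch) for ch in text)
    out = []
    # Decompose so that diacritics become separate combining characters.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            out.append((marks or {}).get(ch, ""))
        else:
            out.append(ch)
    # Anything still outside Ascii has no rule here and becomes "?".
    return "".join(out).encode("ascii", "replace").decode("ascii")

print(asciify("Müller", overrides=GERMAN))       # Mueller
print(asciify("pingüino"))                       # pinguino
print(asciify("Müller", marks={"\u0308": '"'}))  # Mu"ller
```

Which rule set to pass is a decision about the language of the text, not about the character: the same code point ü yields ue, u or u" depending on the caller's choice. Characters without a canonical decomposition, such as ß, need an explicit override; otherwise they end up as "?".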
