I have a UTF-8 encoded roff file that I want to convert to a man page with
$ nroff -mandoc inittab.5
However, characters such as [äöüÄÖÜ] are not displayed properly; it seems that nroff assumes ISO 8859-1 encoding (I am getting [äöüÃÃÃ] instead). Calling nroff with the -Tutf8 flag does not change the behaviour, and the locale environment variables are (I assume properly) set to
LANG=de_DE.utf8
LC_CTYPE="de_DE.utf8"
LC_NUMERIC="de_DE.utf8"
LC_TIME="de_DE.utf8"
LC_COLLATE="de_DE.utf8"
LC_MONETARY="de_DE.utf8"
LC_MESSAGES="de_DE.utf8"
LC_PAPER="de_DE.utf8"
LC_NAME="de_DE.utf8"
LC_ADDRESS="de_DE.utf8"
LC_TELEPHONE="de_DE.utf8"
LC_MEASUREMENT="de_DE.utf8"
LC_IDENTIFICATION="de_DE.utf8"
LC_ALL=
Since nroff is only a wrapper script that eventually calls groff, I checked the call to the latter, which is:
$ groff -Tutf8 -mandoc inittab.5
Comparing the byte encodings of the characters in the source file and the output file, I get the following conversions:
character  src file  output file
---------  --------  -----------
ä          C3 A4     C3 83 C2 A4
ö          C3 B6     C3 83 C2 B6
ü          C3 BC     C3 83 C2 BC
Ä          C3 84     C3 83
Ö          C3 96     C3 83
Ü          C3 9C     C3 83
ß          C3 9F     C3 83
This behaviour seems very weird to me: why am I getting an additional C3 83, and why is the original byte sequence truncated altogether for the uppercase umlauts and ß?
Why is this, and how can I make nroff/groff properly convert my UTF-8 encoded file?
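One reading that is consistent with the byte table above: each UTF-8 byte is treated as a separate ISO 8859-1 character and then re-encoded as UTF-8, while bytes in the range 0x80-0x9F are dropped on input. The following Python sketch only models that hypothesis; the function name groff_like_roundtrip is made up for illustration, and the dropping of 0x80-0x9F is an assumption inferred from the table, not something confirmed from the groff source.

```python
def groff_like_roundtrip(text: str) -> bytes:
    """Model the suspected misreading: UTF-8 input taken as ISO 8859-1."""
    raw = text.encode("utf-8")
    # Assumption: bytes 0x80-0x9F are silently discarded when the input is read.
    kept = bytes(b for b in raw if not 0x80 <= b <= 0x9F)
    # Every remaining byte is read as one Latin-1 character,
    # and that character is then written out as UTF-8.
    return kept.decode("latin-1").encode("utf-8")


# Print source bytes and modelled output bytes for the characters in question.
for ch in "äöüÄÖÜß":
    src = ch.encode("utf-8").hex(" ").upper()
    out = groff_like_roundtrip(ch).hex(" ").upper()
    print(f"{ch}  {src:<6} {out}")
```

Under this model the lead byte C3 always turns into C3 83 (the UTF-8 encoding of Ã), and the second byte survives only when it is 0xA0 or above, which would explain why the lowercase umlauts keep a trailing C2 xx pair while Ä, Ö, Ü and ß do not.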
EDIT: I am using GNU nroff (groff) version 1.22.2
If you run less inittab.5, do you see proper characters? By the way, the question is off topic for this site; you may have better luck at Unix & Linux Stack Exchange. – n. 1.8e9-where's-my-share m.