I'm currently reading mails from a file and processing some of the header information. Non-ASCII characters are encoded according to RFC 2047 in quoted-printable or Base64, so the files contain no non-ASCII characters. If the file is encoded in UTF-8, Windows-1252 or one of the ISO-8859-* character encodings, I won't run into problems, because ASCII is embedded at the same place in all of these charsets (so `0x41` is an `A` in all of them).
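To make that assumption concrete, here is a minimal Python sketch of how I would check it. The helper name `is_ascii_compatible` and the `CANDIDATES` list are made up for illustration; the codec names are the ones Python's standard library uses. It only decodes the 128 ASCII byte values in one run and compares the result with plain ASCII, so it is a rough probe, not a proof.

```python
# The 128 byte values 0x00..0x7F and the text they represent in plain ASCII.
ASCII_BYTES = bytes(range(128))
ASCII_TEXT = ASCII_BYTES.decode("ascii")


def is_ascii_compatible(codec_name: str) -> bool:
    """Hypothetical helper: True if the codec decodes 0x00..0x7F like ASCII."""
    try:
        return ASCII_BYTES.decode(codec_name) == ASCII_TEXT
    except UnicodeDecodeError:
        # The codec rejects some of these byte values outright.
        return False


# Python codec names for the charsets mentioned above
# (there is no ISO-8859-12, so it is skipped).
CANDIDATES = ["utf_8", "cp1252"] + [
    f"iso8859_{n}" for n in range(1, 17) if n != 12
]

if __name__ == "__main__":
    for name in CANDIDATES:
        print(name, is_ascii_compatible(name))
```

The same function can be pointed at any other codec name Python knows in order to probe it the same way.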
But what if the file is encoded using an encoding that does not embed ASCII in that way? Do encodings like this even exist? And if so, is there even a reliable way of detecting them?