This is not something you can do in a foolproof way. One possibility would be to examine every character in the file to ensure that it doesn't contain any characters in the ranges 0x00-0x1f or 0x7f-0x9f, but, as I said, this may be true for any number of files, including at least one other variant of ISO 8859.
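A minimal sketch of that byte-range check in Python. Because ISO 8859-1 maps each byte to exactly one character, the check can run on raw bytes. One assumption that is mine rather than the answer's: tab, LF and CR are allowed, since almost every text file contains them. The function names are hypothetical.

```python
# Byte-range check: reject C0 controls (0x00-0x1f) and DEL plus C1 controls (0x7f-0x9f).
# Assumption (not from the answer): tab, LF and CR are allowed because ordinary
# text files contain them.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}


def looks_like_latin1(data: bytes) -> bool:
    """Return True if no byte falls in 0x00-0x1f (except tab/LF/CR) or 0x7f-0x9f.

    Passing this check does NOT prove the data is ISO 8859-1; plenty of
    other 8-bit encodings and plain ASCII pass it as well.
    """
    for b in data:
        if b in ALLOWED_CONTROLS:
            continue
        if b <= 0x1F or 0x7F <= b <= 0x9F:
            return False
    return True


def file_looks_like_latin1(path: str) -> bool:
    """Hypothetical helper: apply the byte check to a whole file."""
    with open(path, "rb") as f:
        return looks_like_latin1(f.read())
```

For example, the Latin-1 bytes for "café" followed by a newline pass, while the single byte 0x80 (the euro sign in Windows-1252) fails. As the answer says, passing tells you little on its own.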
Another possibility is to look for common words from each of the supported languages and see whether they appear in the file.
For example, find the equivalents of the English "and", "but", "to", "of" and so on in all the languages supported by ISO 8859-1 and check whether they occur frequently in the file.
I'm not talking about literal translations such as:

| English | French      |
|---------|-------------|
| of      | de, du      |
| and     | et          |
| the     | le, la, les |

Although that's possible, what I mean is common words in the target language (for all I know, Icelandic has no word for "and", so you'd probably have to use their word for "fish" [sorry, that's a little stereotypical; I didn't mean any offense, just illustrating a point]).
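A rough sketch of the common-word heuristic, assuming the file is decoded as ISO 8859-1 (which never fails, since every byte maps to a character). The four languages, their tiny word lists and the 0.15 threshold are hypothetical placeholders for illustration; a real check would need frequent-word lists for every language ISO 8859-1 covers.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only.
COMMON_WORDS = {
    "english": {"the", "and", "of", "to", "but", "a", "in", "is"},
    "french": {"le", "la", "les", "de", "du", "et", "un", "une", "est"},
    "german": {"der", "die", "das", "und", "nicht", "ist", "zu", "ein"},
    "spanish": {"el", "la", "los", "de", "y", "que", "en", "es"},
}


def common_word_scores(data: bytes) -> dict[str, float]:
    """Decode as ISO 8859-1 and return, per language, the fraction of words
    that appear in that language's common-word list."""
    text = data.decode("latin-1")  # never raises: every byte is a valid character
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    if not words:
        return {lang: 0.0 for lang in COMMON_WORDS}
    counts = Counter(words)
    total = len(words)
    return {
        lang: sum(counts[w] for w in vocab) / total
        for lang, vocab in COMMON_WORDS.items()
    }


def best_guess(data: bytes, threshold: float = 0.15) -> str | None:
    """Return the top-scoring language if it clears the (hypothetical) threshold."""
    scores = common_word_scores(data)
    lang, score = max(scores.items(), key=lambda kv: kv[1])
    return lang if score >= threshold else None
```

Fed the Latin-1 bytes of "Le chat et le chien sont dans la maison de la famille.", `best_guess` returns `"french"` (6 of the 12 words are on the French list). Because most of these short words are pure ASCII, a high score mainly says the text is written in one of these languages; on its own it does not separate ISO 8859-1 from, say, UTF-8.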
Running `apropos encoding` searches the titles and descriptions of all the manpages. When I do this on my machine, I see three tools that might help, judging by their descriptions: `chardet`, `chardet3` and `chardetect3`. Running `man chardet` and reading the manpage then tells me that `chardet` is just the utility I need. – John Red

… `us-ascii`, but after adding a line of Chinese comment, it becomes `utf-8`. `file` can tell the encoding by reading the file content and guessing. – user218867