
I have a git repository of HTML, JS and PHP files. All of them should be encoded in UTF-8, and running file -i * reports most of them as UTF-8, but some come out like this:

file.html.twig: text/plain; charset=us-ascii

Strangely, if I open those files in Sublime Text it shows UTF-8. And if I run this conversion:

iconv -f us-ascii -t iso-8859-1 file.html.twig -o file2.html.twig

or

iconv -f utf-8 -t iso-8859-1 file.html.twig -o file2.html.twig

nothing changes, whereas running the same command on a file reported as utf-8 actually converts it.

Why does this happen? I know us-ascii is a subset of utf-8, but iconv seems unable to change this charset.

(My ultimate goal is to maintain a git repository with files in iso-8859-1, and git does not seem to recognize file encodings. That will be a second problem; first I need to resolve this one...)

thank you

Can you show a hex dump of a problematic character sequence? - choroba
The problem is with the file encoding itself; the content is OK. I need all my source code in ISO-8859-1 (I know it is not a wise solution, but this code lives alongside a legacy disaster of an app and I have no choice) - K. Weber

1 Answer


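If they are in UTF-8, then don't let a guesser like file mislead you. It gives one answer when many would be equally correct: a file containing only bytes below 0x80 is byte-for-byte identical in ASCII, UTF-8, iso-8859-1 and dozens of other encodings, so whenever file guesses us-ascii the file is also valid UTF-8 and valid iso-8859-1. That is also why iconv appears to do nothing to those files: the output is the same bytes as the input.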
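To see why an ASCII-only file comes out unchanged, here is a minimal sketch in Python rather than iconv itself. The sample bytes are made up for illustration; the point is only that bytes below 0x80 mean the same thing in all three encodings, while a single accented character does not.

```python
# Hypothetical sample content: every byte is below 0x80.
ascii_only = b"<h1>{{ title }}</h1>\n"

# The same bytes decode to the same text under all three encodings.
decoded = {enc: ascii_only.decode(enc) for enc in ("ascii", "utf-8", "iso-8859-1")}
all_same_text = len(set(decoded.values())) == 1

# "Converting" UTF-8 -> ISO-8859-1 therefore leaves the bytes untouched.
converted = ascii_only.decode("utf-8").encode("iso-8859-1")
unchanged = converted == ascii_only

# With a non-ASCII character the bytes do change:
# U+00E9 is two bytes in UTF-8 and one byte in ISO-8859-1.
utf8_bytes = "caf\u00e9".encode("utf-8")
latin1_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-1")

print(all_same_text, unchanged, utf8_bytes, latin1_bytes)
# True True b'caf\xc3\xa9' b'caf\xe9'
```

So a file that file reports as us-ascii is already a valid iso-8859-1 file, and there is nothing for a converter to do.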

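So, take all your UTF-8 files and convert them to iso-8859-1. But do understand that the conversion can be lossy, because iso-8859-1 covers only 256 characters. By default iconv stops with an error when it meets a character it cannot convert; don't suppress that (for example with -c or //TRANSLIT), or characters will be dropped or altered silently.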
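The same "fail loudly" behavior can be sketched in Python, assuming you would rather script the conversion than call iconv. The function name utf8_to_latin1 is hypothetical; it relies only on the strict default of str.encode, which raises instead of dropping characters.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Convert UTF-8 bytes to ISO-8859-1, refusing to lose characters."""
    # Raises UnicodeDecodeError if the input is not valid UTF-8.
    text = data.decode("utf-8")
    try:
        # Strict error handling is the default: unsupported characters raise.
        return text.encode("iso-8859-1")
    except UnicodeEncodeError as err:
        bad = text[err.start:err.end]
        raise ValueError(
            f"cannot represent {bad!r} (U+{ord(bad[0]):04X}) "
            f"at character offset {err.start}"
        ) from err


print(utf8_to_latin1("caf\u00e9".encode("utf-8")))  # b'caf\xe9'
```

Feeding it text that contains, say, an emoji raises a ValueError naming the offending code point, which is the cue to fix the content rather than force the conversion.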

Oh, and if you are converting files that declare their own encoding internally (e.g., HTML and XML), you should update those declarations too.
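As a rough sketch of updating such declarations, assuming they look like charset=utf-8 (HTML) or encoding="UTF-8" (XML). The helper name and the regular expression are illustrative only; the pattern is naive and would also rewrite a matching string that appears in ordinary content, so review the diff before committing.

```python
import re

# Naive pattern (an assumption, not a full HTML/XML parser): matches
# charset=... or encoding=... followed by utf-8 / utf8, in any letter case.
_DECLARATION = re.compile(r'((?:charset|encoding)\s*=\s*["\']?)utf-?8', re.IGNORECASE)


def update_charset_declaration(markup: str, new_charset: str = "iso-8859-1") -> str:
    """Replace a declared UTF-8 charset with new_charset, keeping the quoting."""
    return _DECLARATION.sub(lambda m: m.group(1) + new_charset, markup)


print(update_charset_declaration('<meta charset="utf-8">'))
print(update_charset_declaration('<?xml version="1.0" encoding="UTF-8"?>'))
```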

And if there are errors, you can usually rewrite the characters that iso-8859-1 does not support in an escaped form, according to whatever language rules (🚲 => \uD83D\uDEB2 or similar) or markup rules (🚲 => &#x1F6B2;) apply.
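Both kinds of escaping can be sketched in Python. The markup route uses the standard xmlcharrefreplace error handler, which emits decimal numeric character references (128690 is 0x1F6B2). The js_escape helper is a hypothetical illustration of JavaScript-style \uXXXX escapes built from UTF-16 code units.

```python
text = "bike: \U0001F6B2"

# Markup route: unsupported characters become numeric character references.
as_markup = text.encode("iso-8859-1", errors="xmlcharrefreplace")


def js_escape(s: str) -> str:
    """Escape everything outside ISO-8859-1 as \\uXXXX (UTF-16 code units)."""
    out = []
    for ch in s:
        if ord(ch) < 0x100:
            out.append(ch)  # representable in ISO-8859-1, keep as-is
        else:
            units = ch.encode("utf-16-be")  # 2 bytes, or 4 for a surrogate pair
            for i in range(0, len(units), 2):
                out.append(f"\\u{int.from_bytes(units[i:i + 2], 'big'):04X}")
    return "".join(out)


print(as_markup)        # b'bike: &#128690;'
print(js_escape(text))  # bike: \uD83D\uDEB2
```

After escaping, the result contains only characters that iso-8859-1 can hold, so the strict conversion no longer fails.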