Test Case
I have a live test case available here: https://lonelearner.github.io/charset-issue/index.html
Since the HTML contains a non-ASCII character, here are two ways to reproduce the test case reliably on your own system. Use either one:
- Fetch the page from the URL above:
curl https://lonelearner.github.io/charset-issue/index.html -O
- Run this command:
echo " 3c21444f43545950452068746d6c3e0a3c68746d6c3e0a20203c68656164 3e0a202020203c7469746c653e636861727365742069737375653c2f7469 746c653e0a202020203c6d65746120687474702d65717569763d22436f6e 74656e742d547970652220636f6e74656e743d22746578742f68746d6c3b 20636861727365743d69736f2d383835392d31223e0a20203c2f68656164 3e0a20203c626f64793e0a202020203c703ea93c2f703e0a20203c2f626f 64793e0a3c2f68746d6c3e0a " | xxd -p -r > index.html
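Whichever method you use, a quick sanity check can confirm that the file survived intact. The following is a minimal sketch, assuming a POSIX shell; `count_non_ascii` is a hypothetical helper written for this illustration, not a standard tool. Going by the hex string above, index.html should be 192 bytes long and contain exactly one non-ASCII byte.

```shell
# Hypothetical helper: print the number of bytes >= 0x80 in the given file.
count_non_ascii() {
  # Delete every ASCII byte (octal 000-177), then count what remains.
  LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Usage, assuming index.html is in the current directory:
#   wc -c < index.html          # should report 192
#   count_non_ascii index.html  # should report 1
```

If the second number is 2 instead of 1, an editor or tool has probably re-encoded the file as UTF-8 along the way.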
Interesting Byte
Let us look at the ISO-8859-1 encoded character that we are concerned about in this question.
$ curl -s https://lonelearner.github.io/charset-issue/index.html | xxd -g1
00000000: 3c 21 44 4f 43 54 59 50 45 20 68 74 6d 6c 3e 0a <!DOCTYPE html>.
00000010: 3c 68 74 6d 6c 3e 0a 20 20 3c 68 65 61 64 3e 0a <html>. <head>.
00000020: 20 20 20 20 3c 74 69 74 6c 65 3e 63 68 61 72 73 <title>chars
00000030: 65 74 20 69 73 73 75 65 3c 2f 74 69 74 6c 65 3e et issue</title>
00000040: 0a 20 20 20 20 3c 6d 65 74 61 20 68 74 74 70 2d . <meta http-
00000050: 65 71 75 69 76 3d 22 43 6f 6e 74 65 6e 74 2d 54 equiv="Content-T
00000060: 79 70 65 22 20 63 6f 6e 74 65 6e 74 3d 22 74 65 ype" content="te
00000070: 78 74 2f 68 74 6d 6c 3b 20 63 68 61 72 73 65 74 xt/html; charset
00000080: 3d 69 73 6f 2d 38 38 35 39 2d 31 22 3e 0a 20 20 =iso-8859-1">.
00000090: 3c 2f 68 65 61 64 3e 0a 20 20 3c 62 6f 64 79 3e </head>. <body>
000000a0: 0a 20 20 20 20 3c 70 3e a9 3c 2f 70 3e 0a 20 20 . <p>.</p>.
000000b0: 3c 2f 62 6f 64 79 3e 0a 3c 2f 68 74 6d 6c 3e 0a </body>.</html>.
In the second-to-last row (the line at offset 000000a0), the 9th byte is a9. That is our interesting byte: it is the ISO-8859-1 representation of the copyright sign. Note that this is an ISO-8859-1 encoded symbol, not UTF-8. If it had been UTF-8 encoded, the bytes would be c2 a9.
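The relationship between the two encodings can be checked from the shell. This is a minimal sketch, assuming `iconv` is installed alongside `xxd`; the variable name `utf8_hex` is only illustrative. It takes the single byte a9, treats it as ISO-8859-1, and prints the hex of its UTF-8 equivalent.

```shell
# 0xa9 is octal 251; an octal escape keeps printf portable across POSIX shells.
utf8_hex=$(printf '\251' | iconv -f ISO-8859-1 -t UTF-8 | xxd -p)
echo "$utf8_hex"   # prints c2a9
```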
META Tag
To ensure that the content of this HTML file is interpreted as ISO-8859-1 encoded data, there is this <meta> tag in the HTML code:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Local Behavior
If you open this file locally with a browser, you would most likely see the copyright sign rendered correctly.
This is expected because when the file is opened locally, there is no HTTP server sending HTTP headers, so the iso-8859-1 encoding specified in the <meta> tag is honored.
GitHub Behavior
If you access the URL https://lonelearner.github.io/charset-issue/index.html with a browser, you would most likely see a replacement character (U+FFFD) in place of the copyright sign.
This is also expected. The page is served by GitHub Pages, and the GitHub Pages server always returns an HTTP header that specifies UTF-8 encoding:
$ curl -sI https://lonelearner.github.io/charset-issue/index.html | grep -i content-type
content-type: text/html; charset=utf-8
Since the HTTP header specifies the character encoding, the character encoding in the <meta> tag is no longer honored.
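The reason the UTF-8 label matters for this particular byte can be seen from the shell. The sketch below assumes `iconv` is available and uses it purely as a validator; `valid_utf8` is an illustrative variable name. A lone a9 byte is a UTF-8 continuation byte with no lead byte before it, so it is not a valid UTF-8 sequence, and a decoder that is told the content is UTF-8 cannot map it to the copyright sign.

```shell
# Ask iconv to read the lone byte 0xa9 (octal 251) as UTF-8.
# iconv exits with a non-zero status when the input is not valid UTF-8.
if printf '\251' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
  valid_utf8=yes
else
  valid_utf8=no
fi
echo "$valid_utf8"
```

Browsers do not fail outright in this situation; they typically substitute U+FFFD for the invalid byte.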
Question
Is there any way I can override the character encoding specified in the HTTP header using HTML, JavaScript or CSS to tell the browser that this content should be interpreted as ISO-8859-1 encoded even if the HTTP header says otherwise?
I know I can always write the copyright symbol as &copy; or encode the symbol in UTF-8 in the file, but let us consider such solutions to be outside the scope of this question because of the constraints I am dealing with:
- The content of the <body> is made available to me as ISO-8859-1 encoded text.
- I cannot modify the content of the <body>. I must use the ISO-8859-1 encoded text in my HTML.
- I can modify anything within the <head> tag. So I can add JavaScript, CSS or any other tricks that can solve this problem.