
I'm reading the W3Schools' HTML Tutorial, in particular the HTML URL Encoding section.
Here it says that:

URLs can only be sent over the Internet using the ASCII character-set. If a URL contains characters outside the ASCII set, the URL has to be converted.

And:

Your browser will encode input, according to the character-set used in your page.

For example (about this last point), the euro character € is encoded as %80 in Windows-1252 and as %E2%82%AC in UTF-8.
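The difference can be reproduced with Python's `urllib.parse.quote`, which percent-encodes the *bytes* of a string (this is just an illustrative sketch, not something the tutorial itself uses):

```python
from urllib.parse import quote

# Percent-encoding works on bytes, so the result depends on
# which charset was used to turn the character into bytes.
euro = "€"
print(quote(euro, encoding="cp1252"))  # %80        (one Windows-1252 byte)
print(quote(euro, encoding="utf-8"))   # %E2%82%AC  (three UTF-8 bytes)
```

The same character yields two different percent-encoded forms because the two charsets represent it with different bytes.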

My question is: if only ASCII characters can be used, why are there two ways of encoding the same character, depending on the charset? Couldn't there be just one? What is gained this way? And following on from that, why should I use the accept-charset attribute?


1 Answer


They are just different standards.

Microsoft's Windows-1252 and the other Windows-{$ver} code pages were created in the early days of personal computing. Every Windows-1252 character is one byte long, which means the encoding can represent at most 256 different characters.

It was a solution from the era of single-byte characters, and some old websites still use these charsets.

UTF-8, by contrast, encodes each character with one to four bytes, which is more than enough.

UTF-8 is therefore the de facto current standard, capable of storing all Unicode characters. It replaces all the Windows encoding dialects (Windows-1252, Windows-1250, Windows-1251, etc.).
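The variable-length encoding is easy to verify in Python (the sample characters are arbitrary, one per byte length):

```python
# UTF-8 is variable-length: ASCII stays one byte, while other
# Unicode characters take two, three, or four bytes.
for ch in ("A", "é", "€", "😀"):
    print(ch, "->", len(ch.encode("utf-8")), "byte(s)")
# A -> 1, é -> 2, € -> 3, 😀 -> 4
```

Note that ASCII text is unchanged under UTF-8, which is one reason it could replace the single-byte code pages so smoothly.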

It's strongly advised to encode all files for the web in UTF-8.