I'm reading the W3C chapter "HTML Document Representation".

Section 5.1 says this:

User agents must also know the specific character encoding that was used to transform the document character stream into a byte stream.

Then section 5.2 says this:

The "charset" parameter identifies a character encoding, which is a method of converting a sequence of bytes into a sequence of characters.

So 5.1 describes characters to bytes, and 5.2 describes bytes to characters.

Am I wrong, or are there two encodings involved in the representation?

1 Answer

A "character encoding" such as UTF-8 is, strictly speaking, a specification for representing characters as a sequence of bytes. But every such encoding is reversible: decoding is the inverse of encoding. So we can speak of a (single) character encoding as going both ways, and sections 5.1 and 5.2 are describing the same encoding from its two directions.

Other character encodings used in practice are UTF-16 and UTF-32.

Each of these is a specification under which you can encode text as bytes and decode bytes into characters. Encoding and decoding are two parts of the same specification.
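To make the two directions concrete, here is a minimal Python sketch. The helper name `round_trip` and the sample character are just illustrations of mine, not anything from the W3C text. It encodes a string to bytes and decodes it back with the same encoding, then shows what can happen when the bytes are decoded with a different encoding than the one used to produce them, which is presumably why 5.1 says the user agent must know the encoding.

```python
def round_trip(text, encoding):
    """Encode text to bytes, then decode those bytes with the same encoding."""
    encoded = text.encode(encoding)      # characters -> bytes (direction in 5.1)
    decoded = encoded.decode(encoding)   # bytes -> characters (direction in 5.2)
    return encoded, decoded


sample = "é"

for encoding in ("utf-8", "utf-16", "utf-32"):
    encoded, decoded = round_trip(sample, encoding)
    # Same encoding in both directions gives back the original text.
    assert decoded == sample
    print(encoding, encoded.hex(), len(encoded), "bytes")

# Decoding with a *different* encoding than the one used to encode
# yields different characters, even though no error is raised here.
misread = sample.encode("utf-8").decode("latin-1")
print(misread)
```

Note that Python's plain "utf-16" and "utf-32" codecs prepend a byte order mark when encoding, so their byte counts include it.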