
Can someone explain how to encode the string "®" in Base64? (I just picked an arbitrary non-ASCII character as an example.) This character has the code 174, or 10101110 in binary.

The expected result is "wq4=" (I confirmed this with two different online Base64 encoders).
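The expected value can be reproduced with Python's standard `base64` module. This is a minimal check; the match suggests the websites convert the text to UTF-8 bytes before applying Base64.

```python
import base64

# Convert "®" (U+00AE) to bytes with UTF-8, then Base64-encode those bytes.
encoded = base64.b64encode("®".encode("utf-8")).decode("ascii")
print(encoded)  # wq4=
```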

I understand how Base64 encoding works, and I get the correct result for ASCII characters, but for any non-ASCII character my result never matches.

I have tried two ways:

  1. Using the binary for this character directly (10101110) and splitting it into 6-bit chunks, I get 101011 and 100000. In Base64 these correspond to "rg".
  2. Converting 10101110 to UTF-8 first, which gives me two bytes: 11010101 and 10110000. Then I concatenate these two bytes and split them into 6-bit chunks: 110101 011011 000000. In Base64 these correspond to "1bA".
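Python's `base64` module can show which byte sequences the two attempts actually encode. The 6-bit arithmetic in both is fine; the difference lies in the input bytes. Attempt 1 used the single byte 0xAE, which is what Latin-1 (ISO-8859-1) produces for "®", and attempt 2 used 0xD5 0xB0. Note that full Base64 output is also padded with `=` to a multiple of four characters.

```python
import base64

# Attempt 1: the single byte 10101110 (0xAE), i.e. "®" encoded as Latin-1.
attempt1 = base64.b64encode("®".encode("latin-1")).decode("ascii")

# Attempt 2: the two bytes 11010101 10110000 (0xD5 0xB0) from the question.
attempt2 = base64.b64encode(bytes([0b11010101, 0b10110000])).decode("ascii")

print(attempt1)  # rg==
print(attempt2)  # 1bA=
```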

I have no idea how to proceed with non-ASCII characters; the same calculation works for any ASCII character.

Does anyone know what I am doing wrong?

1 Answer

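Base64 encodes binary data (bytes) as ASCII text, so a character must first be converted to bytes with some character encoding. The Unicode character ® can be encoded with any encoding before applying Base64, but UTF-8 is convenient because it can encode any Unicode code point, and it is the encoding that produces the expected "wq4=".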
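To illustrate that the Base64 text depends on the encoding chosen first, here is a small Python sketch comparing UTF-8 with the two UTF-16 byte orders (the choice of these three encodings is just for illustration):

```python
import base64

# Same character, different byte encodings, therefore different Base64 text.
results = {
    enc: base64.b64encode("®".encode(enc)).decode("ascii")
    for enc in ("utf-8", "utf-16-be", "utf-16-le")
}

for enc, b64 in results.items():
    print(f"{enc:10} {b64}")
# utf-8      wq4=
# utf-16-be  AK4=
# utf-16-le  rgA=
```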

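The error in the question is in the conversion to UTF-8. The code point's bits must be right-aligned in the x slots of the UTF-8 template: the least significant bit goes into the rightmost slot, filling leftward, and any unused high slots are set to 0. In the question they were filled most significant bit first, from left to right.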
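Shifting and masking performs this right-alignment naturally. A minimal Python sketch for the two-byte case; `utf8_two_byte` is a hypothetical helper name, and its output is checked against Python's built-in `str.encode`:

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Hypothetical helper: pack a code point in U+0080..U+07FF into 110xxxxx 10xxxxxx."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers U+0080..U+07FF only")
    # Bits above the low 6 go into the lead byte, right-aligned (unused slots stay 0).
    lead = 0b11000000 | (code_point >> 6)
    # The low 6 bits go into the continuation byte.
    cont = 0b10000000 | (code_point & 0b111111)
    return bytes([lead, cont])


packed = utf8_two_byte(ord("®"))
print(f"{packed[0]:08b} {packed[1]:08b}")  # 11000010 10101110
```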

  1. Start with the Unicode code point for ®, which is U+00AE.
  2. Convert to binary: 10101110
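  3. Code points U+0080 to U+07FF require the two-byte UTF-8 form 110xxxxx 10xxxxxx, which has 11 x slots. Pad the bits to 11 (00010101110) and fill the slots: 11000010 10101110 (hex C2 AE).
  4. Base64 works on 3-byte (24-bit) groups. There are only two bytes here, so the group is completed with zero bits and one pad character is needed at the end.
  5. Regroup into 6-bit chunks: 110000 101010 111000 (the last chunk is completed with two zero bits).
  6. Convert to decimal: 48 42 56
  7. Look up each value in the Base64 table below and append one pad: wq4=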
Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A            17 R            34 i            51 z
    1 B            18 S            35 j            52 0
    2 C            19 T            36 k            53 1
    3 D            20 U            37 l            54 2
    4 E            21 V            38 m            55 3
    5 F            22 W            39 n            56 4
    6 G            23 X            40 o            57 5
    7 H            24 Y            41 p            58 6
    8 I            25 Z            42 q            59 7
    9 J            26 a            43 r            60 8
   10 K            27 b            44 s            61 9
   11 L            28 c            45 t            62 +
   12 M            29 d            46 u            63 /
   13 N            30 e            47 v
   14 O            31 f            48 w         (pad) =
   15 P            32 g            49 x
   16 Q            33 h            50 y
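Steps 4 to 7 can be sketched as a small hand-rolled encoder in Python. It is for illustration only (the standard `base64` module should be used in practice); the names `ALPHABET` and `b64_manual` are hypothetical, and the alphabet string reproduces the table above.

```python
# The 64 symbols from the table: values 0-25 "A-Z", 26-51 "a-z", 52-61 "0-9", 62 "+", 63 "/".
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def b64_manual(data: bytes) -> str:
    """Hypothetical hand-rolled Base64 encoder following the steps above."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # bytes needed to complete the 3-byte group
        # Step 4: complete the group with zero bytes and read it as a 24-bit number.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Steps 5-6: split the 24 bits into four 6-bit values, most significant first.
        sextets = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
        # Step 7: look up each value in the table.
        chars = [ALPHABET[v] for v in sextets]
        # Each missing input byte replaces one trailing output character with "=".
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


utf8 = "®".encode("utf-8")  # b"\xc2\xae"
print(b64_manual(utf8))     # wq4=
```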