I'm trying to figure out how non-ASCII characters get saved in ASCII files. For example, if I open Notepad++, set the encoding to UTF-8, and write שלום, it saves 11 bytes: 3 for the BOM and 2 for each character. (I added | before and after each byte.)
|239||187||191||215||169||215||156||215||149||215||157|
I can look up these values and figure out which letter each pair refers to, e.g. at http://utf8-chartable.de/unicode-utf8-table.pl?start=1408&number=128&utf8=dec
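To double-check the UTF-8 bytes above without the table, here is a minimal Python sketch. It is only an illustration, not what Notepad++ does internally, and the variable names are made up for this example. It encodes the same word with a BOM in front, prints the byte values in the same |n| format, and then decodes them back to Unicode code points.

```python
import codecs

# The word from the question, written with escapes so the script itself
# does not depend on the encoding of the source file: shin, lamed, vav, final mem.
word = "\u05e9\u05dc\u05d5\u05dd"

# UTF-8 with a byte order mark in front, like the file saved as UTF-8.
data = codecs.BOM_UTF8 + word.encode("utf-8")
byte_values = list(data)
print("".join(f"|{b}|" for b in byte_values))
# prints |239||187||191||215||169||215||156||215||149||215||157|

# Going the other way: "utf-8-sig" strips the BOM while decoding.
decoded = data.decode("utf-8-sig")
code_points = [f"U+{ord(ch):04X}" for ch in decoded]
print(code_points)
# prints ['U+05E9', 'U+05DC', 'U+05D5', 'U+05DD']
```

The code points it prints are the same ones listed in the Hebrew block of the table linked above.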
If I open a new file, set the encoding to ASCII, and write the same word, it saves 4 bytes:
|249||236||229||237|
If I open the ASCII file, it correctly shows me the Hebrew word that I typed. How does it know? Is there a reference table for these values, similar to the one for Unicode?