I ran into a problem while working with text files: the Unicode representation of a character differs between Python and C#.
When I open the file with Python 3.5.2, the Unicode character at a specific index is:
with open('file.txt', 'r', encoding='utf-8') as f:
    text = f.read()
text[189]
# Output: u"\U0001F464"
When I open the same file with C#, the character at that index is represented by two chars:
// Requires: using System; using System.IO; using System.Text;
string text = File.ReadAllText("file.txt", Encoding.UTF8);
Console.WriteLine(((int)text[189]).ToString("X4"));
// Output: "D83D"
Console.WriteLine(((int)text[190]).ToString("X4"));
// Output: "DC64"
So in Python this character sits at index 189 alone, while in C# it occupies indices 189 and 190.
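To illustrate the index mismatch, here is a minimal sketch that assumes Python 3 indexes a str by code point while a C# string is indexed by UTF-16 code unit. The helper name `utf16_index` and the sample string are hypothetical and only serve the illustration; they are not taken from the file.

```python
def utf16_index(text, cp_index):
    """Return the UTF-16 code-unit offset of the code point at cp_index.

    Assumption: the other side (e.g. a C# string) counts UTF-16 code units,
    so every code point above U+FFFF counts twice.
    """
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:cp_index])


# Hypothetical sample, not the contents of file.txt.
sample = "ab\U0001F464 Sign in"

print(utf16_index(sample, 2))  # offset of the U+1F464 character itself
print(utf16_index(sample, 3))  # offset of the space that follows it
```

With this sample the character itself is at the same offset on both sides, and only the indices after it shift by one, which would match what I see at index 189.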
Reference for this character on the fileformat.info website:
http://www.fileformat.info/info/unicode/char/1F464/index.htm
As you can see there, the representations of this character have different lengths: "\uD83D\uDC64" in C#/C/C++/Java and u"\U0001F464" in Python.
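The two representations appear to be related by the standard UTF-16 surrogate-pair arithmetic. The following is a small sketch of that calculation; the function name `to_surrogates` is made up for this example, and the cross-check relies only on Python's built-in `utf-16-le` codec.

```python
import struct


def to_surrogates(code_point):
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    offset = code_point - 0x10000          # 20-bit value to split in two
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogates(0x1F464)
print("%04X %04X" % (high, low))  # D83D DC64

# Cross-check against Python's own UTF-16 encoder.
units = struct.unpack("<2H", "\U0001F464".encode("utf-16-le"))
print(units == (high, low))  # True
```

So both forms seem to describe the same character: one code point in Python, two UTF-16 code units in C#.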
The part of the text that is problematic:
👤 Sign in
Is there a way to get the same Unicode representation in Python 3.5 and C#?
Edit:
Download of the original file in which this error happened: https://ufile.io/pr5v6