I downloaded a file. It opens fine in Excel, but in Notepad its format looks wrong.
I have spent a lot of time on this but can't solve the error.
Link to file https://drive.google.com/open?id=1TDh81zdOggOexdaTxeiGz7r7jSkVqLEG
My code:
# to support encodings
# -*- coding: utf-8 -*-
import codecs
path = "badcode.xlsx"
# read input file
with codecs.open(path, 'r', encoding='cp1251') as file:
    lines = file.read()
# write output file
with codecs.open(path, 'w', encoding='utf8') as file:
    file.write(lines)
I get this error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 668: character maps to <undefined>
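The same error can be reproduced without the spreadsheet. This is a minimal sketch using a made-up byte string (`raw` is illustrative, not taken from the file): byte 0x98 has no character assigned in Python's cp1251 table, so decoding stops at the first such byte and reports its offset.

```python
# Minimal reproduction of the error, independent of the .xlsx file.
# b"abc" decodes fine; 0x98 is undefined in Python's cp1251 (Windows-1251) table.
raw = b"abc\x98"

message = None
try:
    raw.decode("cp1251")
except UnicodeDecodeError as exc:
    message = str(exc)

# prints: 'charmap' codec can't decode byte 0x98 in position 3: character maps to <undefined>
print(message)
```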
What I tried: checking which encoding open() uses by default:
path = "badcode.xlsx"
with open(path) as f:
    print(f)
This prints:
<_io.TextIOWrapper name='badcode.xlsx' mode='r' encoding='cp1251'>
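Printing the file object only shows which text encoding `open()` picked by default (here cp1251, the locale default). A more telling check is to read the first bytes in binary mode, where no decoding happens. The sketch below assumes the file is an Office Open XML workbook, which is stored as a ZIP archive; a non-empty ZIP normally begins with the signature `PK\x03\x04`. The helper name `inspect_file` is hypothetical.

```python
import zipfile

# Local file header signature that begins a non-empty ZIP archive.
ZIP_MAGIC = b"PK\x03\x04"


def inspect_file(path):
    """Report whether `path` looks like a ZIP archive (as .xlsx files are)."""
    # Binary mode: bytes are returned as-is, so no UnicodeDecodeError is possible.
    with open(path, "rb") as f:
        head = f.read(4)
    return {
        "first_bytes": head,
        "starts_with_zip_magic": head == ZIP_MAGIC,
        # is_zipfile checks for a valid end-of-central-directory record.
        "is_zipfile": zipfile.is_zipfile(path),
    }
```

Calling `inspect_file("badcode.xlsx")` on a genuine .xlsx should report a ZIP archive, and `zipfile.ZipFile("badcode.xlsx").namelist()` would then list its XML parts. That would explain why Notepad shows gibberish: it is displaying compressed binary data as text.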
0x98 is not valid in cp1251. If you have no idea whether the text is English (or a nearby European language), or possibly Greek, Cyrillic, Hebrew, or some other encoding, you can always try latin-1. That's a unique encoding in that it maps bytes straight to Unicode code points 0000 to 00FF. – Jongware
The latin-1 encoding indeed works on your file:
with codecs.open(path, 'r', encoding='latin-1') as file:
    lines = file.read()
    print(lines)
does not raise a UnicodeDecodeError anymore. That is because, as I said above, all bytes map directly to valid Unicode characters. But it's of no use to you, because this is for text files, not binaries. – Jongware
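Why latin-1 "works" yet doesn't help can be shown with a short sketch. The byte string `binary_sample` is hypothetical (a ZIP-like header plus two high bytes), not taken from the actual file. Every byte decodes under latin-1, but writing the result back as UTF-8, as the question's code does, turns each byte of 0x80 or above into two bytes, so the binary no longer matches the original.

```python
# Every byte value 0x00-0xFF decodes under latin-1 to the code point with the same number.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
same_numbers = all(ord(ch) == b for ch, b in zip(text, all_bytes))
print(same_numbers)  # True

# Hypothetical binary data: ZIP-like header followed by two bytes >= 0x80.
binary_sample = b"PK\x03\x04\x98\xff"

# Decode as latin-1, then encode as UTF-8 (what the question's code does when writing).
reencoded = binary_sample.decode("latin-1").encode("utf-8")
print(len(binary_sample), len(reencoded))  # 6 8
print(reencoded == binary_sample)  # False: the data has been altered

# Decoding and re-encoding with latin-1 on both sides is lossless.
round_trip = binary_sample.decode("latin-1").encode("latin-1")
print(round_trip == binary_sample)  # True
```

So latin-1 avoids the exception only because no byte is ever rejected; it does not turn a binary workbook into readable text.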