3
votes

I have tried many ways to convert a string like b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a' into Chinese characters, but all of them failed.

Strangely, when I just run

print(b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a')

it shows the decoded Chinese characters.

But if I get the string by reading it from my CSV file, this does not work. No matter how I try to decode the string, it only shows me b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'

Here is my script:

import csv 

with open('need_convert.csv','r+') as csvfile:
    reader=csv.reader(csvfile)
    for row in reader:

        new_row=''.join(row)
        print('new_row:')
        print(type(new_row))
        print(new_row)

        print('convert:')
        print(new_row.decode('utf-8'))

Here is my data (csv file):

b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'

1
Do not post code/data as images. Post as text - user3483203
have you tried: print(str(your_encoding)) - Fallenreaper
Welcome to Stack Overflow! Please edit your question to include the Python code as text, and also include some more examples of the encoded characters in text form. Thanks! - David
You need to read with the correct encoding. - erip
Hi Fallenreaper, yes, I've tried your method, but it does not work. Sorry. - Emiya

1 Answer

1
votes

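The contents of row and new_row are both str objects, not bytes. The CSV file stores the text of a Python bytes literal (the characters b, ', \, x, e, f and so on), so there is nothing to decode until that text is turned back into a real bytes object. Below, I'm using exec('s=' + row[0]) to do that. This assumes the input is safe, because exec runs whatever code the file contains.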

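import csv

# newline='' is the file mode the csv module documentation recommends
with open('need_convert.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # row[0] is a str holding the text of a bytes literal
        print(type(row[0]), row[0])
        # Evaluate that text as Python code so that s becomes real bytes
        # (run this at module level; only safe for trusted input)
        exec('s=' + row[0])
        print(type(s), s)
        print(s.decode('utf-8'))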

Output:

<class 'str'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
<class 'bytes'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
国际友谊
<class 'str'> b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
<class 'bytes'> b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
麒麟杯
<class 'str'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
<class 'bytes'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
国际友谊
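
If the file cannot be fully trusted, a possible alternative to exec is ast.literal_eval, which only parses Python literals and never runs code. The sketch below is not part of the original answer. The helper name decode_cell and the in-memory sample standing in for need_convert.csv are made up for illustration. It also decodes with 'utf-8-sig' instead of 'utf-8', which drops the leading byte order mark (\xef\xbb\xbf) that would otherwise stay in the result as an invisible U+FEFF character.

```python
import ast
import csv
import io


def decode_cell(cell):
    """Turn the text of a bytes literal (as stored in the CSV) into a str."""
    raw = ast.literal_eval(cell)  # parses literals only, never executes code
    if not isinstance(raw, bytes):
        raise ValueError(f"expected a bytes literal, got {type(raw).__name__}")
    # 'utf-8-sig' decodes UTF-8 and strips a leading BOM if one is present
    return raw.decode('utf-8-sig')


# Hypothetical in-memory stand-in for need_convert.csv (first two rows)
sample = io.StringIO(
    "b'\\xef\\xbb\\xbf\\xe5\\x9b\\xbd\\xe9\\x99\\x85\\xe5\\x8f\\x8b\\xe8\\xb0\\x8a'\n"
    "b'\\xef\\xbb\\xbf\\xe9\\xba\\x92\\xe9\\xba\\x9f\\xe6\\x9d\\xaf'\n"
)

decoded = [decode_cell(row[0]) for row in csv.reader(sample) if row]
print(decoded)
```

With the real file, csv.reader(sample) would be replaced by a reader over open('need_convert.csv', newline=''). A cell that is not a literal at all, such as a function call, makes ast.literal_eval raise ValueError instead of being executed.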