Can the extract function of the rarfile module handle Chinese characters?

Question

I am using rarfile and unrar in Python to extract files with Chinese-character names from RAR archives. When I use the rarobj.extractall(TargetDir) function, it works. But when I use

#encoding:utf-8
...
for fl in rarobj.namelist():
    rarobj.extract(fl,TargetDir)

I get this error:

rarfile.RarNoFilesError: No files that match pattern were found [10]

I then changed the encoding declaration:

#encoding:gbk
...
for fl in rarobj.namelist():
    rarobj.extract(fl,TargetDir)

This raises another error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

If I instead use

for fl in rarobj.namelist():
    rarobj.extract(fl.encode('gbk'),TargetDir)

the error goes away, but the fl file still cannot be extracted, because the encoded filename is not found in the RAR archive.

How can I handle this problem?

The output of rarobj.namelist() is:

[u'E:\\2y.pptx', u'E:\\\u6211\u662fabc.pptx', 'E:\\3b.docx', u'E:\\2x.docx', 'E:\\1a.pptx', 'E:\\1b.docx']

Looks like whatever RAR library you're using doesn't handle Unicode properly -- try a different one. — martineau
#encoding:utf-8 is not # -*- coding: utf-8 -*-; try string encode/decode to/from utf8 — cox

Timothy Mapley · Accepted Answer · 2017-06-05T19:38:08

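Use open instead of extract. It returns a file-like object, so you can read the data and write it to the target directory yourself: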

file_like_object = rarobj.open(fl)
data = file_like_object.read()
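
To turn this into a replacement for the extract loop, the data read through open still has to be written to disk. The following is a minimal sketch, not part of the original answer. The helper names safe_relative_path and extract_via_open are hypothetical, and the code assumes only that the archive object offers namelist() and open(name), as rarfile.RarFile and zipfile.ZipFile do. It also assumes that directory entries end with a slash, which may not hold for every rarfile version. Because the names in the listing above carry a drive prefix (E:\\), the sketch strips drive-like parts before building the output path.

```python
import os
import shutil


def safe_relative_path(member_name):
    """Turn an archive member name into a relative path.

    Hypothetical helper: drops drive-like parts such as 'E:' as well as
    '.', '..' and empty components, and accepts both '/' and '\\'.
    """
    parts = member_name.replace("\\", "/").split("/")
    kept = [p for p in parts
            if p and p not in (".", "..") and not p.endswith(":")]
    return os.path.join(*kept) if kept else ""


def extract_via_open(archive, target_dir):
    """Copy every file member into target_dir through archive.open().

    Assumption: `archive` has namelist() and open(name).
    Returns the list of paths that were written.
    """
    written = []
    for name in archive.namelist():
        if name.endswith(("/", "\\")):
            continue  # assumed to be a directory entry
        rel = safe_relative_path(name)
        if not rel:
            continue
        dest = os.path.join(target_dir, rel)
        parent = os.path.dirname(dest)
        if parent:
            os.makedirs(parent, exist_ok=True)
        # The member is looked up by the exact name from namelist(),
        # so no manual encode/decode step is involved.
        with archive.open(name) as src, open(dest, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(dest)
    return written
```

With the objects from the question, the call would be extract_via_open(rarobj, TargetDir). The sketch is written for Python 3; the u'...' names in the question suggest Python 2, where os.makedirs has no exist_ok argument and would need a manual existence check.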
