
I am using the rarfile module and unrar in Python to extract files with Chinese-character names from RAR archives. When I use the rarobj.extractall(TargetDir) function, it works. But when I use

#encoding:utf-8
...
for fl in rarobj.namelist():
    rarobj.extract(fl,TargetDir)

I get this error:

rarfile.RarNoFilesError: No files that match pattern were found [10]

So I changed the encoding declaration to the following:

#encoding:gbk
...
for fl in rarobj.namelist():
    rarobj.extract(fl,TargetDir)

Now I get a different error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

Then I tried:

for fl in rarobj.namelist():
    rarobj.extract(fl.encode('gbk'),TargetDir)

This eliminates the error, but the file still can't be extracted, because the GBK-encoded filename is not found in the RAR archive.

How can I handle this problem?

The output of rarobj.namelist() is:

[u'E:\\2y.pptx', u'E:\\\u6211\u662fabc.pptx', 'E:\\3b.docx', u'E:\\2x.docx', 'E:\\1a.pptx', 'E:\\1b.docx']
Looks like whatever RAR library you're using doesn't handle Unicode properly -- try a different one. - martineau
#encoding:utf-8 is not # -*- coding: utf-8 -*-; try encoding/decoding the strings to/from UTF-8. - cox
Thanks. Is there one you would recommend? - yibotg

1 Answer


Use open instead of extract:

file_like_object = rarobj.open(fl)
data = file_like_object.read()
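
To actually get the files onto disk, the bytes read this way still have to be written out. Below is a minimal sketch of one way to do that, not a definitive implementation. It assumes Python 3 (where member names are plain str), and the helper names safe_parts and extract_with_open are hypothetical, made up for this example. It only relies on infolist() and open(), which rarfile.RarFile shares with zipfile.ZipFile. Since the names in your namelist() carry a drive prefix and backslashes (E:\\...), the sketch strips those before building the destination path.

```python
import os
import shutil


def safe_parts(member_name):
    """Split an archive member name into path components that are safe to join."""
    parts = member_name.replace("\\", "/").split("/")
    # Drop empty parts, '.', '..' and drive prefixes such as 'E:'.
    return [p for p in parts if p not in ("", ".", "..") and not p.endswith(":")]


def extract_with_open(archive, target_dir):
    """Copy every file member of `archive` into `target_dir` via archive.open().

    `archive` is assumed to be an already opened rarfile.RarFile
    (or any object offering the same infolist()/open() methods).
    """
    written = []
    for info in archive.infolist():
        # Older rarfile versions expose isdir(), newer ones is_dir().
        is_dir = getattr(info, "is_dir", None) or getattr(info, "isdir", None)
        if is_dir is not None and is_dir():
            continue
        parts = safe_parts(info.filename)
        if not parts:
            continue
        dest = os.path.join(target_dir, *parts)
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        # Stream the member's bytes; the name is never passed to unrar as a pattern.
        with archive.open(info) as src, open(dest, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(dest)
    return written
```

With your objects this would be called as extract_with_open(rarobj, TargetDir). Passing the info object (rather than a re-encoded name) to open() avoids having to guess between utf-8 and gbk for the member name.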