I am using Wikipedia.py to fetch information from Wikipedia sections. While doing so, I am having a problem with encoding in the following Python code:

for section in data.sections:
    info = data.section(section).encode('utf-8')
    info = info.encode('string_escape')
    print info

The data variable holds the whole Wikipedia page. Each time I run the script, I receive the following error:

'ascii' codec can't encode character u'\u2013'

1 Answer

You must first decode the result of data.section(section), using the encoding of the Wikipedia sections.

Supposing the encoding of the Wikipedia sections is gbk, the code would look like this:

for section in data.sections:
    # Check the actual encoding first and replace 'gbk' in decode() accordingly.
    info = data.section(section).decode('gbk').encode('utf-8')
    info = info.encode('string_escape')
    print info
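
To illustrate the decode-then-encode idea outside of Wikipedia.py, here is a minimal sketch. The names sample, gbk_bytes and the literal strings are made up for the example, and gbk is only an assumed encoding, as in the snippet above. Whether data.section() returns bytes or already-decoded text is not confirmed here, so check that before copying the pattern.

```python
# Hypothetical text containing U+2013 (en dash), the character named in the error.
text = u"1990\u20132000"

# Encoding to ASCII fails because U+2013 has no ASCII representation.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding the same text to UTF-8 succeeds and yields bytes.
utf8_bytes = text.encode("utf-8")

# Assumed case: the data arrives as bytes in a known encoding (gbk here).
# Decode the bytes to text first, then encode the text as UTF-8.
sample = u"\u4e2d\u6587"               # hypothetical section text
gbk_bytes = sample.encode("gbk")       # stand-in for raw bytes from a source
round_trip = gbk_bytes.decode("gbk").encode("utf-8")
```

The point of the sketch is the direction of each call: decode() turns bytes into text, encode() turns text into bytes, and the ASCII codec only gets involved when one of these steps is skipped or applied to the wrong type.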