
I'm learning BeautifulSoup with Visual Studio Code, and when I run this script:

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

ua = UserAgent()
header = {'user-agent':ua.chrome}
google_page = requests.get('https://www.google.com',headers=header)

soup = BeautifulSoup(google_page.content,'lxml') # html.parser

print(soup.prettify())

I get the following error:

Traceback (most recent call last):
  File "c:\ ... \intro-to-soup-2.py", line 13, in <module>
    print(soup.prettify())
  File "C:\ ... \Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f440' in position 515: character maps to <undefined>

If I force the encoding to UTF-8 in the soup variable, I can't use prettify() anymore, as it doesn't work with strings. I also tried adding # -*- coding: utf-8 -*- as the first line of the script, without success.
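Regarding forcing UTF-8 while still wanting prettified output: one possible workaround is to encode the text yourself and write the resulting bytes to the stream's underlying binary buffer, which skips the stream's cp1252 text encoder entirely. This is a minimal sketch under two assumptions: the stream exposes a `.buffer` attribute (the standard `sys.stdout` does, but streams replaced by some IDEs may not), and whatever displays the output expects UTF-8. `write_utf8` is a hypothetical helper name.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Write `text` to a text stream as UTF-8 bytes, bypassing the stream's own encoding.

    Hypothetical helper: assumes `stream` has a `.buffer` attribute
    (the standard sys.stdout does; some IDE-replaced streams do not).
    """
    if stream is None:
        stream = sys.stdout
    stream.flush()  # push any pending text first so the output stays in order
    stream.buffer.write(text.encode("utf-8"))
    stream.buffer.flush()


if __name__ == "__main__":
    # Simulate a stdout configured for cp1252, like the one in the traceback.
    fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
    write_utf8("eyes: \U0001f440\n", fake_stdout)
    print(fake_stdout.buffer.getvalue())  # raw UTF-8 bytes, no UnicodeEncodeError
```

With the question's script this would be `write_utf8(soup.prettify())`. Beautiful Soup's `prettify()` also accepts an encoding argument (`soup.prettify('utf-8')`), in which case it returns bytes that could be written to `sys.stdout.buffer` directly.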

Here is my tasks.json for this project:

{
    // See https://go.microsoft.com/fwlink/?LinkId=733558
    // for the documentation about the tasks.json format
    "version": "0.1.0",
    "command": "python",
    "isShellCommand": true,
    "args": ["${file}"],
    "files.encoding": "utf8",
    // Controls after how many characters the editor will wrap to the next line. Setting this to 0 turns on viewport width wrapping (word wrapping). Setting this to -1 forces the editor to never wrap.
    "editor.wrappingColumn": 0, // default value is 300
    // Controls the font family.
    "editor.fontFamily": "Consolas, 'Malgun Gothic', '맑은 고딕','Courier New', monospace",
    // Controls the font size.
    "editor.fontSize": 15,
    "showOutput": "always"
}

The exact same code is running in PyCharm without any problems. Any ideas how I can fix this in Visual Studio Code?

Here's my "pip freeze" result:

astroid==1.5.3
beautifulsoup4==4.5.3
colorama==0.3.9
fake-useragent==0.1.7
html5lib==0.999999999
isort==4.2.15
lazy-object-proxy==1.3.1
lxml==3.7.2
mccabe==0.6.1
pylint==1.7.1
requests==2.12.5
selenium==3.4.3
six==1.10.0
webencodings==0.5
wrapt==1.10.10
xlrd==1.0.0
XlsxWriter==0.9.6

Thank you for your time,

Eunito.

Are PyCharm and VSCode running the same install of Python? - pvg
How can I see which version I'm using in VSCode? I installed PyCharm today, so I assume it is using the latest there. - Eunito
@Eunito Run this as a script in VSCode: import sys; print('Python %s on %s' % (sys.version, sys.platform)). Also, PyCharm might be using any install you have (if you have multiple installations), so downloading it recently does not guarantee that it runs the latest Python. - TrakJohnson
Ran that in PyCharm and VSCode and the output was Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)] on win32 - Eunito
Edit: I just updated Python to the latest version and the problem keeps happening :( - Eunito
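Building on TrakJohnson's one-liner, a slightly longer check also reports which interpreter binary is running (`sys.executable`) and which encodings Python picked for stdout and for the locale. Running the same file from both PyCharm and VSCode and comparing the output should show whether the two environments really differ; `describe_interpreter` is only an illustrative name.

```python
import locale
import sys


def describe_interpreter():
    """Collect details that help tell two Python setups apart."""
    return {
        "version": sys.version,
        "platform": sys.platform,
        "executable": sys.executable,  # full path of the running interpreter
        "stdout_encoding": getattr(sys.stdout, "encoding", None),  # encoding print() uses
        "preferred_encoding": locale.getpreferredencoding(False),  # locale's default encoding
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print("%-20s %s" % (key, value))
```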

1 Answer


The problem seems to be the encoding that the Python interpreter assumes stdout/stderr support. When output goes to a pipe rather than an interactive console, as it does when VSCode runs a task, Python 3.6 takes that encoding from the platform's locale settings. For some reason (arguably a bug in VSCode), this ends up as a platform-specific value (cp1252 on Windows in your case; I was able to reproduce the issue on OS X and got ascii) instead of UTF-8, which the VSCode output window supports. You can modify your tasks.json to look something like this; it sets an environment variable that forces the Python interpreter to use UTF-8 for output.
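The mechanism can be reproduced without VSCode: a text stream encodes every string with its own `encoding`, so the same `print` call fails or succeeds depending only on how the stream was configured. The sketch below uses in-memory `io.TextIOWrapper` objects as stand-ins for stdout, an assumption for illustration only; the real `sys.stdout` is set up by the interpreter at startup.

```python
import io


def try_print(text, encoding):
    """Print `text` to an in-memory stream using `encoding`.

    Returns the encoded bytes on success, or the UnicodeEncodeError on failure.
    """
    raw = io.BytesIO()
    # newline="\n" avoids "\r\n" translation so the bytes are the same on every OS
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        return exc
    return raw.getvalue()


if __name__ == "__main__":
    for enc in ("cp1252", "utf-8"):
        result = try_print("\U0001f440", enc)
        # ascii() keeps this line printable even on a cp1252 console
        print(enc, "->", ascii(result))
```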

{
    // See https://go.microsoft.com/fwlink/?LinkId=733558
    // for the documentation about the tasks.json format
    "version": "0.1.0",
    "command": "python",
    "isShellCommand": true,
    "args": ["${file}"],
    "showOutput": "always",
    "options": {
        "env": {
            "PYTHONIOENCODING":"utf-8"
        }
    }
}

The relevant part is the "options" object: its "env" entry sets PYTHONIOENCODING=utf-8 for the Python process that the task starts.
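The `options.env` setting adds `PYTHONIOENCODING` to the environment of the process the task launches. The same effect can be checked from Python with `subprocess`, which launches a child interpreter with its stdout connected to a pipe, much like the task runner does. The child script and the `run_child` helper below are illustrative assumptions, not part of VSCode.

```python
import codecs
import os
import subprocess
import sys

# Child program: report its stdout encoding, then print the character from the traceback.
CHILD_CODE = "import sys; print(sys.stdout.encoding); print('\\U0001f440')"


def run_child(io_encoding):
    """Run CHILD_CODE in a fresh interpreter with PYTHONIOENCODING set, capturing raw output."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,  # a pipe, not a console, like the task runner's capture
        stderr=subprocess.PIPE,
        env=env,
    )


if __name__ == "__main__":
    ok = run_child("utf-8")
    lines = ok.stdout.decode("utf-8").splitlines()
    print("child stdout encoding:", codecs.lookup(lines[0]).name)
    print("character survived:", lines[1] == "\U0001f440")

    bad = run_child("cp1252")
    # non-zero: the child hits the same UnicodeEncodeError as in the question
    print("cp1252 exit code:", bad.returncode)
```

Setting the variable on the launched process (rather than changing the script) keeps the script itself unchanged, which is the same approach the tasks.json fix takes.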