57
votes

After I installed BeautifulSoup, this warning appears whenever I run my Python script in cmd.

D:\Application\python\lib\site-packages\beautifulsoup4-4.4.1-py3.4.egg\bs4\__init__.py:166:
UserWarning: No parser was explicitly specified, so I'm using the best
available HTML parser for this system ("html.parser"). This usually isn't a
problem, but if you run this code on another system, or in a different
virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "html.parser")

I have no idea why it appears or how to get rid of it.

4
The message is telling you exactly what to do: BeautifulSoup([your markup], "html.parser"). Did you do that and see what your output is? BeautifulSoup is trying to make your life easier. Listen to the Soup. :) - idjaw
Change code such as soup = BeautifulSoup(html) to soup = BeautifulSoup(html, "html.parser"). - Casimir Crystal

4 Answers

108
votes

The solution to your problem is clearly stated in the warning message. Code like the following does not specify which parser (HTML, XML, etc.) to use.

BeautifulSoup( ... )

To get rid of the warning, specify which parser you'd like to use, like so:

BeautifulSoup( ..., "html.parser" )

You can also install a 3rd party parser if you'd like.

20
votes

The documentation recommends that you install and use lxml for speed.

BeautifulSoup(html, "lxml")

If you’re using a version of Python 2 earlier than 2.7.3, or a version of Python 3 earlier than 3.2.2, it’s essential that you install lxml or html5lib–Python’s built-in HTML parser is just not very good in older versions.

Installing LXML parser

  • On Ubuntu (Debian)

    apt-get install python-lxml 
    
  • Fedora (RHEL based)

    dnf install python-lxml
    
  • Using PIP

    pip install lxml
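
If the code may run on machines where lxml is not installed, one option is to ask for lxml first and fall back to the built-in parser. This is only a sketch: make_soup is a hypothetical helper name, not part of bs4. It relies on bs4 raising FeatureNotFound when the requested parser is unavailable.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="lxml", fallback="html.parser"):
    """Hypothetical helper: parse with `preferred`, else with `fallback`."""
    try:
        # Naming the parser explicitly is what avoids the warning.
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # Raised by bs4 when no installed parser matches the request.
        return BeautifulSoup(markup, fallback)


soup = make_soup("<p>Hello, <b>world</b></p>")
print(soup.p.get_text())
```

Keep in mind that different parsers can build slightly different trees from invalid markup, so a fallback trades consistency for convenience.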
    
4
votes

If you want to use the html5lib HTML parser, install it first:

pip install html5lib

then pass 'html5lib' as the parser argument to the BeautifulSoup constructor:

htmlDoc = bs4.BeautifulSoup(req1.text, 'html5lib')
print(htmlDoc)

2
votes

In my opinion, the previous posts did not answer the question.

Yes, as everyone said, you can remove the warning by specifying the parser.
And as the documentation points out, doing so is a best practice for both performance and consistency.

But in some cases, you want to silence the warning... Hence this post.

  • since BeautifulSoup 4 rev 460, the warning message does not appear in interactive (REPL) mode
  • there are more general answers on controlling Python warnings under "How to disable Python warnings" (TL;DR: PYTHONWARNINGS=ignore or -Wignore)
  • suppressing the warning explicitly (bs4 ≥ rev 569) by adding to your code:
    import warnings
    from bs4 import GuessedAtParserWarning  # needed so the name below is defined
    warnings.filterwarnings('ignore', category=GuessedAtParserWarning)
    
  • cheating by letting bs4 think you provided the parser, i.e.:
    bs4.BeautifulSoup(
      your_markup,
      builder=bs4.builder_registry.lookup(*bs4.BeautifulSoup.DEFAULT_BUILDER_FEATURES)
    )
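
If you only want to silence the warning around one call rather than for the whole program, the standard library's warnings.catch_warnings context manager can scope the filter. The sketch below uses stand-ins so that it runs without bs4: the GuessedAtParserWarning class and the parse function defined here are hypothetical substitutes. With a bs4 version that exports the real GuessedAtParserWarning, you would import that class and call BeautifulSoup instead.

```python
import warnings


class GuessedAtParserWarning(UserWarning):
    """Stand-in for bs4's warning class (assumption, for illustration only)."""


def parse(markup, parser=None):
    """Hypothetical stand-in for BeautifulSoup(markup, parser)."""
    if parser is None:
        warnings.warn(
            "No parser was explicitly specified",
            GuessedAtParserWarning,
            stacklevel=2,
        )
        parser = "html.parser"
    return (markup, parser)


# The filter applies only inside this block; the previous
# warning configuration is restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=GuessedAtParserWarning)
    quiet = parse("<p>hi</p>")
```

Unlike a module-level warnings.filterwarnings call, this leaves the warning active everywhere else in the program.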