0
votes
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe')

driver.get('https://www.imdb.com/')

html_doc = driver.page_source

soup = BeautifulSoup(html_doc, 'lxml')
print(soup.prettify())

driver.quit()

I tried this code and it gives the following error:

Traceback (most recent call last):
  File "E:\Practice\WebScraping\webscrape.py", line 11, in <module>
    print(soup.prettify())
  File "C:\Users\vmbck\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u25ec' in position 241524: character maps to <undefined>

Then I tried encoding the page source with encode("utf-8"):

html_doc = driver.page_source.encode("utf-8")

It gives the same error again.

How can I get page_source without getting a UnicodeEncodeError?

2
Thank you very much. I fixed that with html_doc = ascii(driver.page_source). - Buddhika Chathuranga
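
A note on the ascii() fix in the comment above: ascii() returns a repr-style string, so the result is wrapped in quotes and every non-ASCII character becomes a backslash escape. That makes it printable on a cp1252 console, but the text handed to BeautifulSoup is no longer the original markup. The sketch below compares it with the 'backslashreplace' error handler, which escapes the same characters without adding quotes. The page_source string is a made-up stand-in for driver.page_source, not real IMDb output.

```python
# Hypothetical stand-in for driver.page_source (no browser is started here).
# '\u25ec' is the character named in the traceback.
page_source = "<p>\u25ec IMDb</p>"

# ascii() gives a repr-style string: surrounding quotes plus escaped non-ASCII.
as_repr = ascii(page_source)

# Same escaping, but without the added quotes.
escaped = page_source.encode("ascii", "backslashreplace").decode("ascii")

print(as_repr)
print(escaped)
```

Both results contain only ASCII characters, so printing them does not depend on the console encoding.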

2 Answers

1
votes
import requests
from bs4 import BeautifulSoup
a = requests.get('https://www.imdb.com/')
soup = BeautifulSoup(a.content, 'lxml')
print(soup.prettify())

The code above does roughly the same thing as yours, using requests instead of Selenium. To solve the Unicode error, try what is suggested in the following post: Python Unicode Encode Error
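
For context, the traceback in the question is raised by print(), not by Selenium or BeautifulSoup: the console uses cp1252, which has no mapping for '\u25ec'. That is also why calling encode("utf-8") on page_source changes nothing, since BeautifulSoup decodes the bytes back to text and prettify() returns a str again. Below is a minimal sketch of two ways around it, assuming the goal is simply to get the markup out intact. The page_source string and the output file name are made up for illustration, and io.BytesIO stands in for the console's byte stream (sys.stdout.buffer in a real script).

```python
import io
import os
import tempfile

# Hypothetical stand-in for driver.page_source.
page_source = "<html><body>\u25ec IMDb</body></html>"

# Option 1: write to a file with an explicit encoding instead of printing.
out_path = os.path.join(tempfile.gettempdir(), "imdb_page.html")  # made-up name
with open(out_path, "w", encoding="utf-8") as f:
    f.write(page_source)

# Read it back to confirm nothing was lost.
with open(out_path, encoding="utf-8") as f:
    round_trip = f.read()

# Option 2: wrap a byte stream so text written to it is encoded as UTF-8.
# In a real script, sys.stdout.buffer would take the place of BytesIO.
raw = io.BytesIO()
utf8_out = io.TextIOWrapper(raw, encoding="utf-8")
utf8_out.write(page_source)
utf8_out.flush()
console_bytes = raw.getvalue()
```

Setting the PYTHONIOENCODING environment variable to utf-8 before starting Python is another way to change the encoding that print() uses, without touching the script.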

-1
votes

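If encoding to UTF-8 fails, try encoding to ASCII instead. Encode the string returned by prettify(), because the bytes returned by encode() have no prettify() method.

Try both:

print(soup.prettify().encode('utf-8'))

and

print(soup.prettify().encode('ascii', 'xmlcharrefreplace'))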
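
Encoding to ASCII only avoids the error when an error handler is supplied, because the default handler ('strict') raises the same UnicodeEncodeError. The sketch below shows the difference on a made-up sample string rather than a real page. For HTML, 'xmlcharrefreplace' is arguably the gentlest choice, since a browser or parser can turn the numeric character references back into the original characters.

```python
# Hypothetical sample text containing the character from the traceback.
text = "\u25ec IMDb"

# The default 'strict' handler fails on anything outside ASCII.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'xmlcharrefreplace' swaps each such character for an HTML numeric reference.
as_entities = text.encode("ascii", "xmlcharrefreplace")

# 'replace' swaps it for a question mark, which loses the original character.
as_question_marks = text.encode("ascii", "replace")

print(strict_failed, as_entities, as_question_marks)
```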