I would like to print the company name from the Google Finance page, using the div with class appbar-snippet-primary. The code I am using returns None or []. I wasn't able to reach the span tag containing the company name using BeautifulSoup.

html = urlopen('https://www.google.com/finance?q=F')
soup = BeautifulSoup(html, "html.parser")
x = soup.find(id='appbar-snippet-primary')
print(x)

Thank you for the explanation. I have updated the code as you suggested and included the stock price, created a loop, then stored the information in a dictionary.

from bs4 import BeautifulSoup
import requests

x = ('F', 'GE', 'GOOGL')
Company = {}

for i in x:
    head = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64)  AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
    html = requests.get('https://www.google.com/finance?q=%s' % i, headers=head).content
    soup = BeautifulSoup(html, "html.parser")
    c = soup.find("div", class_="appbar-snippet-primary").text
    p = soup.find('span',class_='pr').span.text
    Company.update({c : p})
for k, v in Company.items():
    print('{:<30} {:>8}'.format(k, v))

2 Answers

1
votes

It's a class, not an ID

The element you're interested in looks like this

<div class="appbar-snippet-primary">
    <span>Ford Motor Company</span>
</div>

So it's a div with class="appbar-snippet-primary", not id="appbar-snippet-primary" like your code implies.
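
To make the difference concrete, here is a minimal sketch that runs offline. It assumes the markup is exactly the snippet shown above (the variable names are just for illustration) and compares a lookup by id with a lookup by class:

```python
from bs4 import BeautifulSoup

# The markup from the answer above, inlined so no network access is needed.
snippet = """
<div class="appbar-snippet-primary">
    <span>Ford Motor Company</span>
</div>
"""

soup = BeautifulSoup(snippet, "html.parser")

# Looking the element up by id finds nothing, because the div has no id.
by_id = soup.find(id="appbar-snippet-primary")

# Looking it up by class finds the div; class_ is used because
# "class" is a reserved word in Python.
by_class = soup.find("div", class_="appbar-snippet-primary")

print(by_id)               # None
print(by_class.span.text)  # Ford Motor Company
```

Since find returns None when nothing matches, the id lookup is what produces the None in the question.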

That value isn't in the raw HTML, it requires JS to execute first

However, there is a deeper problem: that div isn't set until the JavaScript on the page runs. Downloading the raw HTML and running BeautifulSoup on it won't work, because the JS hasn't executed at that point.

One of the script tags in that raw HTML contains the statement var _companyName = 'Ford Motor Company'; so you can search for _companyName = if you insist on using the raw HTML.
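
If you go the raw-HTML route, a regular expression is one way to pull the name out. This is only a sketch: raw_html is a hypothetical stand-in for the downloaded page, extract_company_name is a name made up for illustration, and it assumes the value is a single-quoted string with no escaped quotes inside it.

```python
import re

# Hypothetical stand-in for the downloaded page source. Only the
# _companyName assignment itself comes from the answer above.
raw_html = """
<script>
var _companyName = 'Ford Motor Company';
</script>
"""


def extract_company_name(html):
    """Return the string assigned to _companyName, or None if it is absent."""
    # Capture everything between the single quotes after "_companyName =".
    match = re.search(r"_companyName\s*=\s*'([^']*)'", html)
    return match.group(1) if match else None


print(extract_company_name(raw_html))
```

A name containing an escaped apostrophe would be cut short by this pattern, so treat it as a starting point rather than a robust parser.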

Use Selenium

You can use Selenium, which drives an actual browser and runs the JS. Once the page has loaded, you can find the element by its class:

from __future__ import print_function

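from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://www.google.com/finance?q=F")

# find_element(By.CSS_SELECTOR, ...) replaces find_element_by_css_selector,
# which was deprecated and later removed in Selenium 4.
div = driver.find_element(By.CSS_SELECTOR, '.appbar-snippet-primary')
company_name = div.text
print(company_name)

# quit() closes the browser and ends the WebDriver session.
driver.quit()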

I get:

Ford Motor Company
0
votes

The value is not dynamically generated by JavaScript; it is in the source. All you need to do is add a User-Agent header and look up the div by its class. The following example using requests gets what you want:

from bs4 import BeautifulSoup

import requests

head = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
html = requests.get('https://www.google.com/finance?q=F', headers=head).content
soup = BeautifulSoup(html, "html.parser")
x = soup.find("div", class_="appbar-snippet-primary")
print(x)

Which returns:

<div class="appbar-snippet-primary"><span>Ford Motor Company</span></div>

If we run the code using x.text to pull the text you can see the output is correct:

In [14]: from bs4 import BeautifulSoup

In [15]: import requests

In [16]: head = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}

In [17]: html = requests.get('https://www.google.com/finance?q=F', headers=head).content

In [18]: soup = BeautifulSoup(html, "html.parser")

In [19]: x = soup.find("div", class_="appbar-snippet-primary")

In [20]: print(x.text)
Ford Motor Company

Now without a user-agent:

In [21]: from bs4 import BeautifulSoup

In [22]: import requests

In [23]: html = requests.get('https://www.google.com/finance?q=F').content

In [24]: soup = BeautifulSoup(html, "html.parser")

In [25]: x = soup.find("div", class_="appbar-snippet-primary")

In [26]: print(x)
None

Here x is None because, without the User-Agent header, the server does not return the same source.
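
Because find returns None when the div is missing, calling .text on the result directly raises AttributeError whenever the server sends the other version of the page. A small guard avoids that. The sketch below runs offline: company_name is a hypothetical helper, and the two HTML strings are stand-ins for the responses with and without the header, not real server output.

```python
from bs4 import BeautifulSoup


def company_name(html):
    """Return the company name from the page source, or None if the div is missing."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="appbar-snippet-primary")
    # Guard against None instead of calling .text on a failed lookup.
    return div.get_text(strip=True) if div is not None else None


# Stand-in for the source returned when a User-Agent header is sent.
with_div = '<div class="appbar-snippet-primary"><span>Ford Motor Company</span></div>'
# Stand-in for the source returned without it (the div is absent).
without_div = "<html><body></body></html>"

print(company_name(with_div))     # Ford Motor Company
print(company_name(without_div))  # None
```

In a loop over several tickers, checking for None lets you skip or log a symbol rather than crash partway through.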