0
votes

I'm trying to scrape the phone number from a page (one such page is the URL in the code below). Every page contains a link with the text SEE PHONE NUMBER; clicking it reveals the phone number, which is what I want to scrape. Here is what I've tried so far:

company_url = 'https://www.europages.co.uk/PORT-INTERNATIONAL-GMBH/00000004710372-508993001.html'
d = {}
try :
    options = webdriver.FirefoxOptions()
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--incognito')
    options.add_argument('--headless')
    driver = webdriver.Firefox(options=options)
    driver.get(company_url)
    link = driver.find_element_by_link_text('See phone number')
    link.click()
    driver.close()
    page = driver.page_source
    soup = bs(page, 'html.parser')
    tel_no = soup.find('div', {'class' : 'info-tel-num'})
    tel_no = tel_no.text
    d['telephone'] = tel_no
except Exception as e:
    print(f'Error encountered : {e}')

But every time, it prints this error in the exception block :

Error encountered : Message: Unable to locate element: See phone number

This link doesn't have an id or class, so I can't use find_element_by_id or find_element_by_class_name. Here is what Inspect Element shows for that button (before clicking):

inspect element result

And here is the inspect element result after clicking the button :

after clicking

How can I scrape this phone number? What am I doing wrong?

3
Did you try adding a wait? - Guy
How would it help after link.click() if the failure is on the line before? It's probably a page-load timing issue: you need to wait before trying to locate the element. - Guy

3 Answers

3
votes

The desired element is a JavaScript-enabled element (its onclick handler calls EpGetInfoTel), so to locate it and click() on it you have to induce WebDriverWait for element_to_be_clickable(). You can use either of the following solutions:

  • Using CSS_SELECTOR:

    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a[onclick^='EpGetInfoTel']"))).click()
    
  • Using XPATH:

    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[starts-with(@onclick, 'EpGetInfoTel') and text()='See phone number']"))).click()
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • To scrape the phone number you can use the following line of code:

    print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//a[starts-with(@onclick, 'EpGetInfoTel') and text()='See phone number']//following::div[1]"))).get_attribute("innerHTML"))
    
  • Console Output:

    +49 04 03 01 00 00
    
  • Browser Snapshot:

phone
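To see why the explicit wait fixes the "Unable to locate element" error, here is a rough pure-Python sketch of the polling idea behind WebDriverWait. It is not Selenium's implementation: `wait_until`, `find_link`, and the use of `LookupError` as a stand-in for "element not found yet" are all hypothetical names chosen for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if that does not happen within `timeout` seconds.
    LookupError is treated as "not there yet" (an assumption of this sketch,
    standing in for Selenium's NoSuchElementException).
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError as exc:
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s") from last_error
        time.sleep(poll)


# Simulated page: the link only "appears" on the third lookup,
# much like an element that is added after the initial page load.
attempts = {"n": 0}


def find_link():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LookupError("'See phone number' link is not in the DOM yet")
    return "<a>See phone number</a>"


link = wait_until(find_link, timeout=5, poll=0.01)
print(link, "found after", attempts["n"], "attempts")
```

A single immediate lookup, as in the question's code, corresponds to calling `find_link()` once and failing on the first attempt.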

0
votes

Use this to click the See phone number link (jQuery syntax, so it assumes jQuery is available on the page):

$("[itemprop='telephone'] a")[0].click();

and to get phone number value use this:

$("[itemprop='telephone'] [style='display: block;']")[0].innerText
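If you are driving the browser from Python, the same selectors can be run through Selenium's `execute_script`. The following is only a sketch under a few assumptions: it uses plain `document.querySelector` instead of jQuery (so it does not depend on `$` being defined), it assumes the number appears shortly after the click and therefore polls for it, and the names `phone_via_javascript`, `CLICK_JS`, and `READ_JS` are hypothetical.

```python
import time

# Same selectors as above, expressed with the standard DOM API.
CLICK_JS = "document.querySelector(\"[itemprop='telephone'] a\").click();"
READ_JS = (
    "var el = document.querySelector("
    "\"[itemprop='telephone'] [style='display: block;']\");"
    "return el ? el.innerText : null;"
)


def phone_via_javascript(driver, timeout=10.0, poll=0.5):
    """Click the link via JavaScript, then poll until the number is readable.

    `driver` is any object with an execute_script(script) method,
    such as a Selenium WebDriver instance.
    """
    driver.execute_script(CLICK_JS)
    deadline = time.monotonic() + timeout
    while True:
        text = driver.execute_script(READ_JS)
        if text:
            return text.strip()
        if time.monotonic() >= deadline:
            raise TimeoutError("phone number did not appear in time")
        time.sleep(poll)
```

With a real driver this would be called as `phone_via_javascript(driver)` after `driver.get(company_url)`. The exact-match `[style='display: block;']` selector is taken from the answer as-is and may need adjusting if the page's inline style differs.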

0
votes

Use WebDriverWait and click on the element with the following XPath. Then get the page_source if you want to keep using BeautifulSoup as you are doing now.

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as bs

company_url = 'https://www.europages.co.uk/PORT-INTERNATIONAL-GMBH/00000004710372-508993001.html'
d = {}
try:
    options = webdriver.FirefoxOptions()
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--incognito')
    options.add_argument('--headless')
    driver = webdriver.Firefox(options=options)
    driver.get(company_url)
    # Wait until the link is clickable instead of looking it up immediately
    link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//a[contains(.,"See phone number")]')))
    link.click()
    time.sleep(2)  # give the page time to reveal the number
    page = driver.page_source  # read the source before closing the browser
    driver.close()
    soup = bs(page, 'html.parser')
    tel_no = soup.find('div', {'class': 'info-tel-num'})
    tel_no = tel_no.text
    d['telephone'] = tel_no
except Exception as e:
    print(f'Error encountered : {e}')


print(d)

Output on console:

{'telephone': '+49 04 03 01 00 00'}
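One caveat with the parsing step: if the div is missing from the page source, `soup.find(...)` returns None and `tel_no.text` raises an AttributeError. Below is a minimal sketch of a None-safe extraction. Note that it swaps in the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages; `extract_phone` and `_PhoneDivParser` are hypothetical names, and the sample HTML is made up to mirror the `info-tel-num` div.

```python
from html.parser import HTMLParser


class _PhoneDivParser(HTMLParser):
    """Collect the text inside the first <div> carrying the given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while inside the target div
        self.done = False   # True once the target div has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == "div":
                self.depth += 1  # track nested divs so we close correctly
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_phone(page_source, css_class="info-tel-num"):
    """Return the stripped text of the target div, or None if it is absent."""
    parser = _PhoneDivParser(css_class)
    parser.feed(page_source)
    parser.close()
    text = "".join(parser.chunks).strip()
    return text or None


sample = '<html><body><div class="info-tel-num"> +49 04 03 01 00 00 </div></body></html>'
print(extract_phone(sample))
print(extract_phone("<div class='other'>no phone here</div>"))
```

With BeautifulSoup itself, the equivalent guard is to check that the result of `soup.find(...)` is not None before reading `.text`.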