
I am new to Python (version 3.6.0) and I am trying to retrieve data from this site.

There are two dropdowns, and the second one's options depend on the selection in the first.

First: 'Organizasyon Adi'. Second: 'UEVCB Adi'.

The options in the page source look like this:

<option value="0" selected="selected">TÜMÜ</option> #this is default value when we open the page
<option value="10374">1461 TRABZON ELEKTRİK ÜRETİM A.Ş</option>
<option value="9426">2M ELEKTRİK ÜRETİM SANAYİ VE TİCARET ANONİM ŞİRKETİ</option>

These are the options for the first dropdown; there are almost 800 of them.
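
For reference, a minimal sketch of how all of those value/text pairs could be collected in one go; the select id 'j_idt102:distributionId_input' is only my guess based on the label id used in the code below, so it needs to be verified against the page source:

from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.get('https://seffaflik.epias.com.tr/transparency/uretim/planlama/kgup.xhtml')
time.sleep(3)  # let the page finish loading

soup = BeautifulSoup(driver.page_source, 'html.parser')
# 'j_idt102:distributionId_input' is an assumed id for the hidden <select>
# behind the styled dropdown; inspect the page to confirm it
select = soup.find('select', id='j_idt102:distributionId_input')
organisations = {opt['value']: opt.get_text(strip=True)
                 for opt in select.find_all('option')
                 if opt['value'] != '0'}  # skip the default TÜMÜ entry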

We can't see the second dropdown's options without inspecting the page unless the second dropdown is clicked. (Both dropdowns open a search box when clicked.)

The second dropdown lists the units of the selected organisation.

When an option is selected in each dropdown, the page generates a data table, and we're trying to get that data for all units.

I couldn't manage to scrape the data for all units with one program, so I decided to scrape them individually.

With this code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.action_chains import ActionChains
import time
from bs4 import BeautifulSoup
import pandas as pd 

url = 'https://seffaflik.epias.com.tr/transparency/uretim/planlama/kgup.xhtml'
driver = webdriver.Chrome()
driver.get(url)
time.sleep(3)
organisation = driver.find_element_by_xpath(".//*[@id='j_idt102:distributionId_label']")
organisation.click()  # open the first (organisation) dropdown
dropdown1 = driver.find_element_by_xpath(".//*[@id='j_idt102:distributionId_filter']")
dropdown1.send_keys('1461')  # type into its search box
dropdown1.send_keys(u'\ue007')  # press Enter
unit = driver.find_element_by_id('j_idt102:uevcb_label')  # second (unit) dropdown
dropdown2 = driver.find_element_by_xpath(".//*[@id='j_idt102:uevcb_filter']")
dropdown2.send_keys('SAMA')
dropdown2.send_keys(u'\ue007')
apply = driver.find_element_by_xpath("//*[@id='j_idt102:goster']")
apply.click()  # click the Apply button
time.sleep(5)

soup = BeautifulSoup(driver.page_source, 'html.parser')

table = soup.find_all('table')[0]
rows = table.find_all('tr')[1:]

data = {
    '01.Date' : [],
    '02.Hour' : [],
    '03.NaturalGas' : [],
    '04.Wind' : [],
    '05.Lignite' : [],
    '06.Hard_Coal' : [],
    '07.ImportedCoal' : [],
    '08.Geothermal' : [],
    '09.Hydro_Dam' : [],
    '10.Naphta' : [],
    '11.Biomass' : [],
    '12.River' : [],
    '13.Other' : []
}

for row in rows:
    cols = row.find_all('td')
    data['01.Date'].append( cols[0].get_text() )
    data['02.Hour'].append( cols[1].get_text() )
    data['03.NaturalGas'].append( cols[3].get_text() )
    data['04.Wind'].append( cols[4].get_text() )
    data['05.Lignite'].append( cols[5].get_text() )
    data['06.Hard_Coal'].append( cols[6].get_text() )
    data['07.ImportedCoal'].append( cols[7].get_text() )
    data['08.Geothermal'].append( cols[8].get_text() )
    data['09.Hydro_Dam'].append( cols[9].get_text() )
    data['10.Naphta'].append( cols[10].get_text() )
    data['11.Biomass'].append( cols[11].get_text() )
    data['12.River'].append( cols[12].get_text() )
    data['13.Other'].append( cols[13].get_text() )

df = pd.DataFrame( data )
writer = pd.ExcelWriter('//192.168.0.102/Data/kgup.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
time.sleep(5)
driver.close()

With this code we can select from the first dropdown using its search box and the Enter key.
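
As a side note, u'\ue007' is the key code behind Keys.ENTER, which is already imported, so the same step could be written more readably as:

dropdown1.send_keys(Keys.ENTER)  # same keystroke as send_keys(u'\ue007')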

When it comes to the second dropdown, it fails with ImportError: sys.meta_path is None, Python is likely shutting down.

How should I handle this?

Thanks.


1 Answer


Your code seems to be sensitive to StaleElementException as well as to the exception "Element is not clickable at point...". Try the code below for the web-scraping part and let me know the result:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
import pandas as pd 

url = 'https://seffaflik.epias.com.tr/transparency/uretim/planlama/kgup.xhtml'
driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 20)
driver.maximize_window()

wait.until_not(EC.visibility_of_element_located((By.ID,'j_idt15'))) # wait until modal disappeared
wait.until(EC.element_to_be_clickable((By.ID,'j_idt102:distributionId_label'))).click() # organization drop-down
wait.until(EC.element_to_be_clickable((By.ID, 'j_idt102:distributionId_filter'))).send_keys('1461' + u'\ue007') # select required
wait.until_not(EC.visibility_of_element_located((By.ID,'j_idt179_modal'))) # wait until modal disappeared
wait.until(EC.element_to_be_clickable((By.ID,'j_idt102:uevcb_label'))).click() # unit drop-down
wait.until(EC.element_to_be_clickable((By.ID, 'j_idt102:uevcb_filter'))).send_keys('SAMA' + u'\ue007') # select unit
wait.until(EC.element_to_be_clickable((By.ID,'j_idt102:goster'))).click() # click Apply
wait.until_not(EC.visibility_of_element_located((By.ID,'j_idt15'))) # wait until modal disappeared

soup = BeautifulSoup(driver.page_source, 'html.parser')
....
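
Once the table has rendered, you could also let pandas parse it straight from the page source instead of walking the rows manually. A short sketch (read_html needs lxml or html5lib installed, and I'm assuming the data table is the first <table> on the page):

import pandas as pd

# read_html returns one DataFrame per <table> in the HTML;
# the data table is assumed to be the first one
df = pd.read_html(driver.page_source)[0]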