I am new to Python and am trying to retrieve data from this site using Python 3.6.0.
There are two dropdowns, and the second one's options depend on the selection made in the first.
First: 'Organizasyon Adi'. Second: 'UEVCB Adi'.
The options in the page source look like this:
<option value="0" selected="selected">TÜMÜ</option> #this is default value when we open the page
<option value="10374">1461 TRABZON ELEKTRİK ÜRETİM A.Ş</option>
<option value="9426">2M ELEKTRİK ÜRETİM SANAYİ VE TİCARET ANONİM ŞİRKETİ</option>
These are the options for the first dropdown; there are almost 800 of them.
The second dropdown's options are not visible until that dropdown is clicked, unless you inspect the page. (Both dropdowns open a search box when clicked.)
The second dropdown lists the units of the selected organisation.
Once an option is selected in both dropdowns, the page generates a data table, and I am trying to get that data for all units.
I couldn't manage to scrape the data for all units with one program, so I decided to scrape them individually.
This is the code:
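For reference, here is a minimal sketch of how options like these could be read into (value, name) pairs with only the standard library's html.parser. The class name OptionCollector and the variable names are made up for illustration, and the sample string is just the three options shown above, not the full page.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Hypothetical helper: collects (value, text) pairs from <option> tags."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._value = None   # value attribute of the <option> being read
        self._text = []      # text chunks seen inside that <option>

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._value = dict(attrs).get("value")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an <option> element.
        if self._value is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._value is not None:
            self.options.append((self._value, "".join(self._text).strip()))
            self._value = None


sample = '''<option value="0" selected="selected">TÜMÜ</option>
<option value="10374">1461 TRABZON ELEKTRİK ÜRETİM A.Ş</option>
<option value="9426">2M ELEKTRİK ÜRETİM SANAYİ VE TİCARET ANONİM ŞİRKETİ</option>'''

collector = OptionCollector()
collector.feed(sample)

# Drop the default "TÜMÜ" entry (value "0"), keeping real organisations only.
organisations = [(value, name) for value, name in collector.options if value != "0"]
```

With the real page, the string passed to feed() would be the page source instead of the sample.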
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.action_chains import ActionChains
import time
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://seffaflik.epias.com.tr/transparency/uretim/planlama/kgup.xhtml'
driver = webdriver.Chrome()
driver.get(url)
time.sleep(3)
organisation = driver.find_element_by_xpath(".//*[@id='j_idt102:distributionId_label']")
organisation.click()
dropdown1 = driver.find_element_by_xpath(".//*[@id='j_idt102:distributionId_filter']")
dropdown1.send_keys('1461')
dropdown1.send_keys(u'\ue007')
unit = driver.find_element_by_id('j_idt102:uevcb_label')
dropdown2 = driver.find_element_by_xpath(".//*[@id='j_idt102:uevcb_filter']")
dropdown2.send_keys('SAMA')
dropdown2.send_keys(u'\ue007')
apply= driver.find_element_by_xpath("//*[@id='j_idt102:goster']")
apply.click()
time.sleep(5)
soup = BeautifulSoup(driver.page_source, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
table = soup.find_all('table')[0]
rows = table.find_all('tr')[1:]
data = {
    '01.Date' : [],
    '02.Hour' : [],
    '03.NaturalGas' : [],
    '04.Wind' : [],
    '05.Lignite' : [],
    '06.Hard_Coal' : [],
    '07.ImportedCoal' : [],
    '08.Geothermal' : [],
    '09.Hydro_Dam' : [],
    '10.Naphta' : [],
    '11.Biomass' : [],
    '12.River' : [],
    '13.Other' : []
}
for row in rows:
    cols = row.find_all('td')
    data['01.Date'].append( cols[0].get_text() )
    data['02.Hour'].append( cols[1].get_text() )
    data['03.NaturalGas'].append( cols[3].get_text() )
    data['04.Wind'].append( cols[4].get_text() )
    data['05.Lignite'].append( cols[5].get_text() )
    data['06.Hard_Coal'].append( cols[6].get_text() )
    data['07.ImportedCoal'].append( cols[7].get_text() )
    data['08.Geothermal'].append( cols[8].get_text() )
    data['09.Hydro_Dam'].append( cols[9].get_text() )
    data['10.Naphta'].append( cols[10].get_text() )
    data['11.Biomass'].append( cols[11].get_text() )
    data['12.River'].append( cols[12].get_text() )
    data['13.Other'].append( cols[13].get_text() )
df = pd.DataFrame( data )
writer = pd.ExcelWriter('//192.168.0.102/Data/kgup.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
time.sleep(5)
driver.close()
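As a side note, the thirteen append lines could probably be collapsed by pairing a list of column names with the cell index each one reads from. This is only a sketch: the function name rows_to_columns is made up, each row is assumed to already be a plain list of cell texts (e.g. from get_text()), and the sample row is dummy data, not real output from the site.

```python
COLUMNS = [
    '01.Date', '02.Hour', '03.NaturalGas', '04.Wind', '05.Lignite',
    '06.Hard_Coal', '07.ImportedCoal', '08.Geothermal', '09.Hydro_Dam',
    '10.Naphta', '11.Biomass', '12.River', '13.Other',
]

# Cell index read for each column: 0 and 1, then 3..13.
# Index 2 is skipped, exactly as in the script above.
CELL_INDEXES = [0, 1] + list(range(3, 14))


def rows_to_columns(rows):
    """Turn a list of rows (each a list of cell texts) into a dict of columns."""
    data = {name: [] for name in COLUMNS}
    for cells in rows:
        for name, index in zip(COLUMNS, CELL_INDEXES):
            data[name].append(cells[index])
    return data


# Dummy row with 14 cells, where each cell just holds its own index as text.
dummy_row = [str(i) for i in range(14)]
data = rows_to_columns([dummy_row])
```

The resulting dict has the same shape as the one built in the loop above, so it could be passed to pd.DataFrame in the same way.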
The Selenium script above can select from the first dropdown using its search box and the Enter key.
When it reaches the second dropdown, it fails with: ImportError: sys.meta_path is None, Python is likely shutting down
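As far as I understand, this ImportError is raised when something tries to import a module while the interpreter is already exiting, so it may be hiding an earlier exception rather than being the real problem. Below is a minimal sketch of what I could try in order to see the underlying error: wrap the scraping steps in try/finally so the browser is always shut down explicitly. The names run_with_cleanup, FakeDriver and failing_scrape are made up, and FakeDriver is only a stand-in so the sketch runs without a browser.

```python
def run_with_cleanup(make_driver, scrape):
    """Run scrape(driver) and always call driver.quit(), even on failure.

    Any exception raised by scrape() propagates to the caller, so the
    original traceback stays visible.
    """
    driver = make_driver()
    try:
        return scrape(driver)
    finally:
        driver.quit()


class FakeDriver:
    """Stand-in for a real webdriver; only records whether quit() was called."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


def failing_scrape(driver):
    # Simulates a step that fails, e.g. an element that cannot be found.
    raise RuntimeError("element not found")


fake = FakeDriver()
first_error = None
try:
    run_with_cleanup(lambda: fake, failing_scrape)
except RuntimeError as exc:
    first_error = str(exc)
```

With Selenium, the call would presumably look like run_with_cleanup(webdriver.Chrome, my_scrape), where my_scrape is a function holding the dropdown and table steps from the script above.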
How should I handle this?
Thanks.