0
votes

I want to scrape a job's salary, but many unrelated elements share the same tag name and class names. Can I do this with beautifulsoup4, or do I need another library such as Selenium? I think the XPath would be the same for all of them as well. How can I get only the salary, without the other elements for skills and description?

html = '''
<div class="the-same-div">
    <span class="header-span">Salary</span>
    <span class="key-span">
        <span class="css-8888">1000 Dollar</span>
    </span>
</div>
<div class="the-same-div">
    <span class="header-span">Skills</span>
    <span class="key-span">
        <span class="css-8888">Web scraping</span>
    </span>
</div>
<div class="the-same-div">
    <span class="header-span">Description</span>
    <span class="key-span">
        <span class="css-8888">This is a web scraping Job with good salary</span>
    </span>
</div>'''

This is the Python code I have so far to scrape the salary element:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "lxml")

salary = soup.find_all("span", {"class": "css-8888"})

This returns all three spans, not just the salary. How can I get only the salary of this job? Thank you.


3 Answers

0
votes

I am not sure that Selenium is a good choice for this task; its main purpose is browser automation rather than parsing static HTML. To get all salaries, I would do the following:

from bs4 import BeautifulSoup as bs

# Use a context manager so the file is closed, and name the parser explicitly
with open("test.html", "r") as html_file:
    soup = bs(html_file.read(), "lxml")

same_div_list = soup.find_all("div", {"class": "the-same-div"})
jobs_salary_list = []

for div in same_div_list:
    if div.find("span", {"class": "header-span"}).text == "Salary":
        jobs_salary_list.append(div.find("span", {"class": "css-8888"}).text)
print(jobs_salary_list)

bs4 lets you search inside an element, not only across the whole document. So first collect all the "the-same-div" divs, iterate over them, and check each one's "header-span" text; if it equals "Salary", take the text of the "css-8888" span inside that same div.
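The steps above can also be wrapped in a small helper. This is only a sketch: `get_field` is a hypothetical name, the markup is copied from the question, and `html.parser` is used here merely to avoid depending on lxml. It also guards against blocks that are missing one of the spans.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = '''
<div class="the-same-div">
    <span class="header-span">Salary</span>
    <span class="key-span">
        <span class="css-8888">1000 Dollar</span>
    </span>
</div>
<div class="the-same-div">
    <span class="header-span">Skills</span>
    <span class="key-span">
        <span class="css-8888">Web scraping</span>
    </span>
</div>'''


def get_field(soup, label):
    """Return the value text of the block whose header equals `label`, else None."""
    for div in soup.find_all("div", class_="the-same-div"):
        header = div.find("span", class_="header-span")
        value = div.find("span", class_="css-8888")
        # Skip blocks that lack either span instead of raising AttributeError.
        if header is None or value is None:
            continue
        if header.get_text(strip=True) == label:
            return value.get_text(strip=True)
    return None


soup = BeautifulSoup(html, "html.parser")
print(get_field(soup, "Salary"))
```

Because the search for "css-8888" is scoped to each div, a value can never be picked up from a neighbouring block.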

0
votes

Since Selenium is tagged, this is the XPath I would use in Selenium:

//span[text() = 'Salary']/following-sibling::span/span

and get the text out of it using the .text property.

Something like this:

from selenium.webdriver.common.by import By

print(driver.find_element(By.XPATH, "//span[text() = 'Salary']/following-sibling::span/span").text)

If there is more than one salary, use find_elements instead, which returns a list of matches.

0
votes

You can find the tag that contains the "Salary" text and then call .find_next() to get the next <span> tag, which holds the salary:

html = '''
<div class="the-same-div">
    <span class="header-span">Salary</span>
    <span class="key-span">
        <span class="css-8888">1000 Dollar</span>
    </span>
</div>
<div class="the-same-div">
    <span class="header-span">Skills</span>
    <span class="key-span">
        <span class="css-8888">Web scraping</span>
    </span>
</div>
<div class="the-same-div">
    <span class="header-span">Description</span>
    <span class="key-span">
        <span class="css-8888">This is a web scraping Job with good salary</span>
    </span>
</div>'''



from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "lxml")

span = soup.find_all("span", {"class": "header-span"}, string='Salary')
for each in span:
    salary = each.find_next('span',{'class':'css-8888'})
    print(salary.text)

Output:

1000 Dollar
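On real pages the header text is often padded with whitespace, in which case an exact string match finds nothing. A possible variant, sketched under that assumption, matches the header with a regular expression and checks for None before reading the text. The padded markup below is hypothetical, and `html.parser` is used only to avoid depending on lxml.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical variant of the question's markup: the header text is padded
# with whitespace, so an exact "Salary" match would miss it.
html = '''
<div class="the-same-div">
    <span class="header-span">
        Salary
    </span>
    <span class="key-span">
        <span class="css-8888">1000 Dollar</span>
    </span>
</div>
<div class="the-same-div">
    <span class="header-span">Skills</span>
    <span class="key-span">
        <span class="css-8888">Web scraping</span>
    </span>
</div>'''

soup = BeautifulSoup(html, "html.parser")

# Match the header text while ignoring leading and trailing whitespace.
label = re.compile(r"^\s*Salary\s*$")
header = soup.find("span", class_="header-span", string=label)

salary = None
if header is not None:
    # find_next searches forward through the rest of the document.
    value = header.find_next("span", class_="css-8888")
    if value is not None:
        salary = value.get_text(strip=True)

print(salary)
```

Note that find_next is not limited to the enclosing div: if the Salary block had no "css-8888" span, it would return the one from the following block.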