I'm new to this, so sorry if I confuse anything. I'm writing a Selenium web scraper in Python to scrape all headlines and dates from the NYTimes article archives.
There's a "Show More" button at the bottom of the page that loads 10 more articles every time you click it. So I essentially want the script to click the "Show More" button until there are no more articles to load, and then scrape the whole page for the headlines and dates. Here is my attempt:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import pandas as pd
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver = webdriver.Chrome(chrome_options=options,
                          executable_path=r"//usr/local/Caskroom/chromedriver/81.0.4044.69/chromedriver")
driver.get("https://www.nytimes.com/search?dropmab=true&endDate=20120103&query=&sections=Business%7Cnyt%3A%2F%2Fsection%2F0415b2b0-513a-5e78-80da-21ab770cb753&sort=best&startDate=20070101")
WebDriverWait(driver, 40).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='css-vsuiox']//button[@data-testid='search-show-more-button']")))
while True:
    try:
        WebDriverWait(driver, 40).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='css-vsuiox']//button[@data-testid='search-show-more-button']"))).click()
        print("MORE button clicked")
    except TimeoutException:
        break
# Scrape while the browser session is still open; quit only after the data is collected
headlines_element = driver.find_elements_by_xpath('//h4[@class="css-2fgx4k"]')
headlines = [x.text for x in headlines_element]
print('headlines:')
print(headlines, '\n')
dates_element = driver.find_elements_by_xpath("//time[@class='css-17ubb9w']")
dates = [x.text for x in dates_element]
print("dates:")
print(dates, '\n')
# Use singular loop names so the headlines and dates lists are not overwritten
for headline, date in zip(headlines, dates):
    print("Headlines : Dates")
    print(headline + ": " + date, '\n')
driver.quit()
But when I run the script, it clicks the "Show More" button a few times, then seemingly at random clicks on one of the articles and navigates away from the page. I also tried nesting the headline and date scraping inside the while loop, but I just kept getting "TabError: inconsistent use of tabs and spaces in indentation".
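For reference, here is a minimal sketch of how that TabError can be reproduced and tracked down, assuming it came from the nested lines being indented with tab characters while the rest of the file uses spaces. The helper names `find_tab_indented_lines` and `raises_tab_error` and the `SNIPPET` string are made up for illustration and are not part of my script:

```python
def find_tab_indented_lines(source):
    """Return the 1-based numbers of lines whose indentation contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is whatever lstrip() would remove from the line
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(number)
    return flagged


def raises_tab_error(source):
    """Compile the source without running it and report whether TabError occurs."""
    try:
        compile(source, "<snippet>", "exec")
    except TabError:
        return True
    return False


# Hypothetical stand-in for the nested loop body:
# line 2 is indented with one tab, line 3 with eight spaces.
SNIPPET = "while True:\n\tclicked = True\n        break\n"

print(find_tab_indented_lines(SNIPPET))
print(raises_tab_error(SNIPPET))
# Converting the tab to spaces makes the indentation consistent again
print(raises_tab_error(SNIPPET.replace("\t", " " * 8)))
```

If this assumption is right, then converting the flagged lines to spaces (most editors have a "convert indentation to spaces" command) should make the error go away, independently of the clicking problem.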
Please help! Thanks!