
I am trying to scrape the content of the article at this link: https://onlinelibrary.wiley.com/doi/full/10.1111/jvim.15224

I have used Selenium to load the page (with both PhantomJS and Firefox), but I can't seem to get the article tag.

I used this line to wait for the page to load:

    element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CLASS_NAME, "article-section__sub-title section1")))

Alternatively, I also tried to wait for the article tag to load.

However, the driver continues after a couple of seconds, and whenever I check the HTML I get after waiting, the only thing that comes out is the 'head' and 'body' tags: just the tags, without their content.

Any idea what I did wrong in getting the page to load and scraping the article tag?

Try headless Chrome; PhantomJS is outdated now, and headless Chrome is much faster. - ThunderHorn
Which element are you referring to by "article tag"? - undetected Selenium

1 Answer

1
votes

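To scrape the article tags, use visibility_of_all_elements_located() instead of presence_of_element_located(). Note also that By.CLASS_NAME accepts a single class name, so a compound value such as "article-section__sub-title section1" will not match; a CSS selector that chains the classes works instead. You can use the following solution: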

  • Code Block:

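    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    driver.get("https://onlinelibrary.wiley.com/doi/full/10.1111/jvim.15224")  # driver: an existing WebDriver instance
    # Wait up to 20 seconds until every matching sub-title heading is visible
    tags = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h3.article-section__sub-title.section1")))
    for tag in tags:
        print(tag.text)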
    
  • Console Output:

    Background
    Objective
    Animals
    Methods
    Results
    Conclusions and Clinical Importance
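
The selector in the code block above chains both class names from the element's class attribute. As a rough sketch of that conversion, the helper below turns a space-separated class attribute into a CSS selector string. The name class_attr_to_css is hypothetical (it is not part of Selenium), and the sketch assumes the class names contain no characters that would need CSS escaping.

```python
def class_attr_to_css(class_attr: str, tag: str = "") -> str:
    """Build a CSS selector from a space-separated class attribute.

    Hypothetical helper for illustration; not part of Selenium.
    Assumes class names need no CSS escaping.
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("class_attr must contain at least one class name")
    # Prefix each class name with "." and chain them after the optional tag
    return tag + "".join("." + name for name in classes)


selector = class_attr_to_css("article-section__sub-title section1", tag="h3")
print(selector)
```

The resulting string could then be passed with By.CSS_SELECTOR in place of the hand-written selector.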