1
votes

I have a Python script that scrapes various data. For example, it scrapes the Website link from this HTML:

<a data-ix="show-popup-on-click" target="_blank" rel="nofollow" href="https://mylink.org/" class="button full w-button" style="transition: all 0.4s ease 0s;">Website</a>

It was working properly, but now it fails with the error:

NoSuchElementException: Message: {"errorMessage":"Unable to find element with link text 'Website'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"95","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:40581","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"link text\", \"sessionId\": \"a7a441f0-0f6a-11e8-ad3a-6121f74a30f4\", \"value\": \"Website\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/a7a441f0-0f6a-11e8-ad3a-6121f74a30f4/element"}} Screenshot: available via screen

This is my code:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get(link)
driver.implicitly_wait(10)

website = driver.find_element_by_link_text("Website").get_attribute("href")

What am I doing wrong?

UPDATE:

<div class="column-space w-col w-col-4">
   <a data-ix="show-popup-on-click" target="_blank" 
      rel="nofollow" href="https://example.com/" 
      class="button full w-button" 
      style="transition: all 0.4s ease 0s;">Website</a>

   <div class="space big"></div>
   <a target="_blank" rel="nofollow" 
      href="https://example.com/storage/b/2/0/2/WhitepaperLive.pdf" 
      class="button-2 w-button">Whitepaper</a>
   <div class="space big"></div>
   <a class="button-2 w-condition-invisible w-button">Program</a>
   <div class="space big w-condition-invisible"></div>
   <div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Token:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">UTC</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Price:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">1 LUC=0,05 USD</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Buy with:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">USD, EUR</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Platform:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">MyPlatform</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix w-condition-invisible">
         <div class="div-block-2">KYC:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">No</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">KYC:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">Yes</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Location:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">Malta</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Can't join:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">USA</div>
         </div>
      </div>
      <div class="space big"></div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Start:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">January 25, 2018</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">End:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">February 5, 2018</div>
         </div>
      </div>
      <div class="space big"></div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Start2:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">February 12, 2018</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">End2:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">March 5, 2018</div>
         </div>
      </div>
      <div>
         <div class="div-block-33">
            <div class="space big"></div>
            <div>
               <a target="_blank" rel="nofollow" 
               class="button green full w-condition-invisible w-button">JOIN WHITELIST NOW »</a>
               <div class="div-block-34">
                  <a target="_blank" rel="nofollow" href="http://we-do-not-have-slack.com" 
                     class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/903_slack-symbol.png" alt="ICO Slack link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://twitter.com/live" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/f4000142b091_twitter%20(1).png" width="16" alt="ICO Twitter link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://t.me/live" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/790001798dfe_telegram.png" alt="ICO Telegram link">
                  </a>
                  <a target="_blank" rel="nofollow" href="http://we-do-not-have-GitHub.com" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/59cf77c1fb0edc0001b4b26a_github-logo.png" alt="ICO GitHun link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://www.facebook.com/Play2Live-504880049864038/" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/59cf77c1fb0edc0001b4b117/59d510290116ac0001964c8e_facebook.png" alt="Facebook link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://talk.org/index.php?topic=2381679.0" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/0011f8c3c_talk.jpg" alt="Talk link">
                  </a>
               </div>
            </div>
         </div>
      </div>
   </div>
</div>
2
Can you share more of the HTML? And possibly even the text of a parent element that you can successfully find_*? - Ian Lesperance
@Ian: yes, other find_* work well. - Markus
@Ian: I uploaded whole HTML code. - Markus
What is the nearest ancestor element that you can successfully find with Selenium? And what is that element's text value? - Ian Lesperance

2 Answers

1
votes

This error occurs when Selenium can't find the element in the DOM.

My guess is that you set the implicit wait too late, so Selenium looks for the element before the page has loaded and the element is present in the DOM:

driver.get(link)
driver.implicitly_wait(10)

The documentation sets up the implicit wait before getting any pages:

driver = webdriver.PhantomJS()
driver.implicitly_wait(10)
driver.get(link)

With the implicit wait in place, Selenium keeps polling the DOM for up to 10 seconds when it looks for the anchor element, instead of failing immediately.

DocLink: http://selenium-python.readthedocs.io/waits.html#implicit-waits

Also, if none of the elements you are scraping on that page are loaded or created via JavaScript, you don't need Selenium for simple text extraction. You could just use the standard library's urllib.request to fetch the page and then parse it with BeautifulSoup.
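A minimal sketch of that no-browser approach, assuming the link is present in the static HTML. To stay dependency-free it swaps BeautifulSoup for the standard library's html.parser; LinkCollector, href_for and SAMPLE are hypothetical names made up for this illustration, and the sample markup is a trimmed copy of the anchor from the question.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (text, href) pairs
        self._in_link = False  # True while we are inside an <a> element
        self._href = None      # href of the <a> currently open
        self._text = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append(("".join(self._text).strip(), self._href))
            self._in_link = False


def href_for(html, label):
    """Return the href of the first link whose source text equals label."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    for text, href in parser.links:
        if text == label:
            return href
    return None


# Trimmed copy of the anchor from the question. For a live page the HTML
# could instead come from urllib.request.urlopen(link).read().decode().
SAMPLE = (
    '<a target="_blank" rel="nofollow" href="https://example.com/" '
    'class="button full w-button">Website</a>'
)

print(href_for(SAMPLE, "Website"))
```

Because this works on the raw markup, it sees the text exactly as written in the source ("Website"), not as it is rendered on the page.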

UPDATE:

As Ian said in the comments, the position of the implicit wait doesn't matter in this case.

The problem was the locator strategy.

website = driver.find_element_by_link_text('Website').get_attribute('href')

In this case it couldn't find the element, which is a link styled as a button with uppercase lettering (WEBSITE). The link-text locator seems to match not the text in the HTML source ("Website") but the rendered text after CSS styling ("WEBSITE").

Another locator strategy, such as a CSS selector or XPath, seems to me to deliver more reliable results:

driver.find_element_by_xpath("//a[contains(text(),'Website')]").get_attribute("href")

Some more information on those can be found here: Selenium Locating Elements
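To illustrate why the XPath lookup is not affected by the styling: an XPath text comparison runs against the text node in the markup, which is still "Website". The sketch below shows this without a browser, using the standard library's xml.etree.ElementTree on a trimmed, well-formed copy of the HTML from the question. This is only an analogy to what Selenium does: ElementTree implements a small XPath subset (no contains(), and the [.='text'] predicate needs Python 3.7+), and SNIPPET is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the two buttons from the question's HTML.
SNIPPET = """
<div class="column-space w-col w-col-4">
  <a target="_blank" rel="nofollow" href="https://example.com/"
     class="button full w-button">Website</a>
  <a target="_blank" rel="nofollow"
     href="https://example.com/storage/b/2/0/2/WhitepaperLive.pdf"
     class="button-2 w-button">Whitepaper</a>
</div>
"""

root = ET.fromstring(SNIPPET)

# The comparison is against the source text, so the mixed-case form matches...
match = root.find(".//a[.='Website']")
# ...and the uppercase form, which only exists after CSS styling, does not.
miss = root.find(".//a[.='WEBSITE']")

website_href = match.get("href")
print(website_href)
print(miss)
```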

1
votes

There is no problem in the code. When I inspect the Website link on the page, I can see the text "Website", but if I use that same text to find the element by link text, as below, I get a NoSuchElementException:

website = driver.find_element_by_link_text("Website").get_attribute("href")
print(website)

I tried adding waits and also used partial_link_text, but no luck.

Then I tried fetching all elements with the tag name "a" and printing their text with the code below:

elements = driver.find_elements_by_tag_name("a")
for element in elements:
    print(element.text)

That showed me the text is not "Website" but "WEBSITE", although I am not sure why it behaves like this.

After changing all the characters of "website" to capitals, I was able to identify the element and fetch its href:

driver.get("https://topicolist.com/ico/adhive")
website = driver.find_element_by_link_text("WEBSITE").get_attribute("href")
print(website)
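If you would rather not depend on how the text is capitalised on the page, the loop over the "a" elements above can be turned into a case-insensitive lookup. This is only a sketch: find_link_href and FakeAnchor are names invented for this example, and FakeAnchor is a stand-in for a Selenium element (it only provides .text and .get_attribute) so that the snippet runs without a browser.

```python
def find_link_href(anchors, label):
    """Return the href of the first anchor whose text equals label, ignoring case."""
    wanted = label.strip().casefold()
    for anchor in anchors:
        if anchor.text.strip().casefold() == wanted:
            return anchor.get_attribute("href")
    return None


class FakeAnchor:
    """Hypothetical stand-in for a Selenium element, for demonstration only."""

    def __init__(self, text, href):
        self.text = text
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


# Texts are uppercase here, mimicking what element.text printed for the page.
anchors = [
    FakeAnchor("WHITEPAPER", "https://example.com/storage/b/2/0/2/WhitepaperLive.pdf"),
    FakeAnchor("WEBSITE", "https://example.com/"),
]

website = find_link_href(anchors, "Website")
print(website)
```

With Selenium you would pass driver.find_elements_by_tag_name("a") as the anchors argument instead of the fake list.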

Hope this solves your problem.