
I am trying to scrape the website https://investing.com/ to get technical data for stocks. For "Moving Averages:" and "Technical Indicators:", I would like to get how many buys and how many sells for different periods:

  • 5 hours
  • Daily
  • Weekly

Here is an image showing the data I want to get: https://i.ibb.co/mHpM0Yw/Capture-d-e-cran-2019-08-14-a-00-15-45.png

The URL is https://investing.com/equities/credit-agricole-technical

When you open the page in a browser, the period is set to "Hourly", and you have to click another period to get the correct data. The DOM is updated after an XHR request.

I would like to scrape the page after the DOM has updated.

Mechanize

I tried scraping with Mechanize, clicking on "Weekly", and grabbing the updated DOM, but I got an error.

Here is my code:

    require 'mechanize'

    def mechanize_scraper(url)
      agent = Mechanize.new
      puts agent.user_agent_alias = 'Mac Safari'
      page = agent.get(url)
      link = page.link_with(text: 'Weekly')
      new_page = link.click
    end


    url = "https://investing.com/equities/credit-agricole-technical"
    mechanize_scraper(url)

Here is the error:

Mechanize::UnsupportedSchemeError (Mechanize::UnsupportedSchemeError)

When we inspect the DOM, the link's "href" attribute is "javascript:void(0);":

    <li pairid="407" data-period="week" class="">
      <a href="javascript:void(0);">Weekly</a>
    </li>
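For reference, Mechanize only follows schemes it can actually fetch (HTTP and HTTPS), which is why clicking a `javascript:` href raises `Mechanize::UnsupportedSchemeError` -- Mechanize never executes JavaScript. A quick stdlib check makes the problem visible before clicking (the helper name is hypothetical, not part of Mechanize):

```ruby
require 'uri'

# Hypothetical guard: Mechanize cannot follow "javascript:" hrefs,
# so check the scheme before calling link.click.
def mechanize_followable?(href)
  scheme = URI.parse(href).scheme
  scheme.nil? || %w[http https].include?(scheme)
end

mechanize_followable?('javascript:void(0);')                 # => false
mechanize_followable?('/equities/credit-agricole-technical') # => true
```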

So after some attempts and a lot of googling, I moved on to Watir.

Watir

Here is my code:

    require 'watir'

    def watir_scraper(url)
      Watir.default_timeout = 10
      browser = Watir::Browser.new
      browser.goto(url)
      link = browser.link(text: /weekly/).click
      pp link
    end

    url = "https://investing.com/equities/credit-agricole-technical"
    watir_scraper(url)

Here is the error:

    40: from app.rb:47:in `<main>'
    39: from app.rb:32:in `watir_scraper'
    38: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/watir-6.16.5/lib/watir/elements/element.rb:145:in `click'
    37: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/watir-6.16.5/lib/watir/elements/element.rb:789:in `element_call'
    36: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/watir-6.16.5/lib/watir/elements/element.rb:154:in `block in click'
    35: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/common/element.rb:74:in `click'
    34: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/w3c/bridge.rb:371:in `click_element'
    33: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/w3c/bridge.rb:567:in `execute'
    32: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/bridge.rb:167:in `execute'
    31: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/common.rb:64:in `call'
    30: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/default.rb:114:in `request'
    29: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/common.rb:88:in `create_response'
    28: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/common.rb:88:in `new'
    27: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/response.rb:34:in `initialize'
    26: from /Users/remicarette/.rbenv/versions/2.6.3/lib/ruby/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/response.rb:72:in `assert_ok'
    [frames 25-1: libsystem_pthread and chromedriver internals omitted]
    element click intercepted: Element ... is not clickable at point (544, 704).
    Other element would receive the click: ...
    (Selenium::WebDriver::Error::ElementClickInterceptedError)
    (Session info: chrome=76.0.3809.100)

I hope this helps you understand my issue. I would like to know whether I can scrape this data with Mechanize or Watir. If not, which tools can do the job?

Thanks a lot !


3 Answers

1 vote

I don't think it's exactly what you're looking for, but it may get you a little closer.

Using an HTTP sniffer, I found that the link you're trying to click triggers a POST request. The response of that POST can be obtained with:

    require 'mechanize'

    def mechanize_poster(url)
      agent = Mechanize.new
      headers = {
        'X-Requested-With' => 'XMLHttpRequest',
        'User-Agent' => 'Mac Safari',
        'Content-Type' => 'application/x-www-form-urlencoded',
        'Referer' => 'https://www.investing.com/equities/credit-agricole-technical'
      }
      fields = {
        period: 'week',
        viewType: 'normal',
        pairID: '407'
      }
      page = agent.post(url, fields, headers)
      p page
    end

I think you'll need to use some Nokogiri to get at the data values.
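For example, here is a minimal stdlib-only sketch of pulling the values out of a fragment shaped like the response (the `#techStudiesInnerWrap` and `.summaryTableLine` names come from the page's markup; with Nokogiri you would use CSS selectors instead of a regex, which is far more robust):

```ruby
# Minimal sketch: extract the text of every <span> in an HTML fragment
# with a regex. This only illustrates the shape of the data -- use
# Nokogiri's CSS selectors for real parsing.
fragment = <<~HTML
  <div id="techStudiesInnerWrap">
    <div class="summaryTableLine"><span>Moving Averages:</span><span>Buy (9)</span><span>Sell (3)</span></div>
  </div>
HTML

values = fragment.scan(%r{<span[^>]*>([^<]*)</span>}).flatten
# values == ["Moving Averages:", "Buy (9)", "Sell (3)"]
```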

1 vote

You can do this with just requests and bs4 using POST requests. It's the same idea as in the other answer, but with a loop covering all of the requested periods. I simply used dev tools to monitor web traffic when clicking 5 Hours, Daily, etc., and observed the XHR calls.

    import requests
    from bs4 import BeautifulSoup as bs

    headers = {'User-Agent': 'Mozilla/5.0',
               'Content-Type': 'application/x-www-form-urlencoded',
               'Referer': 'https://www.investing.com/equities/credit-agricole-technical',
               'X-Requested-With': 'XMLHttpRequest'}

    body = {'pairID': 407, 'period': '', 'viewType': 'normal'}
    periods = {'5hr': 18000, 'Daily': 86400, 'Weekly': 'week'}

    with requests.Session() as s:
        for k, v in periods.items():
            body['period'] = v
            r = s.post('https://www.investing.com/instruments/Service/GetTechincalData', data=body, headers=headers)
            soup = bs(r.content, 'lxml')
            for i in soup.select('#techStudiesInnerWrap .summaryTableLine'):
                print(k, ':', ' '.join([j.text for j in i.select('span')]))

Output: the buy/sell summary lines printed for each period (screenshot omitted).
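The same loop over periods translates directly back to Mechanize if you prefer to stay in Ruby. A sketch, assuming the endpoint and field names shown above (`technical_request_bodies` is a hypothetical helper; the `agent.post` call itself needs network access, so it is left as a comment):

```ruby
# Period values observed in the XHR calls: a number of seconds for the
# intraday/daily views, the literal string "week" for the weekly view.
PERIODS = { '5 Hours' => 18_000, 'Daily' => 86_400, 'Weekly' => 'week' }.freeze

# Hypothetical helper: build one POST body per period label.
def technical_request_bodies(pair_id)
  PERIODS.map do |label, period|
    [label, { pairID: pair_id, period: period, viewType: 'normal' }]
  end.to_h
end

bodies = technical_request_bodies(407)
bodies['Weekly'][:period] # => "week"

# Each body would then be sent as in the first answer, e.g.:
#   agent.post('https://www.investing.com/instruments/Service/GetTechincalData',
#              bodies[label], headers)
```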

0 votes

The error you are seeing in Watir comes from WebDriver. It indicates that if a human tried to click that link, some other element on the page would receive the click instead, because that other element overlaps the link.

Likely the default browser window is small and you are dealing with a 'responsive' design that doesn't scale down well below a given size (a common issue).

Try setting the window size first to be similar to what you would use manually (e.g. 1024x768 or larger): `@browser.window.resize_to(1920, 1080)`