I am trying to scrape the Transaction Value (取引値) from http://nextfunds.jp/lineup/1357/detail.html. With Inspect Element I can see the value 1,875 (press Ctrl+F and search for 取引値 or 1,875 to find it), but neither string appears in the page source.
My intent is to scrape it with Python. I tried:
import requests
from bs4 import BeautifulSoup
url = 'http://nextfunds.jp/lineup/1357/detail.html'
response = requests.get(url)
html = response.content
print(html)
soup = BeautifulSoup(html, 'html.parser')
Since neither 1,875 nor 取引値 is in the HTML source, is there no way to scrape those values? Thanks.
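One quick way to confirm that the values are filled in after the page loads, rather than served in the HTML, is to search the raw response for them. This is only a minimal sketch, assuming Python 3; `missing_from_source` is a hypothetical helper and the sample bytes are a made-up stand-in for `response.content`, not the real page.

```python
def missing_from_source(raw_html, needles, encoding="utf-8"):
    """Return the needles that do not occur in the raw (unrendered) HTML."""
    text = raw_html.decode(encoding, errors="replace")
    return [n for n in needles if n not in text]


# Hypothetical stand-in for response.content: the table cell is empty,
# as it would be if a script were expected to fill it in later.
sample = '<div id="include"><table><tr><td></td></tr></table></div>'.encode("utf-8")

print(missing_from_source(sample, ["取引値", "1,875", "include"]))
```

If a string shows up in Inspect Element but is reported as missing here, the page is most likely building that part of the DOM with JavaScript, so `requests` alone will not see it.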
Update 1: Tried lxml
from lxml import html
page = requests.get(url)
tree = html.fromstring(page.content)
# XPath copied from Chrome's Inspect Element
val = tree.xpath('//*[@id="include"]/div[1]/div[2]/table/tbody/tr[1]/td')
val
[]
Update 2: Tried WebKit (this comes very close to solving it), following https://impythonist.wordpress.com/2015/01/06/ultimate-guide-for-scraping-javascript-rendered-web-pages/
import sys
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *
from lxml import html
# Take this class for granted; just use the result of rendering.
class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()
    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()
url = 'http://nextfunds.jp/lineup/1357/detail.html'
r = Render(url)
result = r.frame.toHtml()
# Now write result to a file and open it in a browser to copy the XPath of the desired table data,
# but somehow some table values are missing (I thought it was a website issue, but it is not!)
Update 3: Got the values, but stuck at selecting the table
>>> import dryscrape
>>> from bs4 import BeautifulSoup
>>> session = dryscrape.Session()
>>> session.visit(url)
>>> response = session.body()
>>> soup = BeautifulSoup(response)
>>> html = soup.prettify("utf-8")
>>> f1.write(html)
# Now I do see the required table values, but BeautifulSoup does not support XPath. I just need to select the table and save it as CSV.
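Selecting a table and writing it out as CSV does not strictly need XPath. With BeautifulSoup, `soup.find("table")` followed by `find_all("tr")` would be the usual route; the sketch below uses only the standard library (`html.parser` and `csv`) instead, so it runs without extra installs. It assumes Python 3 and a single, non-nested table. `TableRows`, `table_to_csv`, and the sample markup (including the second row) are hypothetical, not taken from the real page.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in the fed HTML as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html_text):
    """Return the table rows found in html_text as CSV text."""
    parser = TableRows()
    parser.feed(html_text)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up markup standing in for the rendered page body.
sample = """
<div id="include"><table>
  <tr><th>取引値</th><td>1,875</td></tr>
  <tr><th>label</th><td>value</td></tr>
</table></div>
"""
print(table_to_csv(sample))
```

In practice the string passed to `table_to_csv` would be the rendered HTML (for example the value returned by `session.body()`), and the result could be written to a file opened with `encoding="utf-8"`. Note that the `csv` writer quotes `1,875` because it contains the delimiter.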
Update 4
I found that the HTML I am interested in is referenced in the page source of the URL. I only need to search the page source for the pattern src="http://nam.qri.jp/cgi-bin/nextfunds/json?SRC=nextfunds/lineup&code=1570&auth= to get the link, and then use the code given in the answer section. This is more of a regex problem now. I can do it using curl and grep, but I would like to do it in Python only.
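The curl-and-grep step can be approximated with the standard `re` module. The following is a rough sketch, assuming Python 3 and assuming the link appears in a double-quoted `src` attribute; `find_json_url` is a hypothetical helper, and the sample tag with its `auth=PLACEHOLDER` value is invented for illustration.

```python
import re
from html import unescape

# Capture everything inside src="..." that starts with the known prefix.
SRC_PATTERN = re.compile(
    r'src="(http://nam\.qri\.jp/cgi-bin/nextfunds/json\?SRC=nextfunds/lineup[^"]*)"'
)


def find_json_url(page_source):
    """Return the first matching src URL in page_source, or None if absent."""
    match = SRC_PATTERN.search(page_source)
    if match is None:
        return None
    # Attribute values in HTML may encode "&" as "&amp;"; undo that.
    return unescape(match.group(1))


# Invented example of how the tag might look in the page source.
sample = (
    '<script src="http://nam.qri.jp/cgi-bin/nextfunds/json'
    '?SRC=nextfunds/lineup&amp;code=1357&amp;auth=PLACEHOLDER"></script>'
)
print(find_json_url(sample))
```

The page source itself could come from `requests.get(url).text`. Leaving the `code` and `auth` parts out of the pattern keeps it usable for other fund codes, at the cost of matching any link with that prefix.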