
I am looking to scrape MMA betting-odds data from this site and parse a few Highcharts charts. I click a link with Selenium and then switch to the chart: I go to this site and click on +420 in the Artem Lobov row under the Pinnacle column, which opens a pop-out chart, and then I switch to the active element. I would like to capture the graph Highcharts draws in response to the click.

I use selenium in the following manner:

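from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
import time

# Hover over the odds link and click it to open the pop-out chart
actions = ActionChains(driver)
actions.move_to_element(driver.find_element(By.ID, pin_id))
actions.click()
actions.perform()
time.sleep(3)  # give the chart time to render
driver.switch_to.active_element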

I was able to click the link and get the chart, but I am a bit lost on how Highcharts works.
I am trying to parse the highcharts-series-group element and get the values in the chart.

I believe the data can be found by:

import bs4

# page_source is already the HTML string, so pass it straight to BeautifulSoup
soup = bs4.BeautifulSoup(driver.page_source, "lxml")
data = soup.find_all('g', {"class": "highcharts-series-group"})[-1].find_all("path")

However, this returns only SVG <path> elements, and it is not clear how the chart is created from them. As noted in the comments, it appears to be SVG.

When I inspect the page, the data appears to be in <g class="highcharts-series"> and <g class="highcharts-series-tracker">, but it is not clear how Highcharts draws the graph from this data.

How does Highcharts display the graph from the saved data? Is there a clean way to get the values from highcharts-series-group as they are displayed?

It looks like they are storing the data in the DOM directly. If you inspect the chart, you will see a div with all the data in it as an object you can pull out. The div ID is "even-swing-container". If you want to extract the HTML table of the betting lines, that is another question altogether. – wergeld
Thank you very much for responding. I was trying to parse the path that I believe is here: {"class":"highcharts-series-group"}. It seems to be calling Translate(). – Michael WS
Take, for example, clicking on over/under on bestfightodds.com/events/… When I inspect it in Firefox/Chrome, it looks like the data is in <g class="highcharts-series"> and <g class="highcharts-series-tracker">, but it is not clear how it is translated and drawn. – Michael WS
Paths rendered by Highcharts use SVG coordinates, not real values. In short: data in JS -> translation in JS from values to SVG coordinates -> rendering SVG elements. In other words, it's not an easy task to get the real data from the SVG coordinates alone. The easiest way to get this data would be to use Highcharts.charts[index], like this: Highcharts.charts[0].series[0].options.data. I guess Selenium won't allow this. – Paweł Fus
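To make the "values -> SVG coordinates" step concrete, here is a minimal sketch of a plain linear y-axis mapping and its inverse. It is not Highcharts' source code: the helper names and the axis numbers (values 100 to 160 drawn between y = 23 and y = 117) are hypothetical, and it assumes a simple linear axis. Because SVG's origin is top-left, larger values get smaller y coordinates, and the inverse only works if you know the axis range and plot area.

```python
def value_to_pixel(value, axis_min, axis_max, plot_top, plot_height):
    """Map a data value to an SVG y coordinate on a linear axis (hypothetical helper)."""
    # Fraction of the axis range that lies above this value; SVG y grows downward.
    fraction_from_top = (axis_max - value) / (axis_max - axis_min)
    return plot_top + fraction_from_top * plot_height


def pixel_to_value(pixel_y, axis_min, axis_max, plot_top, plot_height):
    """Invert value_to_pixel: recover the data value from an SVG y coordinate."""
    fraction_from_top = (pixel_y - plot_top) / plot_height
    return axis_max - fraction_from_top * (axis_max - axis_min)


# Hypothetical axis: values 100..160 drawn between y=23 (top) and y=117 (bottom).
py = value_to_pixel(130, axis_min=100, axis_max=160, plot_top=23, plot_height=94)
print(py)                                   # 70.0
print(pixel_to_value(py, 100, 160, 23, 94))  # 130.0
```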

3 Answers

6
votes

I could not figure out how to convert the SVG data into what is displayed on the graph you mentioned, but I wrote the following Selenium Python script:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get('https://www.bestfightodds.com/events/ufc-fight-night-108-swanson-vs-lobov-1258')

# Click the odds link to open the pop-out chart
actions = webdriver.ActionChains(driver)
actions.move_to_element(driver.find_element(By.ID, 'oID1013467091'))
actions.click()
actions.perform()
time.sleep(3)  # wait for the chart to render

# The chart container stores its index into the global Highcharts.charts array
chart_number = int(driver.find_element(By.ID, 'chart-area').get_attribute('data-highcharts-chart'))
chart_data = driver.execute_script(
    'return Highcharts.charts[arguments[0]].series[0].options.data', chart_number)
for point in chart_data:
    # oneDecToML is defined by the page's own JavaScript
    e = driver.execute_script('return oneDecToML(arguments[0])', point.get('y'))
    print(point.get('x'), e)

Here we are using the Highcharts API and some JavaScript from the page source (the oneDecToML function), which converts the server response for this chart into what we see on the graph.
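The page's oneDecToML function is not shown here, so its exact behaviour is an assumption; judging by its name, it turns decimal odds (the chart's y values) into moneyline (American) odds. If you would rather do that conversion in Python than call back into the page, a minimal sketch of the standard decimal-to-American formula could look like this. The helper names are hypothetical, and the site's rounding may differ.

```python
def decimal_to_moneyline(decimal_odds):
    """Convert decimal odds to American (moneyline) odds; hypothetical stand-in for oneDecToML."""
    if decimal_odds <= 1:
        raise ValueError("decimal odds must be greater than 1")
    if decimal_odds >= 2:
        # Underdog: profit on a 100 stake, shown with a plus sign.
        return round((decimal_odds - 1) * 100)
    # Favourite: stake needed to win 100, shown as a negative number.
    return round(-100 / (decimal_odds - 1))


def format_moneyline(moneyline):
    """Format with an explicit sign, e.g. +420 or -125."""
    return f"{moneyline:+d}"


print(format_moneyline(decimal_to_moneyline(5.2)))  # +420
print(format_moneyline(decimal_to_moneyline(1.8)))  # -125
```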

1
votes

Another method is to reconstruct the data from the SVG path data described above, using a linear equation y = mx + b that maps SVG coordinates to chart values. If some actual data values are known (Highcharts charts often display data labels on the points), the slope can be calculated very accurately. With the intercept known (see below), I ran a regression on 3 known points and it reproduced them exactly (zero error).

Another method, described in detail here, is to reconstruct the data from the highcharts-yaxis-labels, but its suitability depends on the data and the accuracy required. Extract each label's y attribute and text as x and y respectively, and run a regression analysis.

y="148"... >-125<
y="117"... >+100<
y="85"... >+120<
y="54"... >+140<
y="23"... >+160<

It is useful to plot the values in a chart, especially in this case, because the relationship is not linear across all the labels. Fortunately, discarding the -125 value gives a nice straight line, and none of the data values are below +100.

pixel y (x)   label value (y)
117           100
85            120
54            140
23            160

Slope (x coefficient)   -0.638938504720592
R^2                     0.999938759887726

The x coefficient is the slope of the line, so m = -0.638938504720592.
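The regression itself needs nothing beyond the standard library. This sketch writes out ordinary least squares by hand (on Python 3.10+ `statistics.linear_regression` gives the same slope and intercept) and fits value = m * pixel_y + b to the four labels kept above, also reporting R^2. The function name is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = m * xs + b; returns (m, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x  # value of the fitted line at SVG y = 0
    r_squared = sxy ** 2 / (sxx * syy)
    return m, b, r_squared


# SVG y coordinates of the labels (x) and the label values (y); -125 discarded.
pixel_ys = [117, 85, 54, 23]
values = [100, 120, 140, 160]
m, b, r2 = fit_line(pixel_ys, values)
print(m, b, r2)
```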

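What about the intercept? The most common coordinate system has a bottom-left origin, but SVG uses a top-left origin, so pixel y grows downward and the slope comes out negative. The intercept b is the value the line gives at SVG y = 0, which lies above the top label (y = 23), so it is not simply the top value of 160. Take it from the same regression, b = mean(value) - m * mean(pixel y) = 130 + 0.638938504720592 * 69.75 ≈ 174.57, or, to within the fit error, anchor the line at the top label with value = 160 + m * (pixel y - 23).

Extract the data list using your preferred method (not described in this answer) and reconstruct the data with the linear equation.

e.g. ...L 999999 101 ....

y = -0.638938504720592 * 101 + 174.57 ≈ 110.0, which matches 101 lying halfway between the +100 (y = 117) and +120 (y = 85) labels.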
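Once you have the series path, reconstruction means pulling the coordinate pairs out of its d attribute and pushing each SVG y through the fitted line. A minimal sketch, assuming a simple path made only of absolute M/L commands (curved or relative paths would need a proper SVG path parser); the d string is hypothetical, and the slope and intercept passed in are the least-squares values for the four labels.

```python
import re

# Hypothetical series path; real Highcharts paths may be longer or use other commands.
PATH_D = "M 10 117 L 60 101 L 110 85 L 160 54"

NUMBER = r"-?\d+(?:\.\d+)?"


def path_points(d):
    """Extract (x, y) pixel pairs from an absolute M/L-only SVG path string."""
    pairs = re.findall(rf"[ML]\s*({NUMBER})[\s,]+({NUMBER})", d)
    return [(float(x), float(y)) for x, y in pairs]


def reconstruct(d, m, b):
    """Map each point's SVG y coordinate to a data value with value = m * y + b."""
    return [(x, m * y + b) for x, y in path_points(d)]


# Slope and intercept from an ordinary least-squares fit of the axis labels.
for x, value in reconstruct(PATH_D, m=-0.638938504720592, b=174.5659607):
    print(x, round(value, 1))
```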

Reconstructing the data from the y-axis may not be as accurate as using the actual data. If you are lucky, the y-axis labels will have a nice scale so you get precise values, but the label positions can each be up to half a unit out at the top and bottom of the range, so the worst case is (1/2 + 1/2) / 94 = 1.06% in this example (94 being the 117 - 23 pixel span of the labels used), and the error is likely much less.

0
votes

When I use the CSS selector "g.highcharts-axis-labels tspan", it returns all the fighters' names, and when I use "g.highcharts-data-labels tspan", it returns all the percentages for line movement.

So you should be able to use something like:

from selenium.webdriver.common.by import By

labels = driver.find_elements(By.CSS_SELECTOR, "g.highcharts-axis-labels tspan")
data = driver.find_elements(By.CSS_SELECTOR, "g.highcharts-data-labels tspan")
# Pair each fighter name with its line-movement percentage
for label, value in zip(labels, data):
    print("Fighter: " + label.text + " (" + value.text + ")")

An alternative is to use the command that Paweł Fus recommended:

Highcharts.charts[0].series[0].options.data

You should be able to execute that through the JavaScript executor (in Python, driver.execute_script("return Highcharts.charts[0].series[0].options.data")). It returns the series data, for example an array of arrays, which you can then parse to get the values you want.
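What options.data holds depends on how the chart was configured: Highcharts accepts plain y values, [x, y] arrays, or point objects, which Selenium hands back to Python as numbers, lists, or dicts (the first answer's chart returns dicts with x and y keys). A small hypothetical helper can normalise all three into (x, y) tuples before further parsing; for plain y values it assumes x is the point index, which only holds when the series uses the default pointStart and pointInterval.

```python
def normalize_series_data(raw_data):
    """Turn Highcharts series options.data (as returned by execute_script) into (x, y) tuples."""
    points = []
    for index, item in enumerate(raw_data):
        if isinstance(item, dict):
            # Point objects such as {"x": ..., "y": ...}; missing x falls back to the index.
            points.append((item.get("x", index), item.get("y")))
        elif isinstance(item, (list, tuple)) and len(item) >= 2:
            # [x, y] arrays.
            points.append((item[0], item[1]))
        else:
            # Plain y values: x is assumed to be the point index.
            points.append((index, item))
    return points


# In a live session the input would come from, e.g.:
# raw = driver.execute_script("return Highcharts.charts[0].series[0].options.data")
print(normalize_series_data([[0, 5.2], [1, 4.8]]))
print(normalize_series_data([{"x": 0, "y": 5.2}, {"x": 1, "y": 4.8}]))
print(normalize_series_data([5.2, 4.8]))
```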