
I'm using requests and BeautifulSoup on this page, but I only get a few tags back. Is this site dynamic, in which case I should probably run a script to get the data? I would then like to extract the values from the chart; the site displays the water level in my city. I tried the code below, but it returns almost nothing, while in Chrome's developer tools I can see everything. Why? Thanks in advance for your help!

Here is the site (the time parameter in the URL is an epoch timestamp): http://aqualim.environnement.wallonie.be/Station.do?method=selectStation&time=1642669254241&station=L7880

Here is the code I tried:

import requests
from bs4 import BeautifulSoup

URL = "http://aqualim.environnement.wallonie.be/Station.do?method=selectStation&time=1642669254241&station=L7880"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
print(soup)
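A nearly empty soup is what you would expect from a page whose data is loaded by JavaScript after the first response. As a quick check (a sketch, not part of the original question), you can scan the static HTML for the external scripts it pulls in; those scripts, not the HTML itself, are what fetch the chart data. ScriptCollector and script_sources are hypothetical helper names, and only the standard-library html.parser is used.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the src attribute of every <script> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # tag is already lowercased; attrs is a list of (name, value) pairs
        if tag == "script":
            src = dict(attrs).get("src")
            if src:  # inline scripts have no src and are skipped
                self.sources.append(src)


def script_sources(html):
    """Return the external script URLs referenced by a static HTML page."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


# Usage with the question's request (needs network access):
# import requests
# print(script_sources(requests.get(URL).text))
```

If the list is long but the values you see in the chart appear nowhere in the static HTML, the data is being fetched by one of those scripts.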

I think you should get something like what you see when you open this link: view-source:aqualim.environnement.wallonie.be/Station.do?method=selectStation&time=1642669254241&station=L7880. If your data isn't there, it is most likely loaded later by all those scripts. A login page that loads before the data page contains even less, almost nothing. If you open the page in the browser, open the Network tab of the developer tools, and then reload the page, you will see all the resources it loads. Your data is somewhere in there. - Eugene Ryabtsev
Thanks, yes, I see all the scripts in the Network tab, but how can I render or decode them? Just a simple example would help me understand the logic. Thanks again! - isoparme

2 Answers


The data you want to scrape is rendered by JavaScript. The requests module only receives the static HTML. You can use Puppeteer to scrape everything you see in the developer tools.
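Before switching to a headless browser, it is often enough to find the request that actually returns the data. The Network tab can export the whole session as a HAR file, and HAR is plain JSON, so a short script can list the requests whose responses were JSON. This is a sketch assuming the standard HAR 1.2 layout (log.entries[].request.url and response.content.mimeType); json_endpoints and load_har are hypothetical helper names.

```python
import json


def json_endpoints(har):
    """List (method, url) for every request in a HAR log whose response was JSON."""
    found = []
    for entry in har["log"]["entries"]:
        content = entry.get("response", {}).get("content", {})
        mime = content.get("mimeType", "")
        # Heuristic: some servers label JSON as text/plain, so this can miss endpoints
        if "json" in mime.lower():
            request = entry["request"]
            found.append((request["method"], request["url"]))
    return found


def load_har(path):
    """Read a HAR file exported from the browser's Network tab."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Usage:
# for method, url in json_endpoints(load_har("aqualim.har")):
#     print(method, url)
```

Any URL this prints can then be requested directly with requests, without rendering the page.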


As stated, requests only returns the static HTML. This data is loaded dynamically.

You could use something like Puppeteer or Selenium to let the page render first, then pull and parse the HTML. Or you can get the data directly, in a nice JSON format, from the ArcGIS query endpoint used in the code below.

I'm not sure what data you want exactly.

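import pandas as pd
import requests

# ArcGIS REST "query" endpoint of the AQUALIM station layer
url = 'http://geoservices2.wallonie.be/arcgis/rest/services/APP_AQUALIM/STATION_PUBLIC/MapServer/0/query'
payload = {
    'f': 'json',                # response format
    'where': '1=1',             # no filter: return every station
    'returnGeometry': 'true',
    'spatialRel': 'esriSpatialRelIntersects',
    'outFields': '*',           # all attribute fields
    'outSR': '31370'}           # Belgian Lambert 72 coordinates

jsonData = requests.get(url, params=payload).json()

# One row per feature; nested dicts become "attributes.X" / "geometry.x" columns
df = pd.json_normalize(jsonData['features'])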

Output:

print(df)
    attributes.NOMSTA attributes.LOCALITE  ... geometry.x  geometry.y
0               L5021           Resteigne  ...   207730.0     86925.0
1               L5060           Romedenne  ...   174509.0     94935.0
2               L5170            Baisieux  ...   101819.0    119540.0
3               L5183                Onoz  ...   171179.0    130329.0
4               L5201             Rhisnes  ...   183172.0    130635.0
..                ...                 ...  ...        ...         ...
175             L8640              Anthée  ...   176992.0    103682.0
176             L8650               Gozin  ...   194984.0     90490.0
177             T0025    Faulx les Tombes  ...   195622.0    125555.0
178             T0054           Pépinster  ...   251109.0    140666.0
179             T0055  Trooz (temporaire)  ...   243552.0    140836.0

[180 rows x 8 columns]
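The dotted column names (attributes.NOMSTA, geometry.x, and so on) come from pd.json_normalize, which flattens each feature's nested attributes and geometry objects. Here is a standard-library sketch of that flattening, in case you want to work on the features without pandas; flatten is a hypothetical helper and, unlike json_normalize, it does not expand lists of records.

```python
def flatten(record, parent="", sep="."):
    """Flatten nested dicts into dotted keys, similar to pd.json_normalize on one record."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing their keys with the parent name
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Usage on the decoded response:
# rows = [flatten(feature) for feature in jsonData['features']]
```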

To filter:

df_7880 = df[df['attributes.NOMSTA']=='L7880']

Output:

print(df_7880.to_string())
    attributes.NOMSTA attributes.LOCALITE attributes.RIVIERE  attributes.X_LAMBERT  attributes.Y_LAMBERT  attributes.ESRI_OID  geometry.x  geometry.y
152             L7880                 Ere    Rieu des Barges               79114.0              141573.0                  153     79114.0    141573.0
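The filter above runs after every station has been downloaded. The ArcGIS REST query endpoint also accepts an SQL where clause, so the same selection could be made on the server by replacing where=1=1 with a condition on NOMSTA. This is a sketch assuming that the NOMSTA field seen in the output is queryable on the server; station_query_url is a hypothetical helper, and the parameters otherwise mirror the payload used above.

```python
from urllib.parse import urlencode

QUERY_URL = ("http://geoservices2.wallonie.be/arcgis/rest/services/"
             "APP_AQUALIM/STATION_PUBLIC/MapServer/0/query")


def station_query_url(station):
    """Build a query URL that asks the layer for a single station code."""
    # Double any single quotes so the value stays a valid SQL string literal
    safe = station.replace("'", "''")
    params = {
        "f": "json",
        "where": f"NOMSTA='{safe}'",  # filtered server-side instead of in pandas
        "returnGeometry": "true",
        "spatialRel": "esriSpatialRelIntersects",
        "outFields": "*",
        "outSR": "31370",
    }
    return QUERY_URL + "?" + urlencode(params)


# Usage (needs network access):
# import requests
# features = requests.get(station_query_url("L7880")).json()["features"]
```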