0
votes

I am trying to extract the number of "confirmados" cases of COVID-19 from this page https://coronavirus.gob.mx/datos/

This is my line of code: table_div = soup.find('div', {"id": "gsPosDIV"}), but it is not working. I am new to web scraping. What is the correct way to extract this data?

This is the HTML: <div id="gsPosDIV" class="h5 mb-0 font-weight-bold text-gray-800">47,144</div>

1
Can you post the HTML code? - 0m3r
Put your code in a setTimeout and make it wait a few seconds while the page loads. - pacukluka
Yes, of course: <div id="gsPosDIV" class="h5 mb-0 font-weight-bold text-gray-800">47,144</div> - coding

1 Answer

0
votes

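The data is loaded dynamically via JavaScript, which is likely why soup.find does not return the value from the page's initial HTML. You can replicate the request that the page's JavaScript makes using the requests module and then parse the response with the re module: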

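import re
import requests

# Form fields for the national "Confirmados" figure
data = {
    'sPatType': 'Confirmados',
    'cve': '000',
    'nom': 'Nacional',
}

url = 'https://coronavirus.gob.mx/datos/Overview/info/getInfo.php'

# The response body is text that sets the element's innerHTML
raw_data = requests.post(url, data=data).text

# Capture the digits assigned to the gsPosDIV element
positivos = re.search(r'document\.getElementById\("gsPosDIV"\)\.innerHTML = \((\d+)', raw_data).group(1)
print(positivos)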

Prints:

47144
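
Note that re.search returns None when the pattern is not found, so .group(1) would raise an AttributeError if the endpoint ever changes its output. A minimal sketch of a more defensive version of the parsing step follows. The helper name extract_count and the sample string are hypothetical: the sample is made up to fit the pattern the regex above expects, and the real response may look different.

```python
import re
from typing import Optional


def extract_count(js_text: str, element_id: str = "gsPosDIV") -> Optional[int]:
    """Return the number assigned to the element's innerHTML, or None if absent.

    Uses the same pattern as the answer above, with the element id
    made configurable (hypothetical helper, not part of any library).
    """
    pattern = (
        r'document\.getElementById\("' + re.escape(element_id) + r'"\)'
        r'\.innerHTML = \((\d+)'
    )
    match = re.search(pattern, js_text)
    # re.search returns None on no match, so check before calling group()
    return int(match.group(1)) if match else None


# Made-up response fragment, shaped only to fit the regex (an assumption)
sample = 'document.getElementById("gsPosDIV").innerHTML = (47144).toLocaleString();'

print(extract_count(sample))           # 47144
print(extract_count("<html></html>"))  # None
```

With this in place, you could call extract_count(raw_data) instead of chaining .group(1) directly, and handle the None case however suits your script.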