I'm having trouble using Beautiful Soup 4 to extract content that is stored in span tags across a number of HTML files.
I've used a SoupStrainer and find("dl") to reduce the HTML to the repeated items inside the "dl" tag, and then find_all to collect all the spans.
My problem is how to extract the correct value from each span and store it in a variable, and also how to handle the ordering of spans such as
<span class="iconYes">Public</span>
<span class="iconNo">Private</span>
so that I know which services each dentist offers.
My Python 3 code
WebText=BeautifulSoup(open(fileToProcess),"html.parser",parse_only=DentistStrainer)
datalist = WebText.find("dl")
for listitems in datalist:
    spans = datalist.find_all('span')
    for span in spans:
        print(span)
Sample Output
<span id="Content_Result_lblDentistName">Dr First Surname</span>
<span class="lblAddress" id="Content_Result_lblAddress"><strong>Address</strong>: Dental Centre, Street, Town</span>
<span class="lblAddress" id="Content_Result_lblPhone"><strong>Phone</strong>: 123-1234567</span>
<span class="lblAddress" id="Content_Result_lblFax"><strong>Fax</strong>: 123-3456789</span>
<span class="lblAddress" id="Content_Result_lblEmail">[email protected]</span>
<span class="lblAddress" id="Content_Result_lblWebsiteUrl">www.somewhere.tld</span>
<span><strong>Services</strong>: </span>
<span class="iconYes">Private</span>
<span class="iconYes">Public</span>
<span class="iconNo">Credit Card</span>
I unsuccessfully tried to extract the values using:
if span.contains("lblDentistName"):
    DentistName = span.text()
    print("Dentist ",DentistName)
Can any Beautiful Soup users help me?
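One possible approach, offered as a minimal sketch rather than a definitive answer: Tag objects have no contains method and .text is a property rather than a method, so the attempt above cannot work as written. Instead, each span's id and class attributes can be read with get(), and the values stored in dictionaries keyed by name so that the order of the spans no longer matters. The names SAMPLE_HTML and extract_dentist are hypothetical, the markup is copied from the sample output, and wrapping it in a "dl" tag is an assumption about what the real files look like.

```python
from bs4 import BeautifulSoup

# Assumption: markup copied from the sample output, wrapped in <dl> for illustration.
SAMPLE_HTML = """
<dl>
<span id="Content_Result_lblDentistName">Dr First Surname</span>
<span class="lblAddress" id="Content_Result_lblAddress"><strong>Address</strong>: Dental Centre, Street, Town</span>
<span class="lblAddress" id="Content_Result_lblPhone"><strong>Phone</strong>: 123-1234567</span>
<span><strong>Services</strong>: </span>
<span class="iconYes">Private</span>
<span class="iconYes">Public</span>
<span class="iconNo">Credit Card</span>
</dl>
"""


def extract_dentist(dl):
    """Hypothetical helper: return (details, services) for one <dl> element."""
    details = {}   # e.g. {"DentistName": "...", "Phone": "..."}
    services = {}  # service name -> True (iconYes) / False (iconNo)
    for span in dl.find_all("span"):
        span_id = span.get("id", "")
        classes = span.get("class") or []  # class is parsed as a list of names
        if "iconYes" in classes or "iconNo" in classes:
            # The class says whether the service is offered; the text is its name.
            services[span.get_text(strip=True)] = "iconYes" in classes
        elif span_id:
            # "Content_Result_lblPhone" -> "Phone"
            key = span_id.rsplit("_lbl", 1)[-1]
            text = span.get_text(strip=True)
            label = span.find("strong")
            if label is not None:
                # Drop the leading "<strong>Label</strong>: " part.
                text = text[len(label.get_text(strip=True)):].lstrip(": ")
            details[key] = text
    return details, services


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
details, services = extract_dentist(soup.find("dl"))
print("Dentist", details["DentistName"])
print(services)
```

Because the services are looked up by name, it does not matter whether "Public" or "Private" comes first in a given file. With the real files, the same helper could be called on the "dl" element found after parsing with the SoupStrainer.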