I'm trying to use Python and Beautiful Soup to open a link and extract data embedded within a tag. I've tried everything I know and I'm stuck.
Here are the relevant portions of my code, along with the HTML I'm trying to pull the data from:
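import urllib.request
from bs4 import BeautifulSoup

sauce = urllib.request.urlopen(link).read()  # link is the URL of the page
soup = BeautifulSoup(sauce, 'lxml')
yy = soup.select('span[id^=ctl00_ContentPlaceHolder1_Label1]')  # spans whose id starts with this value
y = yy[0]  # first match
print(y)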
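Since the id looks fixed rather than a prefix, select_one with an id selector returns the span directly (or None when it is missing), which avoids an IndexError on yy[0] if the page changes. A minimal sketch, assuming the id is exact; SAMPLE_PAGE is a hypothetical stand-in for the downloaded page, and the built-in "html.parser" is used so it runs without lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; in the real script this
# would be the bytes returned by urllib.request.urlopen(link).read().
SAMPLE_PAGE = """
<html><body>
<span id="ctl00_ContentPlaceHolder1_Label1"><b>Status:</b> Licensed<br/></span>
</body></html>
"""

def find_license_span(page):
    """Return the license <span>, or None if the page does not contain it."""
    soup = BeautifulSoup(page, "html.parser")  # "lxml" also works if installed
    # '#id' matches the exact id; select_one returns the first match or None.
    return soup.select_one("#ctl00_ContentPlaceHolder1_Label1")

span = find_license_span(SAMPLE_PAGE)
if span is None:
    print("license span not found")
else:
    print(span.get_text(" ", strip=True))
```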
print(y) outputs the following HTML:
<span id="ctl00_ContentPlaceHolder1_Label1"><div style="width:100%;clear:both;overflow:hidden;">
<div style="width:17%;float:left;margin-right:10px;"><span style="font-size:16px;font-weight:bold;">
Licensee:</span></div><div style="float:left;"><span style="font-size:14px;font-weight:bold;">Company, INC.</span></div></div>
<div style="width:100%;clear:both;overflow: hidden;"><div style="width:17%;float:left;margin-right:10px;">
<span style="font-size:16px;font-weight:bold;">Facility:</span></div><div style="float:left;">
<span style="font-size:14px;font-weight:bold;">Joes Shop</span></div></div><br/><b>Status:</b>
Licensed<br/><b>JOE SMITH - Director</b><br/><b>Phone:</b> (555)555-5555<br/> <span style="font-size:8pt">
<table border="1" style="padding:1px 1px 5px 1px;border:1px solid #999999;width:497px;border-collapse:collapse;"><tr>
<td width="50%"><b>Daytime Hours:</b> 07:30 AM - 03:30 PM</td><td width="50%"><b>Nighttime Hours:</b>
N/A - N/A</td></tr><tr><td width="50%"><b>Daytime Ages:</b> 4 YRS Through 5 YRS</td><td width="50%">
<b>Nighttime Ages:</b> N/A</td></tr></table></span><br/><span style="font-size:12px;font-weight:bold;">
Mailing Address:</span><br/><span style="font-size:12px;">1909 CENTRAL PARK</span><br/>
<span style="font-size:12px;">NEW YORK</span>, <span style="font-size:12px;">NY</span>
<span style="font-size:12px;">58756</span><br/><br/><span style="font-size:12px;font-weight:bold;">
Street Address:</span><br/><span style="font-size:12px;">3996 Rhode Ave</span><br/>
<span style="font-size:12px;">Cleveland</span>, <span style="font-size:12px;">OH</span> <span style="font-size:12px;">58475</span></span>
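Most of the values in this markup follow a <b>Label:</b> value pattern, where the value is the bare text node directly after the closing </b>. A minimal sketch of reading those pairs with .next_sibling, run on a trimmed excerpt of the HTML above (assumed to match the live page); bold_label_values is a hypothetical helper name:

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed excerpt of the HTML above (assumed representative of the real page).
HTML = """
<span id="ctl00_ContentPlaceHolder1_Label1"><br/><b>Status:</b>
Licensed<br/><b>JOE SMITH - Director</b><br/><b>Phone:</b> (555)555-5555<br/>
<table><tr><td width="50%"><b>Daytime Hours:</b> 07:30 AM - 03:30 PM</td>
<td width="50%"><b>Nighttime Hours:</b>
N/A - N/A</td></tr></table></span>
"""

def bold_label_values(root):
    """Map each '<b>Label:</b> value' pair under root to {'Label': 'value'}."""
    values = {}
    for b in root.find_all("b"):
        label = b.get_text(strip=True)
        if not label.endswith(":"):
            continue  # e.g. '<b>JOE SMITH - Director</b>' has no value after it
        nxt = b.next_sibling
        # The value is the plain text node that directly follows the </b>.
        if isinstance(nxt, NavigableString):
            values[label[:-1]] = nxt.strip()
    return values

span = BeautifulSoup(HTML, "html.parser").find("span", id="ctl00_ContentPlaceHolder1_Label1")
print(bold_label_values(span))
```

Licensee and Facility use styled <span> labels rather than <b>, and the address lines have no per-line labels, so those fields need a different approach.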
I've tried:
ystring = y.getText(separator=u' ')
But this gives me all of the text, labels included, when all I want are the values themselves: the name, phone number, address, etc.
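getText joins every text node into one string, so the boundary between a label and its value is lost. The stripped_strings generator yields each text node separately (trimmed, with whitespace-only nodes skipped), which makes it possible to group values under the preceding "Label:" piece. A minimal sketch on the address part of the HTML, assuming every label ends with a colon; group_by_label is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Address portion of the HTML from the question.
HTML = """
<span id="ctl00_ContentPlaceHolder1_Label1">
<span style="font-size:12px;font-weight:bold;">Mailing Address:</span><br/>
<span style="font-size:12px;">1909 CENTRAL PARK</span><br/>
<span style="font-size:12px;">NEW YORK</span>, <span style="font-size:12px;">NY</span>
<span style="font-size:12px;">58756</span><br/><br/>
<span style="font-size:12px;font-weight:bold;">Street Address:</span><br/>
<span style="font-size:12px;">3996 Rhode Ave</span><br/>
<span style="font-size:12px;">Cleveland</span>, <span style="font-size:12px;">OH</span>
<span style="font-size:12px;">58475</span></span>
"""

def group_by_label(root):
    """Group the visible text pieces under the most recent 'Label:' piece."""
    groups = {}
    current = None
    for piece in root.stripped_strings:  # one trimmed string per text node
        if piece.endswith(":"):
            current = piece[:-1]
            groups[current] = []
        elif current is not None and piece != ",":  # skip the bare ', ' separators
            groups[current].append(piece)
    return groups

span = BeautifulSoup(HTML, "html.parser").span  # first <span> is the outer one
for label, parts in group_by_label(span).items():
    print(f"{label}: {', '.join(parts)}")
```

Run on the full span, the same loop would file "JOE SMITH - Director" under Status, because that line has no label of its own.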
Specifically, I'm trying to extract the following fields:
- Licensee: Company, Inc
- Facility: Joes Shop
- Status: Licensed
- Director: Joe Smith
- Phone: (555) 555-5555
- Daytime Hours: 07:30 AM - 03:30 PM
- Nighttime Hours: N/A - N/A
- Daytime Ages: 4 YRS Through 5 YRS
- Nighttime Ages: N/A
- Mailing Address: 1909 Central Park, New York, NY, 58756 (street, city, state, and ZIP separated by commas)
- Street Address: 3996 Rhode Ave, Cleveland, OH 58475
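One way to get all of these at once is a sketch that walks the text pieces of the full span and builds a dict of the requested fields. It assumes the layout is identical on every page, that labels are the only pieces ending in a colon, and that the director is the only piece ending in " - Director"; parse_license is a hypothetical name, and values keep the page's own capitalization (e.g. "JOE SMITH", "NEW YORK"):

```python
from bs4 import BeautifulSoup

# The HTML from the question (assumed to have the same layout on every page).
HTML = """
<span id="ctl00_ContentPlaceHolder1_Label1"><div style="width:100%;clear:both;overflow:hidden;">
<div style="width:17%;float:left;margin-right:10px;"><span style="font-size:16px;font-weight:bold;">
Licensee:</span></div><div style="float:left;"><span style="font-size:14px;font-weight:bold;">Company, INC.</span></div></div>
<div style="width:100%;clear:both;overflow: hidden;"><div style="width:17%;float:left;margin-right:10px;">
<span style="font-size:16px;font-weight:bold;">Facility:</span></div><div style="float:left;">
<span style="font-size:14px;font-weight:bold;">Joes Shop</span></div></div><br/><b>Status:</b>
Licensed<br/><b>JOE SMITH - Director</b><br/><b>Phone:</b> (555)555-5555<br/> <span style="font-size:8pt">
<table border="1"><tr>
<td width="50%"><b>Daytime Hours:</b> 07:30 AM - 03:30 PM</td><td width="50%"><b>Nighttime Hours:</b>
N/A - N/A</td></tr><tr><td width="50%"><b>Daytime Ages:</b> 4 YRS Through 5 YRS</td><td width="50%">
<b>Nighttime Ages:</b> N/A</td></tr></table></span><br/><span style="font-size:12px;font-weight:bold;">
Mailing Address:</span><br/><span style="font-size:12px;">1909 CENTRAL PARK</span><br/>
<span style="font-size:12px;">NEW YORK</span>, <span style="font-size:12px;">NY</span>
<span style="font-size:12px;">58756</span><br/><br/><span style="font-size:12px;font-weight:bold;">
Street Address:</span><br/><span style="font-size:12px;">3996 Rhode Ave</span><br/>
<span style="font-size:12px;">Cleveland</span>, <span style="font-size:12px;">OH</span> <span style="font-size:12px;">58475</span></span>
"""

DIRECTOR_SUFFIX = " - Director"

def parse_license(root):
    """Return the requested fields from the license <span> as a dict of strings."""
    groups = {}
    current = None
    for piece in root.stripped_strings:
        if piece.endswith(DIRECTOR_SUFFIX):
            # The director line has no 'Label:'; it reads 'NAME - Director'.
            groups["Director"] = [piece[: -len(DIRECTOR_SUFFIX)]]
        elif piece.endswith(":"):
            current = piece[:-1]
            groups[current] = []
        elif current is not None and piece != ",":
            groups[current].append(piece)
    # Multi-part values (the two addresses) are joined with ', '.
    return {label: ", ".join(parts) for label, parts in groups.items()}

span = BeautifulSoup(HTML, "html.parser").find("span", id="ctl00_ContentPlaceHolder1_Label1")
fields = parse_license(span)
for label, value in fields.items():
    print(f"{label}: {value}")
```

Because every multi-part value is joined with ", ", the street address comes out as "3996 Rhode Ave, Cleveland, OH, 58475"; adjust the join if you want a space rather than a comma between state and ZIP.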
Any thoughts or suggestions are greatly appreciated.