The website url is https://www.justia.com/lawyers/criminal-law/maine
I want to scrape only each lawyer's name and office address.
import requests
from bs4 import BeautifulSoup

url = "https://www.justia.com/lawyers/criminal-law/maine"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
# Lawyer names
Lawyer_name = soup.find_all("a", "url main-profile-link")
for i in Lawyer_name:
    print(i.find(text=True))
# Office addresses
address = soup.find_all("span", "-address -hide-landscape-tablet")
for x in address:
    print(x.find_all(text=True))
The name prints out just fine, but the address prints with extra whitespace (newlines and tabs) that I want to remove:
['\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t88 Hammond Street', '\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tBangor,\t\t\t\t\tME 04401\t\t\t\t\t\t ']
So the output I'm attempting to get for each lawyer looks like this (using the first one as an example):
Hunter J Tzovarras
88 Hammond Street
Bangor, ME 04401
Two issues I'm trying to figure out:
- How can I clean up the address so it is easier to read?
- How can I save each lawyer's name together with the matching address so they don't get mixed up?
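
One possible approach to both points, as a rough sketch rather than a confirmed solution: collapse the whitespace in each address string, and look up the address starting from each name link instead of scraping two separate lists. The class names come from the code above. The page structure is an assumption: the sketch assumes each lawyer's name link and address span share an ancestor element that contains no other lawyer's name link. The helper names (`clean_address`, `find_address_span`, `extract_lawyers`) and the sample HTML, including the two placeholder lawyers, are made up for illustration.

```python
from bs4 import BeautifulSoup

# Class names taken from the question; the nesting they sit in is an assumption.
NAME_CLASS = "main-profile-link"
ADDRESS_CLASS = "-address"


def clean_address(span):
    """Return the address lines with tabs, newlines and repeated spaces collapsed."""
    return [" ".join(text.split()) for text in span.stripped_strings]


def find_address_span(link):
    """Climb from a name link to the nearest ancestor that holds an address span."""
    for parent in link.parents:
        # An ancestor with several name links is shared by several lawyers,
        # so an address found from here on could belong to someone else.
        if len(parent.find_all("a", class_=NAME_CLASS)) > 1:
            return None
        span = parent.find("span", class_=ADDRESS_CLASS)
        if span is not None:
            return span
    return None


def extract_lawyers(html):
    """Return a list of (name, [address lines]) tuples, one per name link."""
    soup = BeautifulSoup(html, "html.parser")
    lawyers = []
    for link in soup.find_all("a", class_=NAME_CLASS):
        name = " ".join(link.get_text().split())
        span = find_address_span(link)
        lawyers.append((name, clean_address(span) if span is not None else []))
    return lawyers


# Made-up HTML that imitates the assumed structure (not copied from the site).
SAMPLE_HTML = (
    '<div class="results">'
    '<div class="lawyer"><strong><a class="url main-profile-link" href="#">'
    "Hunter J Tzovarras</a></strong>"
    '<span class="-address -hide-landscape-tablet">\n\t\t\t88 Hammond Street<br/>'
    "\n\t\t\tBangor,\t\t\tME 04401\t\t </span></div>"
    '<div class="lawyer"><a class="url main-profile-link" href="#">'
    "Placeholder Lawyer Two</a></div>"
    '<div class="lawyer"><a class="url main-profile-link" href="#">'
    "Placeholder Lawyer Three</a>"
    '<span class="-address -hide-landscape-tablet"> 1 Example Street<br/>'
    " Portland,\t ME 04101 </span></div>"
    "</div>"
)

for name, address_lines in extract_lawyers(SAMPLE_HTML):
    print(name)
    for line in address_lines:
        print(line)
```

To try this on the real page, `response.text` from the code above could be passed to `extract_lawyers` in place of `SAMPLE_HTML`. Starting from the name link means a lawyer with no listed address gets an empty list instead of silently shifting every later address up by one, which is the risk with two parallel `find_all` lists.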