
The website url is https://www.justia.com/lawyers/criminal-law/maine

I'm wanting to scrape only the name of the lawyer and where their office is.

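import requests
from bs4 import BeautifulSoup

url = "https://www.justia.com/lawyers/criminal-law/maine"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Each lawyer's name is the text of a profile link
Lawyer_name = soup.find_all("a", "url main-profile-link")
for i in Lawyer_name:
    print(i.find(string=True))

# Each address sits in a <span> with these classes
address = soup.find_all("span", "-address -hide-landscape-tablet")
for x in address:
    print(x.find_all(string=True))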

The name prints out just fine, but the address prints with extra whitespace that I want to remove:

['\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t88 Hammond Street', '\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tBangor,\t\t\t\t\tME 04401\t\t\t\t\t\t    ']

So the output I'm trying to get for each lawyer looks like this (using the first one as an example):

Hunter J Tzovarras
88 Hammond Street
Bangor, ME 04401

Two issues I'm trying to figure out:

  1. How can I clean up the address so it is easier to read?
  2. How can I keep each lawyer's name matched with their address so they don't get mixed up?
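A minimal sketch covering both issues, assuming the page keeps names and addresses in the same document order. It runs offline against a tiny made-up HTML snippet: the class names are the ones from the question, but the wrapping <div> is an assumption, not the real Justia markup. Pairing with zip() is positional, so it silently misaligns if any profile lacks an address span; a sturdier approach is to loop over each lawyer's container element and search inside it, but that container's class isn't known here and would have to be read from the live page.

```python
from bs4 import BeautifulSoup

# Stand-in for response.text. The class names come from the question; the
# surrounding <div> wrapper is assumed, not copied from the real page.
SAMPLE_HTML = """
<div>
  <a class="url main-profile-link" href="#">Hunter J Tzovarras</a>
  <span class="-address -hide-landscape-tablet">
    88 Hammond Street<br>
    Bangor,\t\t\t\t\tME 04401\t\t
  </span>
</div>
"""


def tidy(tag):
    """Put each text piece on its own line, collapsing inner runs of whitespace."""
    return "\n".join(" ".join(s.split()) for s in tag.stripped_strings)


def lawyers_with_addresses(html):
    soup = BeautifulSoup(html, "html.parser")
    names = soup.find_all("a", "url main-profile-link")
    addresses = soup.find_all("span", "-address -hide-landscape-tablet")
    # zip() matches by position: this assumes every profile has exactly one
    # address span and that both lists come back in the same order.
    return [{"name": tidy(n), "address": tidy(a)} for n, a in zip(names, addresses)]


lawyers = lawyers_with_addresses(SAMPLE_HTML)
for lawyer in lawyers:
    print(lawyer["name"])
    print(lawyer["address"])
    print()
```

Keeping each lawyer as one dict means the name and address travel together, so later steps (writing a CSV, filtering by city) can't mix them up.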

1 Answer


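Use x.get_text() instead of x.find_all(text=True). Passing strip=True removes the leading and trailing whitespace from each piece of text: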

for x in address:
    print(x.get_text(strip=True))
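One caveat, shown in a small offline sketch: strip=True only trims the ends of each text piece and joins the pieces with an empty string by default, so the street and city run together and the tabs inside "Bangor, ME" survive. The markup below is made up to mimic the whitespace in the question's output. Iterating over stripped_strings and collapsing whitespace per line gives the layout asked for.

```python
from bs4 import BeautifulSoup

# Made-up markup mimicking the whitespace shown in the question's output.
html = (
    '<span class="-address">\n\t\t88 Hammond Street<br>'
    "\n\t\tBangor,\t\t\t\t\tME 04401\t\t    </span>"
)
span = BeautifulSoup(html, "html.parser").span

# strip=True trims each text piece but joins them with "" by default,
# and leaves the tabs inside "Bangor,\t...ME 04401" untouched.
glued = span.get_text(strip=True)

# stripped_strings yields each trimmed piece separately; splitting and
# re-joining each one collapses its inner tabs to single spaces.
readable = "\n".join(" ".join(s.split()) for s in span.stripped_strings)

print(repr(glued))
print(readable)
```

If you only need the line breaks and not the tab cleanup, x.get_text(separator="\n", strip=True) is the shorter option.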