I am trying to scrape a website that has multiple 'p' tags per post with BeautifulSoup, and I am finding it very difficult.
I want to get the text of every p tag associated with a post.
find_all in BeautifulSoup does not seem to do this for me, and the image is not saving either: I get an error that the file cannot be saved. How can I retrieve all the text in the p tags, and the image, from the HTML code below?
My code:
import requests
from bs4 import BeautifulSoup

kompas = requests.get('https://url_on_html.com/')
beautify = BeautifulSoup(kompas.content, 'html5lib')
news = beautify.find_all('div', {'class': 'jeg_block_container'})
arti = []
for each in news:
    title = each.find('h3', {'class': 'jeg_post_title'}).text
    lnk = each.a.get('href')
    r = requests.get(lnk)
    soup = BeautifulSoup(r.text, 'html5lib')
    # find() returns only the first <p>, so content holds a single paragraph
    content = soup.find('p').text.strip()
    images = soup.find_all('img')
    arti.append({
        'Headline': title,
        'Link': lnk,
        # this stores the literal string 'images', not the tags found above
        'image': 'images'
    })
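The loop above collects `img` tags but never downloads them, and the code that produced the "file cannot be saved" error is not shown. A minimal sketch of one way to save an image follows, assuming the usual causes apply: a relative `src` such as `image.png`, a missing target folder, or a file opened in text mode. The helper names `image_target` and `save_image` and the `images` folder are hypothetical, chosen only for illustration.

```python
import os
import posixpath
from urllib.parse import urljoin, urlparse

import requests


def image_target(page_url, src):
    """Return (absolute_url, local_filename) for an <img> src value."""
    # urljoin resolves relative paths such as "image.png" against the page URL
    absolute = urljoin(page_url, src)
    # Use only the last path segment as the file name; drop any query string
    name = posixpath.basename(urlparse(absolute).path) or "image"
    return absolute, name


def save_image(page_url, src, folder="images"):
    """Download one image into `folder` and return the path that was written."""
    absolute, name = image_target(page_url, src)
    os.makedirs(folder, exist_ok=True)  # open() fails if the folder is missing
    path = os.path.join(folder, name)
    response = requests.get(absolute, timeout=10)
    response.raise_for_status()  # stop on 404 and similar instead of saving an error page
    with open(path, "wb") as fh:  # binary mode: image data is bytes, not text
        fh.write(response.content)
    return path
```

Inside the loop this could be called as `save_image(lnk, img['src'])` for each `img` in `images` that has a `src` attribute.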
Let's take this HTML as a scraping sample:
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p>Once upon a time there were three little sisters, and their names were and they lived at the bottom of a well.</p>
<script></script>
<p>the emergency of our matter is Once upon a time there were three little sisters, and their names were and they lived at the bottom of a well.</p>
<p> we will not once in Once upon a time there were three little sisters, and their names were and they lived at the bottom of a well.
</p>
<script></script>
<br></br>
<script></script>
<p>king of our Once upon a time there were three little sisters, and their names were and they lived at the bottom of a well.</p>
<script></script>
<img src="image.png">
<p>he is our Once upon a time there were three little sisters, and their names were and they lived at the bottom of a well.</p>
<p>some weas Once upon a time there were three little sisters, and their names were and they lived at the bottom of a well.</p>
</body>
</html>
I want to scrape all the 'p' tags and add their text to my content.
The issue is that I cannot get find_all in BeautifulSoup to retrieve them; I only ever get the text of the first p element.
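For comparison, here is a minimal sketch of collecting every `p` tag with `find_all`, run against a shortened copy of the sample HTML. In Beautiful Soup, `find('p')` returns only the first match, while `find_all('p')` returns all of them. The built-in `html.parser` is used here as an assumption so the sketch needs no extra install; the code in the question uses `html5lib`. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample HTML from the question
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p>Once upon a time there were three little sisters.</p>
<script></script>
<p> we will not once in Once upon a time.
</p>
<img src="image.png">
<p>he is our Once upon a time.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every <p>; get_text flattens nested tags such as <b>
paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]

# Join the paragraphs into one string for the 'content' field
content = "\n".join(paragraphs)

# Keep only <img> tags that actually carry a src attribute
image_sources = [img["src"] for img in soup.find_all("img", src=True)]

print(len(paragraphs))
print(image_sources)
```

In the scraping loop, `content` and `image_sources` could then be stored in the dictionary appended to `arti` in place of the single paragraph and the literal `'images'` string.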