<div id="thumbnailsImagePreview">
     <img src="getImage.do?imageSize=Small&amp;imageId=730645&amp;r=150521020" imageindex="0" hspace="0" vspace="0" loaded="false" class="selected">
     <img src="getImage.do?imageSize=Small&amp;imageId=7589956&amp;r=150521020" imageindex="1" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=7590018&amp;r=150521020" imageindex="2" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=2803850&amp;r=150521020" imageindex="3" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=2973197&amp;r=150521020" imageindex="4" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=7589888&amp;r=150521020" imageindex="5" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=7877267&amp;r=150521020" imageindex="6" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=7877375&amp;r=150521020" imageindex="7" hspace="0" vspace="0" loaded="false">
     <img src="getImage.do?imageSize=Small&amp;imageId=6812892&amp;r=150521020" imageindex="8" hspace="0" vspace="0" loaded="false">

</div>

I am trying to extract the src value of each img element in this HTML (only for the img elements that have an associated imageindex attribute). Because they are all nested inside the div with id "thumbnailsImagePreview", the following line of code gives me one big block of text, and I am unable to parse out the individual img src links.

images = soup.find_all('div', attrs = {'id' : 'thumbnailsImagePreview'})

How do I get an array of the links?

When I print out images, this is what I get:

[<div id="thumbnailsImagePreview">\n<img class="selected" hspace="0" 
imageindex="0" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=730645&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="1" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=7589956&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="2" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=7590018&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="3" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=2803850&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="4" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=2973197&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="5" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=7589888&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="6" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=7877267&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="7" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=7877375&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="8" loaded="false" src="getImage.do?
imageSize=Small&amp;imageId=6812892&amp;r=150521020" vspace="0"/>\n<img 
hspace="0" imageindex="9" loaded="false" 
</div>]

1 Answer


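find_all('div', ...) returns a list containing the single matching div element, not its children, which is why you see one block. You need to locate the inner img elements and read each one's src attribute by treating the element as a dictionary: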

image_srcs = [img['src'] for img in soup.select('#thumbnailsImagePreview img[src]')]

Here, #thumbnailsImagePreview img[src] is a CSS selector that matches every img element that has a src attribute and is a descendant of the element with id="thumbnailsImagePreview".
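
Since the question asks only for images that carry an imageindex attribute, the selector can be narrowed a little. The following is a minimal sketch rather than a definitive solution: it assumes BeautifulSoup 4 with the built-in "html.parser", uses a shortened copy of the HTML from the question, and adds a hypothetical spacer.gif image (not in the original page) purely to show that img elements without imageindex are skipped.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question.
# The last <img> (spacer.gif) is a made-up example with no imageindex attribute.
html = """
<div id="thumbnailsImagePreview">
  <img src="getImage.do?imageSize=Small&amp;imageId=730645&amp;r=150521020" imageindex="0" class="selected">
  <img src="getImage.do?imageSize=Small&amp;imageId=7589956&amp;r=150521020" imageindex="1">
  <img src="spacer.gif">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: img elements under the div that have both imageindex and src.
image_srcs = [
    img["src"]
    for img in soup.select("#thumbnailsImagePreview img[imageindex][src]")
]

# Equivalent approach without CSS selectors: find the div, then its img children.
container = soup.find("div", id="thumbnailsImagePreview")
image_srcs_alt = [
    img["src"]
    for img in container.find_all("img", attrs={"imageindex": True, "src": True})
]

print(image_srcs)
```

Both lists hold plain strings, one per image. Note that the parser decodes &amp;amp; in the attribute values, so each entry contains a literal & between the query parameters.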