
I call Jsoup.connect(url).get() to fetch an HTTP document, then call doc.select("img[src]"), but it returns an empty result. I have now found the problem: some of the div tags are not static. They are generated dynamically and written into the body only after an Ajax POST completes. The following div tags do not exist in doc after doc = Jsoup.connect(url).get():

    <div class="w clear con-page">
        <div class="article_nav" >
            <a href="index.html">Home</a>&nbsp;&gt;&nbsp;<a href="list.html">car size rate </a>&gt;&nbsp; 
        </div>
        <div id="article_content" class="article article_content" style="min-height: 400px;">
            <div class="article_title"> <p>ARTICLE:2021-04-09</div>
            <div class="article_main" align="center">
                <p ><img width="600" title="1617952699745078083.jpg" alt="1617952602(1).jpg" src="http://www.chinaisa.org.cn/gxportalFile/image/2021/04/09/1617952699745078083.jpg"></p>
            </div>
        </div>
    </div>

I want to get the src of every image on an HTML page with Jsoup. The problem is that doc.select("img") returns nothing. I think the img tag sits at the path shown below. Can Jsoup use XPath to get the img? Is there any method to get all img tags?

    div(w clear con-page)
    --div(article_content)
      --div(article_main)
        --p
          --img
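The path drawn above can be written either as a jsoup CSS selector or as an XPath expression. The following is a minimal sketch of both, assuming the markup is already present in the parsed document (neither will help if the div is inserted later by Ajax). Element.selectXpath only exists in jsoup 1.14.3 and later; on older versions the CSS route is the one to use. For simply getting all img tags, doc.select("img[src]") already does that. The class and method names (ImgPathDemo, srcsByCss, srcsByXpath) are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ImgPathDemo {

    // CSS selector mirroring the tree: div.con-page > div#article_content > div.article_main > p > img
    static final String CSS_PATH = "div.con-page div#article_content div.article_main p > img[src]";

    // Equivalent XPath; Element.selectXpath requires jsoup 1.14.3 or later
    static final String XPATH = "//div[contains(@class,'con-page')]/div[@id='article_content']"
        + "/div[contains(@class,'article_main')]/p/img[@src]";

    // Collect src values of the images matched by the CSS selector
    static List<String> srcsByCss(Document doc) {
        List<String> srcs = new ArrayList<>();
        for (Element img : doc.select(CSS_PATH)) {
            srcs.add(img.attr("src"));
        }
        return srcs;
    }

    // Collect src values of the images matched by the XPath expression
    static List<String> srcsByXpath(Document doc) {
        List<String> srcs = new ArrayList<>();
        for (Element img : doc.selectXpath(XPATH)) {
            srcs.add(img.attr("src"));
        }
        return srcs;
    }

    public static void main(String[] args) {
        // Trimmed copy of the markup from the question, assumed to be in the document already
        String html = "<div class=\"w clear con-page\">"
            + "<div id=\"article_content\" class=\"article article_content\">"
            + "<div class=\"article_main\" align=\"center\">"
            + "<p><img width=\"600\" src=\"http://www.chinaisa.org.cn/gxportalFile/image/2021/04/09/1617952699745078083.jpg\"></p>"
            + "</div></div></div>";
        Document doc = Jsoup.parse(html);
        System.out.println(srcsByCss(doc));
        System.out.println(srcsByXpath(doc));
    }
}
```

Both lookups should return the same list here; the CSS form is usually shorter to write, while the XPath form may be convenient when a path is copied from browser developer tools.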


1 Answer


The selector itself works for me when the HTML is parsed directly:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class ImgSrcExample {
        public static void main(String[] args) {
            // The markup from the question, parsed from a string instead of fetched over HTTP
            String html = "<div class=\"w clear con-page\">\r\n"
                + "    <div class=\"article_nav\" >\r\n"
                + "        <a href=\"index.html\">Home</a>&nbsp;&gt;&nbsp;<a href=\"list.html\">car size rate </a>&gt;&nbsp; \r\n"
                + "    </div>\r\n"
                + "    <div id=\"article_content\" class=\"article article_content\" style=\"min-height: 400px;\">\r\n"
                + "        <div class=\"article_title\"> <p>ARTICLE:2021-04-09</div>\r\n"
                + "        <div class=\"article_main\" align=\"center\">\r\n"
                + "            <p ><img width=\"600\" title=\"1617952699745078083.jpg\" alt=\"1617952602(1).jpg\" src=\"http://www.chinaisa.org.cn/gxportalFile/image/2021/04/09/1617952699745078083.jpg\"></p>\r\n"
                + "        </div>\r\n"
                + "    </div>\r\n"
                + "</div>";
            Document doc = Jsoup.parse(html);
            // Select every img element that has a src attribute and print its value
            Elements es = doc.select("img[src]");
            for (Element e : es)
                System.out.println(e.attr("src"));
        }
    }

Output:

    http://www.chinaisa.org.cn/gxportalFile/image/2021/04/09/1617952699745078083.jpg
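Parsing the string works because the markup is already there. jsoup does not execute JavaScript, so when the div is added by an Ajax call, as described in the question, Jsoup.connect(url).get() only sees the HTML as it was before that call. One possible workaround, sketched below under several assumptions, is to find the request the page's script makes (for example in the browser's network tab) and send it with jsoup directly. The endpoint URL, the "id" form field, and the names AjaxImgDemo, fetchFragment and imageUrls are hypothetical, and the sketch assumes the endpoint returns an HTML fragment; if it returns JSON, the HTML would have to be extracted from it first.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AjaxImgDemo {

    // Hypothetical: POST to the endpoint the page's Ajax call uses and parse the reply.
    // Copy the real URL and form fields from the browser's network tab.
    static Document fetchFragment(String ajaxUrl, String articleId) throws IOException {
        return Jsoup.connect(ajaxUrl)
            .data("id", articleId)     // hypothetical form field
            .ignoreContentType(true)   // tolerate replies not served as text/html
            .post();                   // the returned Document uses ajaxUrl as its base URI
    }

    // Collect image URLs; absUrl resolves relative src values against the base URI
    static List<String> imageUrls(Document doc) {
        List<String> urls = new ArrayList<>();
        for (Element img : doc.select("img[src]")) {
            urls.add(img.absUrl("src"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Offline stand-in for an Ajax reply, using a relative src for illustration
        String fragment = "<div class=\"article_main\"><p>"
            + "<img src=\"/gxportalFile/image/2021/04/09/1617952699745078083.jpg\"></p></div>";
        Document doc = Jsoup.parse(fragment, "http://www.chinaisa.org.cn/");
        System.out.println(imageUrls(doc));
    }
}
```

Using absUrl rather than attr keeps the result usable even if the fragment uses relative paths; if the page turns out to build its content in ways that are hard to replay, a real browser engine driven from Java would be the heavier alternative.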