How can I scrape pages with dynamic content using node.js?

Question

I am trying to scrape a website, but some of the elements are missing from what I get back because they are created dynamically.

I use cheerio in Node.js, and my code is below.

var request = require('request');
var cheerio = require('cheerio');
var url = "http://www.bdtong.co.kr/index.php?c_category=C02";

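request(url, function (err, res, html) {
    if (err) {
        return console.error(err);
    }
    // cheerio parses the HTML exactly as the server sent it; it does not run the page's scripts.
    var $ = cheerio.load(html);
    $('.listMain > li').each(function () {
        console.log($(this).find('a').attr('href'));
    });
});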

This code prints nothing, because when the page is first loaded, the <ul id="store_list" class="listMain"> element is empty. Its content has not been appended yet.

How can I get these elements using node.js? How can I scrape pages with dynamic content?

Use PhantomJS, a headless browser. It will load and render the page, and you can access the different elements on the page using its JavaScript API. — Safi
Thanks Safi! Could you give me a code snippet or some reference for this case? — JayD
Note that the top answer on this page is from 2015 and recommends an out-of-date library. Puppeteer and Playwright are the preferred dynamic scraping tools as of 2021, and by the time you're reading this note, other tools may have become state of the art, so please read the entire thread. OP hasn't visited SO since 2016, so I don't anticipate the checkmark changing until site policy does. — ggorlen

Safi · Accepted Answer · 2015-02-27T06:13:48

Here you go:

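var phantom = require('phantom');

phantom.create(function (ph) {
  ph.createPage(function (page) {
    var url = "http://www.bdtong.co.kr/index.php?c_category=C02";
    page.open(url, function () {
      // Inject jQuery so that $ is available inside the page context.
      page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function () {
        // This function runs inside the page, not in Node.js, so console.log here
        // would go to the page's console. Return the values instead.
        page.evaluate(function () {
          var hrefs = [];
          $('.listMain > li').each(function () {
            hrefs.push($(this).find('a').attr('href'));
          });
          return hrefs;
        }, function (hrefs) {
          // Back in Node.js with the value returned from the page.
          console.log(hrefs);
          ph.exit();
        });
      });
    });
  });
});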
