Why doesn't this XML selector get the right data from the website I'm trying to scrape?

Question

I'm trying to scrape this website

http://www.gramfeed.com/instagram/tags#Andorra

and want to get all the data from the posts. This is what I have so far, but unfortunately posts does not end up holding the list of all posts. Any idea what I'm doing wrong? Thanks!

from scrapy import Selector, Spider


class GramfeedSpider(Spider):
    name = "gramfeed"
    allowed_domains = ["gramfeed.com"]
    start_urls = ["http://www.gramfeed.com/instagram/tags#Andorra"]

    def parse(self, response):
        """
        The lines below are a spider contract. For more info see:
        http://doc.scrapy.org/en/latest/topics/contracts.html

        @url http://www.gramfeed.com/instagram/tags#Andorra
        @scrapes name
        """
        sel = Selector(response)
        # Select every div that is a direct child of the #content container.
        posts = sel.xpath('//div[@id="content"]/div')
        #posts = sel.xpath('//div[@id="content"]/div[@class="grid-cell"]')
        #posts = sel.xpath('//div[@id="content"]/div[@onclick="showPhoto(0)"]')
        print("@@@@@@")
        print(posts)
        print("@@@@@@")

How many posts do you get now and what is your desired post count? Thanks. — alecxe
Hi, when I print posts I just get "[<Selector xpath='//div[@id="content"]/div' data=u'<div class="text"><br><br><b>gramfeed</b'>]". I would like to get as many posts as I can. — john2131

alecxe · Accepted Answer · 2016-02-11T22:04:59

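This is quite a dynamic web page: the results are loaded asynchronously, so you need a JavaScript engine to execute the JavaScript on the page. Scrapy on its own does not run JavaScript, so the response it hands to parse() contains only the initial markup, which is why your selector matches just the single placeholder div. You should see if you can solve it with the scrapy-splash middleware or with selenium.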
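A related detail may help explain the symptom. The "#Andorra" part of the start URL is a fragment, and fragments are handled by the browser rather than being sent in the HTTP request. The sketch below (Python 3, standard library only) splits the URL from the question to show which part would actually be requested. The helper name split_fragment is made up for illustration, and the idea that the page's JavaScript reads the fragment to pick the tag is an assumption, not something confirmed for this site.

```python
from urllib.parse import urldefrag

# URL taken from start_urls in the question.
START_URL = "http://www.gramfeed.com/instagram/tags#Andorra"


def split_fragment(url):
    """Return (url_without_fragment, fragment) for the given URL.

    Hypothetical helper for illustration: the first element is what an
    HTTP client requests, the second is left for client-side code.
    """
    result = urldefrag(url)
    return result.url, result.fragment


if __name__ == "__main__":
    requested, fragment = split_fragment(START_URL)
    print("requested from the server:", requested)
    print("left to the browser:", fragment)
```

If that assumption holds, the server returns the same initial page whatever tag follows the "#", and the posts only appear once a JavaScript engine has run, which is consistent with the advice above.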
