2
votes

Objective: Scrape the text data from the div with class "list_area daily_all".

I first opened the page I want to scrape in the Scrapy shell: https://comic.naver.com/webtoon/weekday.nhn

The command I ran: scrapy shell 'https://comic.naver.com/webtoon/weekday.nhn'

Then, using XPath, I tried to select all the text data in the div with class "list_area daily_all":

response.xpath("//div[@id='wrap']/div[@id='container']/div[@class='list_area daily_all']/text()")

However, the above code does not return anything. What am I doing wrong?

1 Answer

1
votes

div[@class='list_area daily_all'] is not a direct child of div[@id='container'], so a single slash (which matches only direct children) selects nothing.

Use a double slash before the last step so that it matches descendants at any depth:

In [1]: response.xpath("//div[@id='wrap']/div[@id='container']//div[@class='list_area daily_all']")
Out[1]: [<Selector xpath="//div[@id='wrap']/div[@id='container']//div[@class='list_area daily_all']" data=u'<div class="list_area daily_all">\r\n     '>]

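The same applies to the text. /text() returns only the text nodes that are direct children of the div, but this div is a very large block and most of its text sits inside nested tags. To select every text node in the block, use //text():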

In [2]: response.xpath("//div[@id='wrap']/div[@id='container']//div[@class='list_area daily_all']//text()")
Out[2]: 
[<Selector xpath="//div[@id='wrap']/div[@id='container']//div[@class='list_area daily_all']//text()" data=u'\r\n                \r\n\t\t\t\t'>,
 <Selector xpath="//div[@id='wrap']/div[@id='container']//div[@class='list_area daily_all']//text()" data=u'\r\n\t\t\t\t\t'>,
...

Alternatively, write a more precise selector that targets only the elements whose text you need.
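To see the difference between the child step and the descendant step outside of Scrapy, here is a minimal sketch using Python's standard-library xml.etree.ElementTree (not Scrapy's selectors, and with only a limited XPath subset). The markup is a made-up, simplified stand-in for the page, and the intermediate div with id "content" is an assumption added purely to reproduce the "not a direct child" situation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; NOT the real page structure.
SAMPLE = """
<div id="wrap">
  <div id="container">
    <div id="content">
      <div class="list_area daily_all">
        <h4>Monday</h4>
        <ul><li><a href="#">Title A</a></li><li><a href="#">Title B</a></li></ul>
      </div>
    </div>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)  # root is the div with id="wrap"

# Child step ("/"): only direct children of #container are checked, so no match.
child_matches = root.findall("./div[@id='container']/div[@class='list_area daily_all']")

# Descendant step ("//"): every level below #container is searched.
desc_matches = root.findall("./div[@id='container']//div[@class='list_area daily_all']")

block = desc_matches[0]

# Text directly inside the block (roughly what /text() targets): whitespace only here.
direct_text = (block.text or "").strip()

# All nested text (roughly what //text() targets), with whitespace-only nodes dropped.
all_text = [t.strip() for t in block.itertext() if t.strip()]

print(child_matches)
print(len(desc_matches))
print(repr(direct_text))
print(all_text)
```

In the Scrapy shell the same idea applies to the //text() result above: many of the returned nodes are whitespace only, so stripping and filtering them is usually the next step.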