rules = (
    Rule(LinkExtractor(restrict_xpaths='//need_data', deny=deny_urls), callback='parse_info'),
    Rule(LinkExtractor(allow=r'/need/', deny=deny_urls), follow=True),
)
These rules extract the URLs I need for scraping, right?
In the callback function, can I get the URL of the page we came from?
For example, the website is needdata.com.
    Rule(LinkExtractor(allow=r'/need/', deny=deny_urls), follow=True),
extracts and follows URLs like needdata.com/need/1, right?
    Rule(LinkExtractor(restrict_xpaths='//need_data', deny=deny_urls), callback='parse_info'),
extracts URLs from needdata.com/need/1 (for example, that page is a table of people), and then parse_info scrapes each of those pages. Right?
But inside parse_info I want to know which page is the parent.
If needdata.com/need/1 links to needdata.com/people/1, I want to add a "parent" column to the output file, and for that row its value should be needdata.com/need/1.
How can I do that? Thank you very much.
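To make the goal concrete, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes Scrapy's default RefererMiddleware is enabled, so the request that reached parse_info carries the parent page's URL in its Referer header. The helper name get_parent_url and the item keys "url" and "parent" are made up for illustration, and parse_info is shown as a plain function; in the spider it would be a method taking self.

```python
def get_parent_url(response):
    """Return the URL of the page that linked to this response, or None.

    Assumption: the Referer header was set by Scrapy's RefererMiddleware.
    Scrapy stores header values as bytes, so the value is decoded here.
    """
    referer = response.request.headers.get("Referer")
    if referer is None:
        return None
    if isinstance(referer, bytes):
        return referer.decode("utf-8")
    return referer


def parse_info(response):
    # In a CrawlSpider this would be: def parse_info(self, response)
    yield {
        "url": response.url,                 # e.g. the people page
        "parent": get_parent_url(response),  # e.g. the need page that linked to it
    }
```

The referrer policy can drop the header in some cases (for example when going from an https page to an http page), so "parent" may be None for some rows.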