
I'm trying to run Scrapy spiders from a Django project when the user makes a request, so I'm currently testing the code from the Scrapy docs for running a spider from a script. To test how to import the spider into the Django project, I added a file to the Django project in the same directory where I placed the Scrapy spider (i.e. where the urls, settings, and wsgi files are). When I try to import the function that runs the crawler process from the spider file, I get an import error. This is the statement I used:

from trydjango18.ticket_city_scraper.ticket_city_scraper.ticket_city_scraper.spiders.tc_spiders import spiderCrawl

This might seem vague, so I have included screenshots of the file paths below. What would be the proper way to import the spider.py file?

[Screenshot: filepath with scrapy spiders]

[Screenshot: filepath with test file]

UPDATE: I was able to get the spider to run from the script; however, I am now getting another import error from within the spider file, this time for the items module. This is most likely because only the path to the spider file is made available to the script, and not the other modules it needs. These are the statements I used (along with the rest of the code from the script):

import imp
tc_spider = imp.load_source('tc_spider', '/home/elijah/Desktop/trydjango18/src2/trydjango18/trydjango18/ticket_city_scraper/ticket_city_scraper/spiders/tc_spider.py')  


bandname = raw_input("Enter bandname")
tc_spider.spiderCrawl(bandname)
Note that using imp.load_source('tc_spider', '/home/elijah/Desktop/trydjango18/src2/trydjango18/trydjango18/ticket_city_scraper/ticket_city_scraper/spiders/tc_spider.py') makes your script environment-dependent. - Ernest Ten
@ErnestTen How would I make it environment-independent? - loremIpsum1771
You shouldn't use anything above your project's directory. - Ernest Ten
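
One way to avoid the hard-coded absolute path discussed in the comments is to build the path from the location of the script itself and put the Scrapy project directory on sys.path, so that imports inside the spider (such as the items module) can also resolve. This is only a sketch: the helper names scraper_root_for and add_to_sys_path are made up for illustration, and it assumes the outer ticket_city_scraper directory sits next to the script, as the path in the question suggests.

```python
import os
import sys


def scraper_root_for(script_path, project_dirname="ticket_city_scraper"):
    """Return the assumed Scrapy project directory next to the given script.

    `project_dirname` is an assumption taken from the path in the question.
    """
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, project_dirname)


def add_to_sys_path(path):
    """Put `path` at the front of sys.path once, so its packages are importable."""
    if path not in sys.path:
        sys.path.insert(0, path)
    return path
```

Calling add_to_sys_path(scraper_root_for(__file__)) at the top of the test script should then allow a plain import such as "from ticket_city_scraper.spiders.tc_spider import spiderCrawl", provided the directory layout matches the assumption above.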

1 Answer


As far as I can see, there are two errors:

  1. There is an extra ticket_city_scraper in your import path.

    It should be:

    from trydjango18.ticket_city_scraper.ticket_city_scraper.spiders.tc_spiders import spiderCrawl
    
  2. There is no tc_spiders.py module; the file is named tc_spider.py.

    Either add a tc_spiders.py file or import from tc_spider.py.

Considering that you asked about "the spider.py file", I assume that you meant tc_spider.py, so the complete solution is:

from trydjango18.ticket_city_scraper.ticket_city_scraper.spiders.tc_spider import spiderCrawl

Also make sure that:

  1. Each package directory contains an __init__.py file.
  2. tc_spiders.py/tc_spider.py defines a module-level function named spiderCrawl.
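
The first check can be automated with a short sketch like the one below. The helper name missing_init_files is made up for illustration, and the root directory you pass in (the directory that contains the top-level trydjango18 package) is an assumption about your layout.

```python
import os


def missing_init_files(root, dotted_package):
    """Return the package directories under `root` that lack an __init__.py.

    `dotted_package` is a dotted path such as
    "trydjango18.ticket_city_scraper.ticket_city_scraper.spiders".
    """
    missing = []
    current = root
    for part in dotted_package.split("."):
        current = os.path.join(current, part)
        if not os.path.isfile(os.path.join(current, "__init__.py")):
            missing.append(current)
    return missing
```

An empty list means every directory along the dotted path is a regular package; any directory that is returned needs an (empty) __init__.py before the import can work on Python 2.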