5
votes

I am very new to Scrapy. I am scraping a website that has some anchor tags whose href attributes contain JavaScript SubmitForm calls. When I click such a link, the JavaScript function opens a page from which I need to fetch data. I used XPath and found the href for the particular anchor tags, but I am unable to execute an href attribute that contains a JavaScript function. Can anyone tell me how to execute the JavaScript submit functions of anchor tags in Scrapy (Python)? My HTML code is

   <table class="Tbl" cellspacing="2" cellpadding="0" border="0">
     <tbody>
        <tr>
           <td class="TblOddRow">
             <table cellspacing="0" cellpadding="0" border="0">
               <tbody>
                 <tr>
                   <td valign="middle" nowrap="">
                        <a class="Page" alt="Click to view job description" title="Click to view job description" href="javascript:sysSubmitForm('frmSR1');">Accountant&nbsp;</a>
                   </td>
                 </tr>
               </tbody>
             </table>
           </td>
        </tr>
      </tbody>
  </table>                      

And spider code is

from scrapy.spider import BaseSpider
from scrapy.http import FormRequest
from scrapy.selector import HtmlXPathSelector

class MountSinaiSpider(BaseSpider):
    name = "mountsinai"
    allowed_domains = ["mountsinaicss.igreentree.com"]
    start_urls = [
        "https://mountsinaicss.igreentree.com/css_external/CSSPage_SearchAndBrowseJobs.ASP?T=20120517011617&",
    ]

    def parse(self, response):
        return [FormRequest.from_response(response,
                                          formdata={ "Type":"CSS","SRCH":"Search&nbsp;Jobs","InitURL":"CSSPage_SearchAndBrowseJobs.ASP","RetColsQS":"Requisition.Key¤Requisition.JobTitle¤Requisition.fk_Code_Full_Part¤[Requisition.fk_Code_Full_Part]OLD.Description(sysfk_Code_Full_PartDesc)¤Requisition.fk_Code_Location¤[Requisition.fk_Code_Location]OLD.Description(sysfk_Code_LocationDesc)¤Requisition.fk_Code_Dept¤[Requisition.fk_Code_Dept]OLD.Description(sysfk_Code_DeptDesc)¤Requisition.Req¤","RetColsGR":"Requisition.Key¤Requisition.JobTitle¤Requisition.fk_Code_Full_Part¤[Requisition.fk_Code_Full_Part]OLD.Description(sysfk_Code_Full_PartDesc)¤Requisition.fk_Code_Location¤[Requisition.fk_Code_Location]OLD.Description(sysfk_Code_LocationDesc)¤Requisition.fk_Code_Dept¤[Requisition.fk_Code_Dept]OLD.Description(sysfk_Code_DeptDesc)¤Requisition.Req¤","ResultSort":"" },
                                          callback=self.parse_main_list)]

    def parse_main_list(self, response):
        hxs = HtmlXPathSelector(response)
        firstpage_urls = hxs.select("//table[@class='Tbl']/tr/td/table/tr/td")
        for link in firstpage_urls:
            hrefs = link.select('a/@href').extract()
2

You can check this link: snippets.scrapy.org/snippets/23 - Ababneh A

2 Answers

0
votes

Scrapy does not let you “execute javascript Submit functions”. For that you would have to use Splash or a similar alternative that supports interaction with JavaScript. Scrapy works only with the underlying HTML.

What you can do instead is figure out how the JavaScript code builds its request, and reproduce that request with Scrapy.

To figure out what the JavaScript code does, you have two options:

  • Find the definition of sysSubmitForm() in the page's JavaScript code, and work out what it does by reading the code yourself.

  • Use the Network tab of your browser's Developer Tools to observe the request sent to the server when you trigger that JavaScript code, and inspect it to figure out how to build a similar request yourself.
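For example, suppose the Network tab shows that sysSubmitForm('frmSR1') issues a POST with a handful of form fields. The field names and values below are purely illustrative; substitute whatever your own capture shows. Scrapy's FormRequest does this encoding for you, but building the body by hand makes explicit what "reproducing the request" means:

```python
from urllib.parse import urlencode

# Hypothetical fields observed in the browser's Network tab when the
# sysSubmitForm('frmSR1') link is clicked -- read the real names and
# values from your own capture.
observed_fields = {
    "Type": "CSS",
    "Key": "12345",  # e.g. the key of the clicked job row
}

# This is the POST body the browser would send.
body = urlencode(observed_fields)
print(body)  # Type=CSS&Key=12345
```

In the spider itself you would pass the same dictionary as the formdata argument of a FormRequest and let Scrapy handle the encoding.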

-2
votes

Use the built-in FormRequest class or the FormRequest.from_response helper, specifying in their formdata argument what data to submit with the form.
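Roughly speaking, from_response reads the fields already present in the page's &lt;form&gt; and overlays the formdata you pass in. Here is a minimal sketch of that merge using only the standard library; the form markup and field names are illustrative, not the real ones from the site:

```python
from html.parser import HTMLParser

class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from <input> tags, as from_response would."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            if "name" in a:
                self.fields[a["name"]] = a.get("value", "")

# Illustrative form -- the real page's frmSR1 will have different fields.
html = """
<form name="frmSR1" action="CSSPage_SearchAndBrowseJobs.ASP" method="post">
  <input type="hidden" name="Type" value="CSS">
  <input type="hidden" name="Key" value="">
</form>
"""

collector = FormFieldCollector()
collector.feed(html)

formdata = {"Key": "12345"}  # what you would pass to from_response
# from_response merges your formdata over the form's existing fields:
payload = {**collector.fields, **formdata}
print(payload)  # {'Type': 'CSS', 'Key': '12345'}
```

The resulting payload is what Scrapy submits, so any hidden fields the form already carries come along for free.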