2
votes

I am new to web scraping, and I noticed that the code shown by "view page source" and the code shown by "inspect element" are different. I am using Chrome. Is there a way to extract the code shown in "inspect element"?

The BeautifulSoup module in Python seems to extract the code from "view page source" rather than from "inspect element".

3 Answers

1
votes

True, as user110977 has said: the code in inspect element changes as the page's JavaScript executes in the browser, which is why it differs from the page source. Basically, you need a script that invokes a browser instance, so that you get the code after all the JavaScript has been evaluated. Use any server-side language (Python, Java, PHP...) that can run Selenium or PhantomJS for that.
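As a rough illustration of the Selenium route, here is a minimal sketch in Python. It assumes the third-party `selenium` package and a Chrome driver are available; the function names `rendered_html` and `scrape` are made up for this example.

```python
def rendered_html(driver, url):
    """Load `url` in the given browser and return the DOM as the browser currently holds it."""
    driver.get(url)
    # Serializing the root element gives the live, JavaScript-modified DOM,
    # which is what the Elements panel of the developer tools displays.
    return driver.execute_script("return document.documentElement.outerHTML")


def scrape(url):
    """Start Chrome through Selenium, fetch the rendered HTML, and close the browser."""
    # Imported here so that rendered_html() can be tried with any driver-like object.
    from selenium import webdriver  # third-party: pip install selenium

    driver = webdriver.Chrome()
    try:
        return rendered_html(driver, url)
    finally:
        driver.quit()  # always release the browser process
```

Calling `scrape("http://example.com")` should return one HTML string. Pages that load data asynchronously may need an explicit wait before the DOM is read.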

In addition, you might be interested in this picture, which shows how to copy and paste the code from the browser.

Update

Can python extract inspect element content line by line?

No. Python is a server-side programming language and does not execute any of the JavaScript of a scraped page, whereas the inspect element panel (more correctly, the browser developer tools) shows developers the HTML after JavaScript has been evaluated. If you invoke a browser instance through Selenium (or PyQt), that [virtual] browser will contain all the JavaScript-evaluated code. That is where you access the code that you need.
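To make the point about unexecuted JavaScript concrete, here is a small sketch using only the standard library. The page in `PAGE_SOURCE` is a made-up example, and `DivTextCollector` and `static_div_text` are hypothetical helper names. A static parser only sees the markup as delivered, so the `<div>` that a browser would fill in stays empty.

```python
from html.parser import HTMLParser

# Made-up page source: the <div> stays empty until a browser runs the script.
PAGE_SOURCE = """<html><body>
<div id="out"></div>
<script>document.getElementById("out").textContent = "filled in by JavaScript";</script>
</body></html>"""


class DivTextCollector(HTMLParser):
    """Collects the text found inside <div id="out">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "out":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self._inside = False

    def handle_data(self, data):
        # The script body also arrives here as plain text, but it is never executed.
        if self._inside:
            self.text += data


def static_div_text(source):
    """Return the text of <div id="out"> as a static parser sees it."""
    parser = DivTextCollector()
    parser.feed(source)
    return parser.text
```

Here `static_div_text(PAGE_SOURCE)` gives an empty string, while a browser's developer tools would show the text written by the script.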

1
votes

The only way to extract code from inspect element is line by line. The code in inspect element changes based on the JavaScript of the page, which is why it is different. It is also displayed the way the browser interprets the source code: for instance, incorrectly nested elements will sometimes be nested correctly by the browser and shown that way in the developer tools.

1
votes

To extract data from inspect element, we can use Selenium (with the Firefox WebDriver, ChromeDriver, or PhantomJS). This resolves the problem of the page source being different from inspect element.
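A possible way to switch between the Firefox and Chrome drivers is sketched below. It assumes the third-party `selenium` package (version 4 or later) and installed browsers; `build_driver` and `page_source_after_js` are names invented for this example, and PhantomJS is left out.

```python
def build_driver(browser="chrome", headless=True):
    """Create a Selenium driver for the named browser ("chrome" or "firefox")."""
    name = browser.lower()
    if name not in ("chrome", "firefox"):
        raise ValueError(f"unsupported browser: {browser!r}")

    from selenium import webdriver  # third-party: pip install selenium

    if name == "chrome":
        options = webdriver.ChromeOptions()
        if headless:
            options.add_argument("--headless")  # run without opening a window
        return webdriver.Chrome(options=options)

    options = webdriver.FirefoxOptions()
    if headless:
        options.add_argument("-headless")
    return webdriver.Firefox(options=options)


def page_source_after_js(url, browser="chrome"):
    """Return the page source as reported by the browser after loading `url`."""
    driver = build_driver(browser)
    try:
        driver.get(url)
        # page_source reflects the browser's current DOM; scripts that finish
        # later (for example after an AJAX call) may need an explicit wait.
        return driver.page_source
    finally:
        driver.quit()
```

The returned string can then be passed to a parser such as BeautifulSoup in the usual way.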