24
votes

I'm trying to figure out how to route my requests through an HTTP proxy.

I'm initializing webdriver like this:

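from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

user_agent = 'my user agent 1.0'
DesiredCapabilities.PHANTOMJS['phantomjs.page.settings.userAgent'] = user_agent
driver = webdriver.PhantomJS()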

I've gone through the docs and the source and can't find a way to use a proxy server with PhantomJS through webdriver.

Any suggestions?

5 Answers

73
votes

Below is an example of how to set a proxy for PhantomJS in Python. You can change the proxy type to either socks5 or http.

from selenium import webdriver

service_args = [
    '--proxy=127.0.0.1:9999',
    '--proxy-type=socks5',
]
browser = webdriver.PhantomJS('../path_to/phantomjs', service_args=service_args)
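
If the proxy settings come from configuration rather than being hard-coded, the two flags above could be assembled by a small helper. This is only a sketch: build_proxy_args is a hypothetical name, not part of Selenium or PhantomJS, and it accepts only the two proxy types mentioned in this answer.

```python
def build_proxy_args(host, port, proxy_type="http"):
    """Return the PhantomJS proxy flags as a list (hypothetical helper)."""
    allowed = ("http", "socks5")  # only the two types mentioned in this answer
    if proxy_type not in allowed:
        raise ValueError("unsupported proxy type: {!r}".format(proxy_type))
    return [
        "--proxy={}:{}".format(host, port),
        "--proxy-type={}".format(proxy_type),
    ]


# Same values as the example above.
service_args = build_proxy_args("127.0.0.1", 9999, "socks5")
```

The resulting list can then be passed as service_args to webdriver.PhantomJS exactly as shown above.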
6
votes

I dug a little and I found that the functionality is there, but it is not exposed. So it requires a handy monkey wrench to patch it up. Here is the solution that works for me until this functionality is fully exposed in the webdriver call.

EDIT: it seems service_args is now exposed, so you no longer need to monkey patch selenium to use a proxy. See @alex-czech's answer for how to use it.

from selenium import webdriver
from selenium.webdriver.phantomjs.service import Service as PhantomJSService

phantomjs_path = '/usr/lib/node_modules/phantomjs/lib/phantom/bin/phantomjs'
# monkey patch Service temporarily to include desired args
class NewService(PhantomJSService):
    def __init__(self, *args, **kwargs):
        service_args = kwargs.setdefault('service_args', [])
        service_args += [
            '--proxy=localhost:8080',
            '--proxy-type=http',
        ]
        super(NewService, self).__init__(*args, **kwargs)
webdriver.phantomjs.webdriver.Service = NewService
# init the webdriver
self.driver = webdriver.PhantomJS(phantomjs_path)
# undo monkey patch
webdriver.phantomjs.webdriver.Service = PhantomJSService

Also useful are the following settings, especially when using a proxy that may take a very long time to load.

max_wait = 60
self.driver.set_window_size(1024, 768)
self.driver.set_page_load_timeout(max_wait)
self.driver.set_script_timeout(max_wait)
5
votes

The following is how to do the same with the Webdriver in Ruby. I couldn't find this anywhere online until I dug into the source code:

phantomjs_args = [ '--proxy=127.0.0.1:9999', '--proxy-type=socks5']
phantomjs_caps = { "phantomjs.cli.args" => phantomjs_args }
driver = Selenium::WebDriver.for(:phantomjs, :desired_capabilities => phantomjs_caps)
0
votes

I ended up needing to pass the credentials both in the service_args and as a Proxy-Authorization header. I don't believe phantomjs passes the proxy auth onwards correctly.

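import base64
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

service_args = [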
    "--ignore-ssl-errors=true",
    "--ssl-protocol=any",
    "--proxy={}".format(proxy),
    "--proxy-type=http",
]

caps = DesiredCapabilities.PHANTOMJS

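# Encode to bytes before base64 and decode afterwards so this also works on Python 3
authentication_token = "Basic " + base64.b64encode('{}:{}'.format(username, password).encode('utf-8')).decode('ascii')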

caps['phantomjs.page.customHeaders.Proxy-Authorization'] = authentication_token

self.driver = webdriver.PhantomJS(
        service_args=service_args,
        desired_capabilities=caps,
        executable_path="./phantomjs-2.1.1-linux-x86_64/bin/phantomjs")

Here, proxy has the form http://username:password@domain:port

I'd hazard a guess that the first auth-parameters aren't passed as a header to the proxy, so you need to do both manually.
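
Since the credentials already appear in the proxy URL, the header value could be derived from that URL instead of keeping username and password in separate variables. A minimal sketch, assuming the URL has the form described above; proxy_auth_header and the example host are hypothetical, and credentials containing percent-encoded characters are not decoded here.

```python
import base64

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2


def proxy_auth_header(proxy_url):
    """Build a Basic Proxy-Authorization value from a proxy URL (hypothetical helper)."""
    parts = urlsplit(proxy_url)
    if parts.username is None or parts.password is None:
        raise ValueError("proxy URL contains no credentials")
    raw = "{}:{}".format(parts.username, parts.password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Placeholder credentials and host, for illustration only.
header = proxy_auth_header("http://user:pass@proxy.example.com:8080")
```

The returned string is what would be assigned to caps['phantomjs.page.customHeaders.Proxy-Authorization'].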

0
votes

PhantomJS updated the CLI arguments without updating the documentation. The proxy type has to be included in the proxy address as follows:

service_args = ['--proxy=http://0.0.0.0:0']
driver = webdriver.PhantomJS(service_args=service_args)
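
For comparison with the two-flag form used in the other answers, the single-flag form could be built like this. It is just a sketch of the string format described in this answer: proxy_flag is a hypothetical helper, and the host and port are placeholders.

```python
def proxy_flag(proxy_type, host, port):
    """Return a single --proxy flag with the type embedded in the address (hypothetical helper)."""
    return "--proxy={}://{}:{}".format(proxy_type, host, port)


# Placeholder address, for illustration only.
service_args = [proxy_flag("http", "127.0.0.1", 8080)]
```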