At the core of Selenium is WebDriver, the remote control interface that enables introspection and control of user agents. WebDriver provides a platform- and language-neutral wire protocol that lets out-of-process programs remotely instruct the behavior of web browsers, so the same instruction sets can be run interchangeably in many browsers.
The term Selenium WebDriver refers to both the language bindings and the implementations of the individual browser-controlling code, and is commonly shortened to just WebDriver. WebDriver is an API and protocol that defines a language-neutral interface for controlling the behavior of web browsers. Each browser is backed by a specific WebDriver implementation, called a driver. The driver is the component responsible for delegating down to the browser, and it handles communication between Selenium and the browser in both directions.
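To make the "language-neutral wire protocol" idea concrete, here is a minimal Python sketch that builds, without sending, the HTTP request for the W3C "New Session" command. It assumes a driver listening at http://localhost:9515 (ChromeDriver's usual default port); the function name build_new_session_request is hypothetical. Any language that can send JSON over HTTP could issue the same request, which is why bindings exist for many languages.

```python
import json
import urllib.request


def build_new_session_request(base_url, browser_name):
    """Build (but do not send) a W3C WebDriver "New Session" command.

    The command is an ordinary HTTP request: POST {base_url}/session with a
    JSON body describing the capabilities the client asks for.
    """
    payload = {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/session",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Assumed endpoint: a local driver on ChromeDriver's default port.
req = build_new_session_request("http://localhost:9515", "chrome")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending this request (for example with urllib.request.urlopen) only makes sense if a driver is actually running at that address.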
The Parts and Pieces
At a minimum, WebDriver talks to a browser through a driver, and the communication is two-way:
- WebDriver passes commands to the browser through the driver.
- WebDriver receives information back via the same route.
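The return leg of that round trip can be sketched as follows. Under the W3C protocol, a response body is a JSON object whose "value" key carries the result, and error responses use a non-2xx status with an "error" code inside "value". The WebDriverError class and decode_response function are hypothetical names for illustration, not part of any Selenium binding.

```python
import json


class WebDriverError(Exception):
    """Hypothetical exception carrying a W3C WebDriver error code."""

    def __init__(self, error, message):
        super().__init__(f"{error}: {message}")
        self.error = error
        self.message = message


def decode_response(status, body):
    """Return the "value" of a WebDriver response, or raise on an error response."""
    value = json.loads(body.decode("utf-8"))["value"]
    # Error responses carry a 4xx/5xx status and an "error" code inside "value".
    if status >= 400:
        details = value if isinstance(value, dict) else {}
        raise WebDriverError(details.get("error", "unknown error"), details.get("message", ""))
    return value


# Usage: a successful "Get Title" style result, then a "no such element" error.
print(decode_response(200, b'{"value": "Example Domain"}'))
try:
    decode_response(404, b'{"value": {"error": "no such element", "message": "not found", "stacktrace": ""}}')
except WebDriverError as exc:
    print(exc.error)
```

Relying on the HTTP status rather than on an "error" key avoids misreading a successful result that happens to contain such a key.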
ChromeDriver
ChromeDriver is a standalone server that implements the W3C WebDriver standard. It is available for Chrome on Android and Chrome on desktop (Mac, Linux, Windows and ChromeOS). The driver runs on the same system as the browser, which may or may not be the system where the tests themselves are executing. This setup is an example of direct communication.
Remote WebDriver
However, communication with the browser may also be remote, through Selenium Server or RemoteWebDriver. RemoteWebDriver runs on the same system as the driver and the browser.
Selenium Grid
Remote communication can also take place using Selenium Server or Selenium Grid, both of which in turn talk to the driver on the host system.
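From the client's point of view, direct and remote communication differ mainly in where requests are sent. The sketch below builds the same W3C "Navigate To" command (POST /session/{session id}/url) for two endpoints: a local ChromeDriver on its default port and a hypothetical Grid host, grid.example.internal, on port 4444. The session ID "abc123" and the function name navigate_request are made up, and depending on the Grid version the base URL may need a /wd/hub suffix.

```python
import json
import urllib.request

# Assumed endpoints for illustration only.
LOCAL_CHROMEDRIVER = "http://localhost:9515"      # direct communication
SELENIUM_GRID = "http://grid.example.internal:4444"  # remote communication (hypothetical host)


def navigate_request(base_url, session_id, url):
    """Build (but do not send) a W3C "Navigate To" command."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/session/{session_id}/url",
        data=json.dumps({"url": url}).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# The command is identical for both endpoints; only the host and port change.
for endpoint in (LOCAL_CHROMEDRIVER, SELENIUM_GRID):
    req = navigate_request(endpoint, "abc123", "https://example.com")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

This is why test code written against a local driver can usually be pointed at a Grid by changing only the endpoint configuration.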
Communication Through Commands
The WebDriver protocol is organized into commands. Each HTTP request whose method and URI template are defined in the specification represents a single command, and each command produces a single HTTP response. In response to a command, the remote end runs a series of actions known as remote end steps, which define the sequence of actions the remote end takes when it receives that particular command.
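The one-command-per-request mapping can be illustrated with a small routing table. The sketch below lists a handful of endpoints from the W3C specification's endpoint table and matches an incoming method and path against their URI templates. The function names are hypothetical, and a real remote end covers many more commands.

```python
import re

# A few entries from the W3C WebDriver endpoint table:
# (HTTP method, URI template) -> command name.
COMMANDS = {
    ("POST", "/session"): "New Session",
    ("DELETE", "/session/{session id}"): "Delete Session",
    ("GET", "/status"): "Status",
    ("POST", "/session/{session id}/url"): "Navigate To",
    ("GET", "/session/{session id}/url"): "Get Current URL",
    ("GET", "/session/{session id}/title"): "Get Title",
}


def template_to_regex(template):
    """Turn a URI template into a regex; each {variable} matches one path segment."""
    literal_parts = re.split(r"\{[^}]+\}", template)
    return re.compile("^" + "[^/]+".join(re.escape(p) for p in literal_parts) + "$")


def match_command(method, path):
    """Return the command name for a request, or None if nothing matches."""
    for (cmd_method, template), name in COMMANDS.items():
        if cmd_method == method and template_to_regex(template).match(path):
            return name
    # A real remote end would answer with an "unknown command" or
    # "unknown method" error here.
    return None


print(match_command("POST", "/session/abc123/url"))
print(match_command("GET", "/session/abc123/url"))
```

Note that the same URI template can map to different commands depending on the HTTP method, as with Navigate To and Get Current URL.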
Command Processing
The remote end is an HTTP server that reads requests from the client and writes responses, typically over a TCP socket. The specification models the communication as data transmission between a particular local end and remote end over a connection to which the remote end may write bytes and from which it may read bytes. The exact details of how this connection works and how it is established are a bigger topic and out of scope for this question. Once a connection has been established, the remote end must read bytes from the connection until a complete HTTP request can be constructed from the data. If it is not possible to construct a complete HTTP request, the remote end must either close the connection, return an HTTP response with status code 500, or return an error with the error code "unknown error".
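The read-until-complete rule and its failure path might look roughly like this. The sketch simulates reads from the connection as a list of byte chunks, frames the request using only the Content-Length header (no chunked transfer encoding), and, when a complete request cannot be constructed, builds a 500 response carrying an "unknown error" in the W3C error format. All names here are hypothetical.

```python
import json


class IncompleteRequest(ValueError):
    """Raised when the connection ends before a full request has arrived."""


def try_parse_request(buffer):
    """Return (method, path, body) if buffer holds a complete request, else None.

    Raises ValueError if the bytes cannot form a valid request.
    """
    header_end = buffer.find(b"\r\n\r\n")
    if header_end == -1:
        return None  # headers incomplete: keep reading
    head = buffer[:header_end].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ")  # ValueError if malformed
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", "0"))  # ValueError if not a number
    if length < 0:
        raise ValueError("negative Content-Length")
    body_start = header_end + 4
    body = buffer[body_start:body_start + length]
    if len(body) < length:
        return None  # body incomplete: keep reading
    return method, path, body


def read_request(chunks):
    """Accumulate chunks (simulated reads) until a complete request can be built."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        parsed = try_parse_request(buffer)
        if parsed is not None:
            return parsed
    raise IncompleteRequest("connection closed before a complete HTTP request arrived")


def unknown_error_response(message):
    """Build an HTTP 500 response with a W3C-style "unknown error" body."""
    body = json.dumps(
        {"value": {"error": "unknown error", "message": message, "stacktrace": ""}}
    ).encode("utf-8")
    head = (
        "HTTP/1.1 500 Internal Server Error\r\n"
        "Content-Type: application/json; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n\r\n"
    )
    return head.encode("ascii") + body


# Usage: a request split across two reads, then a truncated one.
print(read_request([b"POST /session HTTP/1.1\r\nContent-Length: 2\r\n", b"\r\n{}"]))
try:
    read_request([b"POST /session HTTP/1.1\r\nContent-Length: 10\r\n\r\n{}"])
except ValueError as exc:
    print(unknown_error_response(str(exc)).split(b"\r\n")[0])
```

In this sketch "unknown error" and the plain 500 option coincide, since the W3C error table maps "unknown error" to HTTP status 500.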