Overview
Selenium is a portable software-testing framework for web applications. Selenium WebDriver is the successor to Selenium RC: it accepts commands and sends them to a browser. This is implemented through a browser-specific browser driver, which sends commands to a browser, and retrieves results. Most browser drivers actually launch and access a browser application (such as Firefox, Chrome, Internet Explorer, Safari, or Microsoft Edge); there is also an HtmlUnit browser driver, which simulates a browser using the headless browser HtmlUnit.
In this post, I’ll use Selenium WebDriver 3.8 on macOS with Firefox 58. After reading this post, you’ll understand:
- How to install GeckoDriver (for Firefox)
- How to initialize WebDriver in Java
- How to select a WebElement
- How to execute a native JS command
- How to send keys to an element
- How to wait
- How to use basic XPath (XML Path Language)
- Troubleshooting
Installation
GeckoDriver is a proxy for using W3C WebDriver-compatible clients to interact with Gecko-based browsers. GeckoDriver provides the HTTP API described by the WebDriver protocol to communicate with Gecko browsers, such as Firefox versions above 47.
Install GeckoDriver via brew, then check the version.
$ brew install geckodriver
$ geckodriver --version
geckodriver 0.20.0
Initialize WebDriver
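A minimal sketch of the initialization, assuming the selenium-java 3.x dependency is on the classpath and geckodriver is reachable through the PATH. The class name, the commented-out driver path, and the URL are placeholders chosen for illustration.

```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class InitDriverExample {
  public static void main(String[] args) {
    // Only needed when geckodriver is NOT on the PATH; the path below is an assumption.
    // System.setProperty("webdriver.gecko.driver", "/usr/local/bin/geckodriver");

    // Starts geckodriver and opens a new Firefox window.
    WebDriver driver = new FirefoxDriver();
    try {
      driver.get("https://example.com"); // placeholder URL
      System.out.println(driver.getTitle());
    } finally {
      // Always quit, otherwise the browser and the driver process keep running.
      driver.quit();
    }
  }
}
```

Wrapping the session in try/finally keeps a failed test from leaving Firefox windows behind.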
WebElement Selection
Create a page for storing all the information related to a page, equivalent to an HTML document object, but in Java.
Once you’ve created such a page, you can retrieve web elements in different ways: by class name, by CSS selector, by ID, by link text, by partial link text, by name, by tag name, and by XPath. Here are some examples for querying the following HTML content.
Let’s take a look:
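As an illustration, here is one lookup per locator strategy. The HTML is a hypothetical stand-in written for this sketch (it is saved to a temporary file so the browser can load it), and the class name and element names are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class SelectionExample {
  // Hypothetical page used only for this sketch.
  private static final String HTML =
      "<html><body>"
          + "<form id='login' class='form'>"
          + "<input name='username' type='text'>"
          + "<a class='link red' href='help.html'>Need help?</a>"
          + "</form>"
          + "</body></html>";

  public static void main(String[] args) throws Exception {
    // Write the page to a temporary file and load it through a file:// URL.
    Path page = Files.createTempFile("selection", ".html");
    Files.write(page, HTML.getBytes(StandardCharsets.UTF_8));

    WebDriver driver = new FirefoxDriver();
    try {
      driver.get(page.toUri().toString());

      WebElement byClass = driver.findElement(By.className("form"));
      WebElement byCss = driver.findElement(By.cssSelector("form#login a.red"));
      WebElement byId = driver.findElement(By.id("login"));
      WebElement byLinkText = driver.findElement(By.linkText("Need help?"));
      WebElement byPartialLinkText = driver.findElement(By.partialLinkText("help"));
      WebElement byName = driver.findElement(By.name("username"));
      WebElement byTag = driver.findElement(By.tagName("input"));
      WebElement byXPath = driver.findElement(By.xpath("//a[contains(@class, 'red')]"));

      // Print something from two of the matches to show they resolved.
      System.out.println(byCss.getText());
      System.out.println(byName.getAttribute("name"));
    } finally {
      driver.quit();
      Files.deleteIfExists(page);
    }
  }
}
```

findElement throws NoSuchElementException when nothing matches; findElements returns an empty list instead, which is handy for existence checks.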
Execute Native JavaScript Command
You might want to execute native JavaScript code in Java via WebDriver, for example to scroll the document so that the target element is at the top of the viewport. You can achieve it by doing:
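A minimal sketch of that scroll, assuming a Firefox session; the URL and the locator are placeholders. The WebDriver interface has no executeScript method, so the driver is cast to JavascriptExecutor.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class ScrollIntoViewExample {
  public static void main(String[] args) {
    WebDriver driver = new FirefoxDriver();
    try {
      driver.get("https://example.com"); // placeholder URL
      WebElement target = driver.findElement(By.tagName("a")); // placeholder locator

      // The cast exposes executeScript() on a variable typed as WebDriver.
      JavascriptExecutor js = (JavascriptExecutor) driver;
      // arguments[0] is bound to `target`; true aligns it with the top of the viewport.
      js.executeScript("arguments[0].scrollIntoView(true);", target);
    } finally {
      driver.quit();
    }
  }
}
```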
This can be simplified if you’re using a remote web driver. No cast is required:
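For instance, when the variable is declared as RemoteWebDriver (which FirefoxDriver extends), executeScript is available directly. This is a sketch with a placeholder URL and locator.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.remote.RemoteWebDriver;

public class RemoteDriverScriptExample {
  public static void main(String[] args) {
    // RemoteWebDriver implements JavascriptExecutor, so no cast is needed.
    RemoteWebDriver driver = new FirefoxDriver();
    try {
      driver.get("https://example.com"); // placeholder URL
      WebElement target = driver.findElement(By.tagName("a")); // placeholder locator
      driver.executeScript("arguments[0].scrollIntoView(true);", target);
    } finally {
      driver.quit();
    }
  }
}
```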
Send Keys to Element
You can send keys to input HTML elements, e.g. <input> and <textarea>.
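A short sketch of sendKeys on both element types. The page is a hypothetical one written to a temporary file for this example, and the field names are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.openqa.selenium.By;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class SendKeysExample {
  // Hypothetical page used only for this sketch.
  private static final String HTML =
      "<html><body>"
          + "<input name='q' type='text'>"
          + "<textarea name='note'></textarea>"
          + "</body></html>";

  public static void main(String[] args) throws Exception {
    Path page = Files.createTempFile("sendkeys", ".html");
    Files.write(page, HTML.getBytes(StandardCharsets.UTF_8));

    WebDriver driver = new FirefoxDriver();
    try {
      driver.get(page.toUri().toString());

      WebElement input = driver.findElement(By.name("q"));
      input.clear();              // remove any pre-filled value first
      input.sendKeys("selenium"); // types the text character by character

      WebElement note = driver.findElement(By.name("note"));
      // Special keys come from the Keys enum and can be mixed with plain text.
      note.sendKeys("line 1", Keys.ENTER, "line 2");

      // The typed content is exposed through the "value" attribute.
      System.out.println(input.getAttribute("value"));
      System.out.println(note.getAttribute("value"));
    } finally {
      driver.quit();
      Files.deleteIfExists(page);
    }
  }
}
```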
Wait WebElement
Use FluentWait to wait for a web element until a predicate is satisfied. The generic type parameter (<F> in the Wait interface that FluentWait implements) is the input type for each condition used with this instance, typically WebDriver.
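A minimal sketch of such a wait, using the (long, TimeUnit) overloads available in Selenium 3.8; later releases take a java.time.Duration instead. The timeout values, URL, and locator are assumptions for illustration.

```java
import java.util.concurrent.TimeUnit;
import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.FluentWait;
import org.openqa.selenium.support.ui.Wait;

public class FluentWaitExample {
  public static void main(String[] args) {
    WebDriver driver = new FirefoxDriver();
    try {
      driver.get("https://example.com"); // placeholder URL

      // The input type is WebDriver: each condition receives the driver.
      Wait<WebDriver> wait = new FluentWait<WebDriver>(driver)
          .withTimeout(10, TimeUnit.SECONDS)        // give up after 10 seconds
          .pollingEvery(500, TimeUnit.MILLISECONDS) // re-check twice per second
          .ignoring(NoSuchElementException.class);  // keep polling while absent

      // until() returns the first non-null, non-false result of the function,
      // or throws TimeoutException once the timeout elapses.
      WebElement heading = wait.until(d -> d.findElement(By.tagName("h1")));
      System.out.println(heading.getText());
    } finally {
      driver.quit();
    }
  }
}
```

Note that NoSuchElementException must be the org.openqa.selenium one, not java.util.NoSuchElementException, or the wait fails on the first miss.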
XPath
Here’s a list of XPath expressions that I use frequently.
Expression | Description |
---|---|
//*[@id='foo'] | Select any tag having id “foo”. |
//a[text()='foo'] | Select tag <a> having text “foo”. |
//a[contains(@class, 'red')] | Select tag <a> having “red” in its attribute class. |
//a[contains(text(), 'foo')] | Select tag <a> having “foo” in its text. |
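To see what each expression matches without starting a browser, the four expressions can be evaluated with the XPath engine that ships with the JDK. This is only a sketch: the markup is a hypothetical, well-formed snippet, and the class and method names are made up for this example. In Selenium the same strings would be passed to By.xpath.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTableDemo {
  // Hypothetical, well-formed markup used only for this demo.
  static final String MARKUP =
      "<div>"
          + "<p id='foo'>intro</p>"
          + "<a class='btn red' href='/a'>foo</a>"
          + "<a class='btn' href='/b'>foo bar</a>"
          + "<a class='red' href='/c'>baz</a>"
          + "</div>";

  /** Returns how many nodes of MARKUP the given XPath expression selects. */
  static int count(String expression) {
    try {
      XPath xpath = XPathFactory.newInstance().newXPath();
      NodeList nodes = (NodeList) xpath.evaluate(
          expression, new InputSource(new StringReader(MARKUP)), XPathConstants.NODESET);
      return nodes.getLength();
    } catch (XPathExpressionException e) {
      throw new IllegalArgumentException("Invalid XPath: " + expression, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(count("//*[@id='foo']"));               // 1: the <p>
    System.out.println(count("//a[text()='foo']"));            // 1: exact text only
    System.out.println(count("//a[contains(@class, 'red')]")); // 2: substring of class
    System.out.println(count("//a[contains(text(), 'foo')]")); // 2: "foo" and "foo bar"
  }
}
```

Keep in mind that contains(@class, 'red') is a plain substring test, so it would also match a class such as "redirect".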
You can test an XPath expression in your browser. First, open the developer tools via shortcut, then switch to the console:
- ⌘ + ⌥ + C for Firefox
- ⌘ + ⇧ + C for Chrome
Then evaluate the XPath expression, for example with the console helper $x("//a[text()='foo']"). If the browser returns a non-empty result, the XPath works.
Troubleshooting
Some points to be careful about.
Scrolling
If you need to scroll the document before clicking an element, do not scroll the element directly; scroll its container instead:
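One way to read this advice is sketched below: bring the enclosing container into view with JavaScript, then click the element inside it. The helper class, its method name, and both locators are hypothetical, and this is an interpretation rather than the exact code used for this post.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public final class ScrollHelper {
  private ScrollHelper() {}

  /**
   * Scrolls the container into view, then clicks the target found inside it.
   * Both locators are supplied by the caller; none are implied by Selenium.
   */
  public static void scrollContainerThenClick(
      WebDriver driver, By containerLocator, By targetLocator) {
    WebElement container = driver.findElement(containerLocator);

    // Scroll the container (not the target) so it is aligned with the top of the viewport.
    ((JavascriptExecutor) driver)
        .executeScript("arguments[0].scrollIntoView(true);", container);

    // Look the target up relative to the container and click it.
    container.findElement(targetLocator).click();
  }
}
```

A call could look like ScrollHelper.scrollContainerThenClick(driver, By.id("menu"), By.linkText("Settings")), where both locators are invented for the example.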
The method Element.scrollIntoView() scrolls the element on which it’s called into the visible area of the browser window. If its alignToTop argument is true (the default), the top of the element is aligned to the top of the viewport.
Other Points
Question/answer available on StackOverflow: