How to find broken links in Selenium
By Arjun M. Shrivastva & Shreya Bose, Community Contributor - February 2, 2023
Before discussing how to find broken links using Selenium WebDriver, let’s address a more fundamental question.
What are Broken Links?
To start with, a link is an HTML element that takes users from one web page to another when they click it. Links are how users navigate between web pages on the internet.
A broken link, also often called a dead link, is one that does not work, i.e. it does not lead to the web page it is meant to. This usually happens because the website or the specific web page is down or no longer exists. When someone clicks a broken link, an error page is displayed instead.
Links can also break because of a server error that prevents the target page from being displayed. A working link usually returns a 2xx HTTP status code (possibly after following a 3xx redirect), while a broken link returns a 4xx or 5xx status code.
A 4xx status code indicates a client-side error, while a 5xx status code indicates a server-side error.
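To make this rule concrete, here is a minimal Java sketch (not part of the original example) that sorts a status code into these categories. The class and method names `StatusClassifier`, `classify`, and `isBroken` are hypothetical, and treating 3xx redirects as "not broken" is an assumption.

```java
// Hypothetical helper: classifies HTTP status codes by their first digit.
final class StatusClassifier {

    // Returns a coarse label for an HTTP status code.
    static String classify(int statusCode) {
        switch (statusCode / 100) {
            case 1:  return "informational"; // 1xx
            case 2:  return "success";       // 2xx: the link works
            case 3:  return "redirect";      // 3xx: the link points elsewhere
            case 4:  return "client error";  // 4xx: broken link (client-side error)
            case 5:  return "server error";  // 5xx: broken link (server-side error)
            default: return "unknown";       // outside the 100-599 range
        }
    }

    // Assumption: only 4xx and 5xx codes count as broken.
    static boolean isBroken(int statusCode) {
        return statusCode >= 400 && statusCode <= 599;
    }

    public static void main(String[] args) {
        System.out.println(classify(200)); // success
        System.out.println(classify(404)); // client error
        System.out.println(classify(503)); // server error
        System.out.println(isBroken(301)); // false
    }
}
```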
HTTP Status Codes for Broken Links
HTTP Status Code | Definition |
---|---|
400 (Bad Request) | Server cannot process the request because the URL is incorrect |
400 (Bad Request – Bad Host) | Server cannot process the request because the host name is invalid |
400 (Bad Request – Bad URL) | Server cannot process the request because the URL is malformed, e.g. missing characters such as brackets or slashes |
400 (Bad Request – Empty) | Server returned an empty response with no content and no response code |
400 (Bad Request – Timeout) | The HTTP request timed out |
400 (Bad Request – Reset) | Server cannot process the request because it is busy processing other requests or has been misconfigured by the site owner |
403 (Forbidden) | Server understood the request but refuses to fulfill it, e.g. because the client lacks permission to access the page |
404 (Page Not Found) | Page is not available on the server |
408 (Request Time Out) | Server timed out waiting for the request |
410 (Gone) | Page has been removed; this is more permanent than 404 |
503 (Service Unavailable) | Server is temporarily overloaded or down for maintenance and cannot process the request |
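Several of these codes have named constants in Java's `java.net.HttpURLConnection`, which can make link-checking code easier to read. The sketch below maps them to the descriptions in the table; the class `BrokenLinkCodes` and its `describe` method are hypothetical. Note that the 400 variants listed above all share the single status code 400, so they cannot be told apart by the status code alone.

```java
import java.net.HttpURLConnection;
import java.util.Map;

// Hypothetical lookup table built on the JDK's named status-code constants.
final class BrokenLinkCodes {

    private static final Map<Integer, String> DESCRIPTIONS = Map.of(
            HttpURLConnection.HTTP_BAD_REQUEST, "Bad Request",         // 400
            HttpURLConnection.HTTP_FORBIDDEN, "Forbidden",             // 403
            HttpURLConnection.HTTP_NOT_FOUND, "Page Not Found",        // 404
            HttpURLConnection.HTTP_CLIENT_TIMEOUT, "Request Time Out", // 408
            HttpURLConnection.HTTP_GONE, "Gone",                       // 410
            HttpURLConnection.HTTP_UNAVAILABLE, "Service Unavailable"  // 503
    );

    // Returns the description for a known code, or a generic label otherwise.
    static String describe(int statusCode) {
        return DESCRIPTIONS.getOrDefault(statusCode, "Other (" + statusCode + ")");
    }

    public static void main(String[] args) {
        System.out.println(describe(HttpURLConnection.HTTP_GONE)); // Gone
        System.out.println(describe(418));                         // Other (418)
    }
}
```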
Why check for Broken Links in Selenium?
If a user clicks a broken link, they are taken to an error page, which makes for a poor user experience. Broken links defeat the purpose of having the website in the first place because users cannot find the information or service they are looking for.
Every link on a website must be tested to ensure that it works as expected. However, many websites have hundreds or even thousands of links, so testing each one manually would take excessive time, effort, and resources. Automating these checks with Selenium makes manual link-by-link testing unnecessary.
Common Reasons for Broken Links
- 404 Page Not Found – The destination web page has been removed by the owner
- 400 Bad Request – The server cannot process the HTTP request triggered by the link because the requested URL is wrong
- Due to the user's firewall settings, the browser cannot access the destination web page
- The link URL is misspelled
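A misspelled link sometimes produces a URL that is not even syntactically valid, for example one containing an unencoded space. As a sketch under that assumption, such links can be caught with `java.net.URI` before any request is sent; `UrlSyntaxCheck` and `isWellFormedHttpUrl` are hypothetical names. A misspelled but well-formed URL (a wrong path, say) still needs the HTTP check described below.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical pre-check: rejects URLs that cannot be parsed as absolute http(s) URIs.
final class UrlSyntaxCheck {

    // Returns true if the string is an absolute http or https URI with a host.
    static boolean isWellFormedHttpUrl(String url) {
        if (url == null) {
            return false;
        }
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // e.g. illegal characters such as an unencoded space
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedHttpUrl("https://bstackdemo.com/"));    // true
        System.out.println(isWellFormedHttpUrl("https://bstackdemo.com/a b")); // false: space in path
        System.out.println(isWellFormedHttpUrl("htps//bstackdemo.com"));       // false: no scheme
    }
}
```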
How to identify broken links in Selenium WebDriver
Checking for broken links in Selenium is straightforward. On a web page, hyperlinks are implemented using the HTML anchor (<a>) tag. The script needs to locate every anchor tag on the page, read the URL in its href attribute, and check whether any of those URLs are broken. Selenium WebDriver itself does not expose HTTP status codes, so each URL is checked with a separate HTTP client, such as Java's HttpURLConnection.
Use the following steps to identify broken links in Selenium:
- Collect all the links on the web page by locating the <a> tags
- Send an HTTP request to each link's URL
- Read the HTTP response code for the link
- Determine whether the link is valid or broken based on the HTTP response code
- Repeat the process for every link collected in the first step
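The steps above can be sketched in plain Java. In this hypothetical `LinkCheckSteps` class, the status lookup is passed in as a function so the loop can be exercised with a stub; `httpStatus` shows one possible real lookup with `HttpURLConnection`. Step 1 (collecting the URLs with Selenium) is replaced by a hard-coded list so the sketch runs without a browser.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical sketch of steps 2-5; step 1 (Selenium) is stubbed out.
final class LinkCheckSteps {

    // Requests each URL via statusOf, reads its status code, and collects the broken ones.
    static List<String> findBroken(List<String> urls, ToIntFunction<String> statusOf) {
        List<String> broken = new ArrayList<>();
        for (String url : urls) {
            int status = statusOf.applyAsInt(url);
            // 4xx/5xx responses and failed requests (reported as -1) count as broken
            if (status < 0 || status >= 400) {
                broken.add(url);
            }
        }
        return broken;
    }

    // A real lookup with HttpURLConnection; returns -1 if no response is received.
    static int httpStatus(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(3000);
            conn.setReadTimeout(3000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (Exception e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Step 1: in a Selenium test these would be the href values of
        // driver.findElements(By.tagName("a")); hypothetical URLs are hard-coded here.
        List<String> urls = List.of("https://example.com/ok", "https://example.com/missing");
        // Stub lookup so the sketch runs offline; pass LinkCheckSteps::httpStatus for real checks
        Map<String, Integer> stub = Map.of("https://example.com/ok", 200, "https://example.com/missing", 404);
        System.out.println(findBroken(urls, u -> stub.getOrDefault(u, -1))); // prints [https://example.com/missing]
    }
}
```

Injecting the lookup keeps the pass/fail rule separate from the network code, which makes the rule easy to test on its own.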
The same process works for finding broken images with Selenium WebDriver: locate the <img> tags and check the URL in each src attribute.
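For images, the URL to check lives in the `src` attribute of each `<img>` element, which can be collected with `driver.findElements(By.tagName("img"))`. The sketch below, a hypothetical `ImageCheck` helper, applies the same status-code rule. As an added assumption, it also treats a response whose `Content-Type` is not `image/*` as broken. It covers only the HTTP side, not the Selenium lookup.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;

// Hypothetical helper for checking image URLs taken from <img src="...">.
final class ImageCheck {

    // Broken if the request failed (-1), returned 4xx/5xx, or did not return an image.
    // The Content-Type test is an extra heuristic on top of the status-code rule.
    static boolean isBrokenImage(int status, String contentType) {
        if (status < 0 || status >= 400) {
            return true;
        }
        return contentType == null
                || !contentType.toLowerCase(Locale.ROOT).startsWith("image/");
    }

    // Requests an image URL and applies the rule above.
    static boolean isBrokenImageUrl(String src) {
        if (src == null || src.isEmpty()) {
            return true; // <img> without a usable src
        }
        if (src.startsWith("data:image/")) {
            return false; // inline data URIs need no HTTP request
        }
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(src).toURL().openConnection();
            conn.setConnectTimeout(3000);
            conn.setReadTimeout(3000);
            try {
                return isBrokenImage(conn.getResponseCode(), conn.getContentType());
            } finally {
                conn.disconnect();
            }
        } catch (Exception e) {
            return true; // malformed URL, unsupported scheme, timeout, etc.
        }
    }
}
```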
Finding Broken Links in Selenium: Example
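```java
package BstackDemo;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

public class BrokenLinks {

    public static void main(String[] args) {
        // Path to a local ChromeDriver binary. With Selenium 4.6 or later, Selenium Manager
        // can locate the driver automatically and this line can be removed.
        System.setProperty("webdriver.chrome.driver", "D:/chromedriver.exe");
        WebDriver driver = new ChromeDriver();

        try {
            // Navigate to the BStackDemo website
            driver.get("https://bstackdemo.com/");

            // Find all the links on the web page
            List<WebElement> links = driver.findElements(By.tagName("a"));

            // Iterate over each link and check its response status
            for (WebElement link : links) {
                String url = link.getAttribute("href");
                verifyLink(url);
            }
        } finally {
            // Close the browser even if something above throws
            driver.quit();
        }
    }

    public static void verifyLink(String url) {
        // Skip anchors without an href and non-HTTP links such as mailto: or javascript:
        if (url == null || url.isEmpty() || !url.startsWith("http")) {
            System.out.println(url + " - skipped (empty or non-HTTP link)");
            return;
        }

        HttpURLConnection httpURLConnection = null;
        try {
            httpURLConnection = (HttpURLConnection) new URL(url).openConnection();
            httpURLConnection.setConnectTimeout(3000); // Connection timeout: 3 seconds
            httpURLConnection.setReadTimeout(3000);    // Read timeout: 3 seconds
            httpURLConnection.connect();

            int responseCode = httpURLConnection.getResponseCode();
            String status = responseCode + " " + httpURLConnection.getResponseMessage();

            // 4xx and 5xx responses indicate a broken link; 2xx and 3xx do not
            if (responseCode >= 400) {
                System.out.println(url + " - " + status + " - is a broken link");
            } else {
                System.out.println(url + " - " + status);
            }
        } catch (Exception e) {
            // Malformed URL, unknown host, timeout, etc.
            System.out.println(url + " - is a broken link (" + e.getMessage() + ")");
        } finally {
            if (httpURLConnection != null) {
                httpURLConnection.disconnect();
            }
        }
    }
}
```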
In this example, we navigate to a web page, collect all the links using driver.findElements(By.tagName("a")), and iterate through them. For each link's URL, we use the HttpURLConnection class to send a request and read the response status code, and then report the link as valid or broken based on that code.
Save the Java file and run it. The program prints the status of each link and flags the ones that are broken.
This example provides a basic way to find broken links using Selenium and Java. Depending on your project’s requirements, you may want to enhance the method to handle more complex scenarios and edge cases.
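As one possible set of enhancements, the hypothetical `EnhancedLinkChecker` below (a sketch, not a drop-in replacement for `verifyLink`) sends a HEAD request first to avoid downloading page bodies, falls back to GET when a server rejects HEAD with 405 or 501, skips duplicate URLs, and checks links in parallel on a small thread pool. The thread count and the one-minute wait are arbitrary assumptions.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.ToIntFunction;

// Hypothetical enhanced checker: HEAD with GET fallback, deduplication, parallelism.
final class EnhancedLinkChecker {

    // Sends one request with the given HTTP method and returns the status code.
    private static int request(String url, String method) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod(method);
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Tries HEAD first; retries with GET if the server answers 405 or 501.
    // Returns -1 if the request cannot be made at all.
    static int statusOf(String url) {
        try {
            int status = request(url, "HEAD");
            if (status == HttpURLConnection.HTTP_BAD_METHOD
                    || status == HttpURLConnection.HTTP_NOT_IMPLEMENTED) {
                status = request(url, "GET");
            }
            return status;
        } catch (Exception e) {
            return -1; // malformed URL, non-HTTP scheme, timeout, unknown host, etc.
        }
    }

    // Checks each distinct, non-null URL once on a fixed-size thread pool and
    // returns a map from URL to status code.
    static Map<String, Integer> checkAll(List<String> urls, ToIntFunction<String> statusOf, int threads) {
        Map<String, Integer> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // LinkedHashSet drops duplicates (e.g. the same link in the header and footer)
        for (String url : new LinkedHashSet<>(urls)) {
            if (url != null) {
                pool.execute(() -> results.put(url, statusOf.applyAsInt(url)));
            }
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow(); // give up on checks that are still running
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return results;
    }
}
```

In a Selenium test, you might collect the `href` values into a list, call `checkAll(hrefs, EnhancedLinkChecker::statusOf, 8)`, and report every entry whose status is -1 or 400 and above.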
Finding broken links is an integral part of website development and testing. With the method described in this article, testers can identify malfunctioning links quickly and reliably. Broken links that slip into production severely damage the user experience, so they should be caught before release. This is why knowing how to test for broken links in Selenium is an important part of a tester's toolkit.
Conclusion
Identifying broken links is a crucial part of web application testing, and Selenium makes it possible to automate the process. With Selenium WebDriver and Java, testers can navigate to web pages, collect their links, and validate each link's response status with an HTTP client. The step-by-step guide in this article demonstrates a straightforward approach to finding broken links using Selenium and Java. BrowserStack supports this kind of Selenium Java testing by providing a cloud-based platform for running tests across a wide range of browsers, operating systems, and devices.