Using Selenium Wire Proxy in Python
By Arjun M. Shrivastva, Community Contributor - September 30, 2024
Capturing network traffic is crucial in test automation and web scraping. However, it is beyond the scope of standard Selenium. Selenium Wire extends Selenium’s functionality, allowing users to monitor and manipulate network requests during automated tests.
This guide explores Selenium Wire in Python to enhance browser automation and gain insights into web traffic.
- What is Selenium Wire?
- Benefits of using Selenium Wire proxy in Python
- How to configure Selenium Wire in Python?
- Step 1. Install the Required Libraries
- Step 2. Set Up WebDriver
- Step 3. Capturing Network Requests
What is Selenium Wire?
Selenium Wire is an extension of the standard Selenium library that allows you to capture and manipulate network requests during browser automation. It provides features such as logging HTTP requests, modifying headers, and intercepting responses, which can be invaluable for debugging, performance testing, and scraping data from dynamic websites.
With its seamless integration into Selenium, it enhances your ability to inspect traffic and gain control over the browser’s network activity, all from within your Python code.
Benefits of using Selenium Wire proxy in Python
Using Selenium Wire in Python offers several advantages that go beyond basic browser automation:
- Network Traffic Monitoring: Selenium Wire captures all HTTP/HTTPS requests and responses, making it easy to inspect and log network activity in real-time.
- Request and Response Modification: You can intercept requests and responses, modify headers, or inject custom data to simulate various scenarios, such as different user agents or cookies.
- Support for Proxies: Selenium Wire allows you to route browser traffic through a proxy, which is useful for web scraping, bypassing geo-blocking, or testing under different network conditions.
- Enhanced Debugging: By providing insights into network calls, Selenium Wire helps debug API responses, troubleshoot slow-loading elements, or identify errors in requests.
- Block Requests: Easily block specific resources (e.g., ads, large media files) to speed up scraping or test how a site behaves without them.
These benefits make Selenium Wire an indispensable tool for developers and testers who need deeper control over network interactions during browser automation.
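As an illustration of the request-blocking benefit, the following is a minimal sketch of a request interceptor that aborts requests for images and video. The function name `block_media` and the extension list are hypothetical choices for this example; `request.path` and `request.abort()` are part of Selenium Wire's request object.

```python
# Hypothetical list of file extensions to block; adjust it to your needs.
BLOCKED_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif', '.svg', '.mp4')


def block_media(request):
    """Request interceptor that aborts requests for images and video."""
    if request.path.lower().endswith(BLOCKED_EXTENSIONS):
        # abort() short-circuits the request, so the browser receives
        # an error status (403 by default) instead of the resource.
        request.abort()


# Usage with a Selenium Wire driver (not executed here):
#   from seleniumwire import webdriver
#   driver = webdriver.Chrome()
#   driver.request_interceptor = block_media
#   driver.get('https://bstackdemo.com/')
```

Matching on the URL path rather than the full URL keeps query strings from hiding the file extension.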
How to configure Selenium Wire in Python?
Getting started with Selenium Wire is straightforward. Follow these steps to set up the environment and integrate it with your Python project:
Step 1. Install the Required Libraries
First, install Selenium and Selenium Wire using pip:
    pip install selenium
    pip install selenium-wire
Step 2. Set Up WebDriver
After installation, you can configure Selenium Wire with your WebDriver, just like with regular Selenium:
    from seleniumwire import webdriver

    # Set up Chrome WebDriver with Selenium Wire
    driver = webdriver.Chrome()

    # Navigate to the demo website
    driver.get('https://bstackdemo.com/')
Step 3. Capturing Network Requests
To capture network requests, simply access the requests attribute from Selenium Wire:
    for request in driver.requests:
        print(request.url, request.method)

    driver.quit()
(Note: You may encounter the error “ModuleNotFoundError: No module named blinker._saferef”. This can be resolved by downgrading the blinker library to version 1.7.0, for example with pip install blinker==1.7.0.)
Now, the environment is ready, and you can use Selenium Wire to enhance your web automation projects in Python.
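A page load usually triggers many requests, so a summary can be easier to read than the full list. The sketch below groups captured requests by host; the helper name `count_requests_by_host` is hypothetical, and it relies only on the `url` attribute that each captured request exposes.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_requests_by_host(requests):
    """Return a Counter mapping each host to the number of captured requests."""
    return Counter(urlsplit(request.url).netloc for request in requests)


# Usage with a Selenium Wire driver (not executed here):
#   for host, total in count_requests_by_host(driver.requests).most_common():
#       print(host, total)
```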
How to perform Proxy Authentication with Python Selenium?
Selenium Wire simplifies the process of using proxies with authentication. When configuring the WebDriver, you can pass the proxy details, including username and password, in the seleniumwire_options.
Here’s an example of how to use Selenium Wire to handle proxy authentication:
    from seleniumwire import webdriver

    # Proxy server credentials and host information
    proxy_username = 'your-username'
    proxy_password = 'your_password'
    proxy = "127.0.0.1:8081"

    sw_options = {
        'proxy': {
            'http': f'http://{proxy_username}:{proxy_password}@{proxy}',
            'https': f'https://{proxy_username}:{proxy_password}@{proxy}',
        }
    }

    # Initialize WebDriver with proxy settings
    driver = webdriver.Chrome(seleniumwire_options=sw_options)

    # Access target website
    driver.get('https://bstackdemo.com/')

    # Close the browser
    driver.quit()
Code Breakdown:
- Proxy Settings: Configures the proxy (127.0.0.1:8081) with basic authentication (your-username:your_password).
- WebDriver Initialization: Starts Chrome WebDriver, routing all traffic through the proxy.
- Page Access: Opens https://bstackdemo.com/ using the proxy.
- Clean Exit: Closes the browser session.
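Hard-coding proxy credentials in a script makes them easy to leak. One possible alternative, sketched below, reads them from environment variables. The variable names `PROXY_USER`, `PROXY_PASS`, and `PROXY_HOST` and the helper name are assumptions made for this example; the dictionary layout follows the seleniumwire_options format used above, plus the optional `no_proxy` key.

```python
import os


def proxy_options_from_env(env=os.environ):
    """Build seleniumwire_options from (hypothetical) environment variables."""
    user = env['PROXY_USER']        # raises KeyError if the variable is unset
    password = env['PROXY_PASS']
    host = env.get('PROXY_HOST', '127.0.0.1:8081')
    return {
        'proxy': {
            'http': f'http://{user}:{password}@{host}',
            'https': f'https://{user}:{password}@{host}',
            # Optional: hosts that should bypass the proxy.
            'no_proxy': 'localhost,127.0.0.1',
        }
    }


# Usage with a Selenium Wire driver (not executed here):
#   from seleniumwire import webdriver
#   driver = webdriver.Chrome(seleniumwire_options=proxy_options_from_env())
```

Accepting the environment as a parameter keeps the helper easy to exercise with a plain dictionary.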
How to use Selenium Wire with your Selenium Scrapers or Bots?
Selenium Wire enhances Selenium’s capabilities by allowing you to capture network traffic, use proxies, and modify requests. This is particularly useful for web scrapers or bots that need to bypass restrictions or inspect requests and responses.
Here’s an example that scrapes product names and prices from https://bstackdemo.com/ using a proxy:
    from seleniumwire import webdriver
    from selenium.webdriver.common.by import By
    import json

    # Set up proxy options
    options = {'proxy': {'https': 'http://127.0.0.1:8080'}}

    # Initialize WebDriver
    driver = webdriver.Chrome(seleniumwire_options=options)
    driver.get('https://bstackdemo.com/')

    # Scrape product names and prices
    products = driver.find_elements(By.CLASS_NAME, "shelf-item")
    data = {
        product.find_element(By.CLASS_NAME, 'shelf-item__title').text:
            product.find_element(By.CLASS_NAME, 'val').text
        for product in products
    }

    # Print data in JSON format
    print(json.dumps(data, ensure_ascii=False))

    driver.quit()
Code Breakdown
- Proxy Setup: A proxy is configured using the seleniumwire_options parameter.
- Product Scraping: The scraper locates all products using the class name “shelf-item.” For each product, the name is extracted from the “shelf-item__title” element, and the price from the “val” element. All data get stored in a dictionary.
- Data Output: The results are printed as a JSON string.
- Session Closure: The browser session is closed using driver.quit().
This example demonstrates how to integrate Selenium Wire with proxies into your web scrapers.
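If the product grid is rendered by JavaScript after the initial page load, find_elements may return an empty list when it is called immediately after driver.get(). The sketch below is a dependency-free polling helper for that case; the name `wait_for_elements` and its defaults are hypothetical. In a real project, Selenium's own WebDriverWait is the usual tool; this version only illustrates the idea.

```python
import time


def wait_for_elements(driver, class_name, timeout=10.0, poll=0.5):
    """Poll until at least one element with the given class name is found."""
    deadline = time.monotonic() + timeout
    while True:
        # 'class name' is the string value behind selenium's By.CLASS_NAME.
        elements = driver.find_elements('class name', class_name)
        if elements:
            return elements
        if time.monotonic() >= deadline:
            raise TimeoutError(f'no elements with class {class_name!r} after {timeout}s')
        time.sleep(poll)


# Usage with a Selenium Wire driver (not executed here):
#   products = wait_for_elements(driver, 'shelf-item')
```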
Using Selenium Wire to capture HTTP requests
Selenium Wire allows you to capture HTTP requests and responses, giving you access to the raw network traffic while interacting with a website. This can be useful for inspecting APIs, headers, or any requests sent during a browser session.
Here’s an example to demonstrate how to capture and display HTTP requests while browsing a website:
    from seleniumwire import webdriver

    # Initialize WebDriver
    driver = webdriver.Chrome()

    # Navigate to a website
    driver.get('https://bstackdemo.com/')

    # Capture and display HTTP requests
    for request in driver.requests:
        if request.response:
            print(
                request.url,
                request.response.status_code,
                request.response.headers['Content-Type']
            )

    driver.quit()
Code Breakdown
- WebDriver Initialization: The Chrome WebDriver is started without any proxy settings, as the focus here is capturing HTTP requests.
- Page Loading: The browser navigates to the website https://bstackdemo.com/.
- Capturing Requests: driver.requests stores all HTTP requests sent during the session. For each request, the code checks whether a response was received and prints the URL, status code, and Content-Type header.
- Session Termination: The browser is closed using driver.quit() to end the session after capturing the requests.
This method gives you complete visibility of all network interactions and is useful for debugging and analyzing web traffic.
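When inspecting APIs, it often helps to narrow the captured traffic to JSON responses. The following sketch filters on the Content-Type header; the helper name `json_responses` is hypothetical, and it uses only the `response`, `headers`, `status_code`, and `url` attributes shown in the example above.

```python
def json_responses(requests):
    """Yield (url, status_code) for captured requests that received a JSON response."""
    for request in requests:
        response = request.response
        if response is None:
            continue  # no response was captured for this request
        # Fall back to an empty string when the header is missing.
        content_type = response.headers.get('Content-Type') or ''
        if 'json' in content_type.lower():
            yield request.url, response.status_code


# Usage with a Selenium Wire driver (not executed here):
#   for url, status in json_responses(driver.requests):
#       print(status, url)
```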
How to modify requests and responses using Selenium Wire?
Selenium Wire is a powerful library that extends Selenium’s capabilities by allowing you to intercept, modify, and manipulate HTTP requests and responses directly within your Python code.
This feature is useful when you need to test how a web application behaves under various scenarios, such as simulating different network conditions, manipulating request headers, or altering server responses.
Method 1: Modifying Requests
Selenium Wire allows you to intercept and modify outgoing HTTP requests. For example, you can modify headers, change request bodies, or even simulate network issues.
The example below modifies the outgoing request by replacing its “User-Agent” and “Cookie” headers.
    from seleniumwire import webdriver
    import json

    # Initialize the Chrome WebDriver
    driver = webdriver.Chrome()

    # Define the request interceptor to modify request headers
    def interceptor(request):
        # Replace the 'Cookie' header (delete first to avoid a duplicate)
        del request.headers['Cookie']
        request.headers['Cookie'] = 'key1=value1;key2=value2;'
        # Remove the existing 'User-Agent' header and set a new one
        del request.headers['User-Agent']
        request.headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0'

    # Assign the interceptor to the driver
    driver.request_interceptor = interceptor

    # Navigate to a URL
    driver.get("https://bstackdemo.com/")

    # Loop through all requests and print the headers that were sent
    for request in driver.requests:
        if request.url == "https://bstackdemo.com/":
            print(json.dumps(dict(request.headers), indent=2))

    # Close the driver
    driver.quit()
Code Breakdown
- Initializing the Driver: webdriver.Chrome() creates a Chrome instance managed by Selenium Wire.
- Defining the Interceptor: The interceptor function modifies outgoing requests. It replaces the Cookie header and changes the User-Agent to simulate a request coming from Firefox on Linux. Each header is deleted before it is set, because HTTP permits duplicate header names and assigning to an existing header would otherwise add a second copy.
- Request Interception: Assigning the function to driver.request_interceptor ensures that every request is checked and modified before it is sent to the server.
- Fetching and Printing Headers: After the page at https://bstackdemo.com/ loads, the captured request for that URL is located and its modified headers are printed.
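An interceptor runs for every request the browser makes, including those to third-party hosts. The sketch below shows one way to limit header changes to a single host. The factory name `make_header_interceptor` is hypothetical; `request.host` and `request.headers` are attributes of Selenium Wire's request object.

```python
def make_header_interceptor(host, headers):
    """Return a request interceptor that overrides headers for one host only."""
    def interceptor(request):
        if request.host != host:
            return  # leave requests to other hosts untouched
        for name, value in headers.items():
            # Delete any existing header first so no duplicate is sent.
            if name in request.headers:
                del request.headers[name]
            request.headers[name] = value
    return interceptor


# Usage with a Selenium Wire driver (not executed here):
#   driver.request_interceptor = make_header_interceptor(
#       'bstackdemo.com', {'User-Agent': 'demo-agent'})
```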
Method 2: Modifying Responses
In addition to requests, Selenium Wire can intercept and modify incoming responses. You can change the status code, alter response bodies, or manipulate headers before they are processed by the browser.
The example below simulates a custom response by intercepting a request to https://bstackdemo.com/ and returning a “Hello from BrowserStack!” message instead of the actual server response.
    import time
    from seleniumwire import webdriver

    # Initialize the Chrome WebDriver
    driver = webdriver.Chrome()

    # Define the request interceptor to create a custom response
    def interceptor(request):
        if request.url == 'https://bstackdemo.com/':
            request.create_response(
                status_code=200,  # Setting a custom status code
                headers={'Content-Type': 'text/html'},  # Custom headers
                body='Hello from BrowserStack!'  # Custom response body
            )

    # Assign the request interceptor
    driver.request_interceptor = interceptor

    # Navigate to the target website
    driver.get('https://bstackdemo.com/')

    # Wait to observe the behavior
    time.sleep(5)

    # Close the driver
    driver.quit()
Code Breakdown:
- Creating a Custom Response: In the interceptor function, check if the request URL matches https://bstackdemo.com/. If it does, the function creates a custom response using request.create_response(). This custom response returns a status code of 200 (OK), sets the Content-Type header to text/html, and returns a body containing the string “Hello from BrowserStack!”.
- Assigning the Interceptor: Setting driver.request_interceptor registers the function so that it runs on every outgoing request, where it can short-circuit the request with a custom response.
- Custom Response Observation: When the browser navigates to the website, it receives the custom “Hello from BrowserStack!” response instead of the content from the actual server.
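Responses that do come from the real server can also be changed on their way back to the browser by using a response interceptor, which receives both the request and the response. The following is a minimal sketch; the function name `add_debug_header` and the `X-Debug` header are hypothetical, while the two-argument signature and the `driver.response_interceptor` attribute are Selenium Wire's.

```python
def add_debug_header(request, response):
    """Response interceptor that tags responses from the demo site."""
    if request.url.startswith('https://bstackdemo.com/'):
        # Delete any existing copy first so the header is not duplicated.
        if 'X-Debug' in response.headers:
            del response.headers['X-Debug']
        response.headers['X-Debug'] = 'intercepted'


# Usage with a Selenium Wire driver (not executed here):
#   driver.response_interceptor = add_debug_header
#   driver.get('https://bstackdemo.com/')
```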
How does BrowserStack Automate run tests in Selenium Cloud Grid with real devices?
BrowserStack Automate makes running tests on Selenium Cloud Grid with real devices effortless. It provides you access to a wide variety of actual devices and browsers, so you can see how your app really performs in various user scenarios. Setting it up is simple. You just need to configure your Selenium WebDriver with BrowserStack’s capabilities and remote URL.
One of the standout features is the ability to run tests in parallel across different devices and browsers. This speeds up the testing process and helps you cover more ground quickly. Plus, since BrowserStack takes care of maintaining the devices and updating browsers, you don’t have to worry about the technical details.
On top of that, BrowserStack provides useful reporting tools like logs, videos, and screenshots. These help you spot and fix issues faster, making your testing process more efficient.
For more details on how to set up, configure, and execute Python Selenium tests on BrowserStack Selenium Cloud Grid, refer to the BrowserStack Automate documentation.
Conclusion
In summary, Selenium Wire in Python is a game-changer for handling HTTP requests and responses, giving you extra control and insight into your web scraping and automation tasks. It simplifies setting up proxies, capturing traffic, and modifying requests, making your testing and debugging more efficient.
Combined with BrowserStack Automate, you can run your Selenium tests on real devices, ensuring your app performs well across different environments. This setup helps you catch issues early and maintain high-quality, reliable applications.
Overall, Selenium Wire and BrowserStack together enhance your automation efforts, offering valuable tools to refine and perfect your web projects.