How to perform Web Scraping using Selenium C#
By Arjun M. Shrivastva, Community Contributor - June 14, 2023
Data is the new differentiator. It serves as the foundation for market research and business strategy. Whether you want to begin a new project or build a new strategy for an existing firm, you must continually access and assess large amounts of data. This is where web scraping can be useful. But before we dive in, let's quickly learn a little about web scraping.
What is Web Scraping and its uses?
Web scraping, also known as web harvesting or web data extraction, is the technique of automatically gathering data from websites. It involves retrieving the HTML of web pages and parsing it to extract specific information or data points of interest. Web scraping lets you gather structured data from numerous websites, which can subsequently be applied to a variety of tasks.
Web scraping is advantageous for a number of reasons:
- Data Gathering: Web scraping makes it possible to swiftly collect a lot of data from numerous websites. Market research, competitor evaluation, sentiment analysis, price comparison, and trend monitoring are just a few uses for this data.
- Automation: Web scraping automates the process, saving time and effort compared to manually copying and pasting data from websites. You can use it to automatically retrieve data whenever you need it or on a regular basis.
- Data Integration: Web scraping makes it easier to integrate data from various websites into a single database or application. You can integrate information from numerous sources and analyse it to get insights and make wise judgements.
- Real-time Data: Web scraping enables you to obtain current information from websites. This is especially helpful for keeping track of stock prices, news updates, weather predictions, social media trends, and other information that must be current.
- Research and Analysis: Web scraping is a common method used by researchers and analysts to collect data for scholarly, scientific, or market research projects. They can use it to analyse big databases, spot trends, and make judgements based on the facts gathered.
- Data Aggregation and Comparison: Web scraping gives you the ability to combine and contrast data from many websites or online platforms. To discover the best pricing, for instance, you can scrape product information and prices from various e-commerce websites.
- Monitoring and Tracking: Using web scraping, you can keep tabs on how websites change over time. To stay informed and respond appropriately, you may keep an eye on price adjustments, product availability, content updates, and other changes.
Web scraping can be a powerful tool, but it should only be used in an ethical and responsible manner. When scraping data, make sure you always abide by the terms of service of the website, respect privacy policies, and adhere to all legal and moral standards.
Role of Selenium in Web Scraping
Selenium is primarily a web automation tool used to imitate user interactions by automating web browsers. However, it can also be utilised for web scraping. Selenium web scraping is the practice of using Selenium WebDriver to automate browser actions and collect data from websites.
Here is how to use Selenium for web scraping:
- Automating Browser Actions: Using Selenium WebDriver, you can automate browser tasks including scrolling, browsing between sites, clicking buttons, filling out forms, and interacting with website objects. WebDriver methods can be used programmatically to carry out these operations.
- Data Extraction: Using Selenium, you can find and extract data from particular web page elements. To discover and interact with the desired elements, you can use one of the several finding methods Selenium provides, including XPath, CSS selectors, and element IDs. You can extract the elements’ content, attribute values, or other pertinent information once you’ve found them.
- Managing Dynamic Content: Selenium is very helpful for scraping websites that rely heavily on JavaScript or have dynamic content that loads or changes after the initial page load. Because Selenium interfaces directly with the browser, it can wait for AJAX requests, act on dynamically loaded elements, and fetch the updated content.
- Taking Screenshots: Selenium enables you to take screenshots of websites, which is helpful for preserving or visually checking the data that has been scraped.
- Handling Authentication: Selenium can automate the login process by filling out login forms, sending credentials, and managing cookies and sessions whenever the website asks for authentication or login.
- Scraping JavaScript-rendered Pages: Selenium is capable of scraping websites that are built with JavaScript frameworks like Angular, React, or Vue.js. Since Selenium manages a real browser, it has the ability to run JavaScript and then receive the complete rendered HTML.
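The points above about dynamic content and JavaScript-rendered pages can be sketched with Selenium's explicit waits. Here is a minimal C# example, assuming the Selenium.Support NuGet package (which provides WebDriverWait) is installed; the URL and class name are taken from the demo site used later in this article:

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

class DynamicContentExample
{
    static void Main()
    {
        IWebDriver driver = new ChromeDriver();
        try
        {
            driver.Navigate().GoToUrl("https://bstackdemo.com/");

            // Wait up to 10 seconds for the JavaScript-rendered
            // product list to appear before reading it
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
            IWebElement firstProduct =
                wait.Until(d => d.FindElement(By.ClassName("shelf-item")));

            Console.WriteLine(firstProduct.Text);
        }
        finally
        {
            // Always release the browser session
            driver.Quit();
        }
    }
}
```

The `wait.Until` call polls the page until the element exists, which is how Selenium handles content that is only present after AJAX requests complete.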
As opposed to other scraping techniques, Selenium web scraping necessitates the launch and control of a web browser, which could result in a slight performance and resource burden.
How to perform Web Scraping using Selenium C#
Prerequisites
Prior to implementing web scraping in Selenium C#, you will need the following prerequisites:
- Visual Studio IDE: From their official website, you can download it.
- Selenium WebDriver: WebDriver is Selenium's application programming interface. It provides the means to instruct Selenium to carry out specific tasks.
- C# Packages: Using the Selenium WebDriver and NUnit framework, we demonstrate Selenium web scraping. The following libraries (or packages) are necessary for the NUnit project:
- Selenium WebDriver
- NUnit
- NUnit3TestAdapter
- Microsoft.NET.Test.Sdk
These are the common packages used with NUnit and Selenium for automated browser testing.
Setting up the Selenium C# Project
Follow the steps given below to set up Selenium C# before you begin Web Scraping using Selenium C#.
Read More: How to set up Selenium on Visual Studio
Step 1: To create a project in Visual Studio, follow the process below:
- Open Visual Studio and click on the Create a new project option.
- In the window that appears, select Console App as the project template, then click the Next button, as shown in the screenshot below.
- In the Configure your new project window, provide the project name [webscraping] and click the Next button.
- In the Additional information window, select the target framework [.NET 6.0] and click the Create button, as shown in the screenshot below:
- Once the project is successfully created, a Program.cs file is generated automatically.
Step 2: Once you have created the project, install the packages mentioned above using the Package Manager (PM) console, which can be accessed through Tools >> NuGet Package Manager >> Package Manager Console.
Step 3: Run the following commands in the PM console to install the packages below:
- Selenium WebDriver
Install-Package Selenium.WebDriver
- NUnit
Install-Package NUnit
- NUnit3TestAdapter
Install-Package NUnit3TestAdapter
- Microsoft.NET.Test.Sdk
Install-Package Microsoft.NET.Test.Sdk
- ChromeDriver to run webscraping test on Google Chrome browser
Install-Package Selenium.WebDriver.ChromeDriver
Step 4: Run the Get-Package command on the PM console to confirm whether the above packages are installed successfully:
Now that the Selenium C# NUnit project’s required components have been installed, we can add a NUnit test scenario to do web scraping.
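Since the project uses NUnit, the scraping scenario can also be structured as an NUnit test rather than a console `Main` method. The sketch below is illustrative (the class and test names are assumptions, not from the original article), and uses the same page and locator as the example that follows:

```csharp
using NUnit.Framework;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

[TestFixture]
public class WebScrapingTests
{
    private IWebDriver driver;

    [SetUp]
    public void SetUp()
    {
        // Launch a fresh Chrome session before each test
        driver = new ChromeDriver();
    }

    [Test]
    public void ProductsArePresentOnHomePage()
    {
        driver.Navigate().GoToUrl("https://bstackdemo.com/");
        var products = driver.FindElements(By.ClassName("shelf-item"));

        // The demo store should list at least one product
        Assert.That(products.Count, Is.GreaterThan(0));
    }

    [TearDown]
    public void TearDown()
    {
        // Close the browser even if the test fails
        driver.Quit();
    }
}
```

Structuring the scenario this way lets the NUnit3TestAdapter discover and run it from Visual Studio's Test Explorer.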
How to perform Web Scraping: Example
In this demonstration, we will scrape the name and price of every item on the bstackdemo.com website and save them in a CSV file. Chrome will be used to run the web scraping test scenario.
To scrape data from an eCommerce site using Selenium in C#, you can follow these steps:
Step 1: Set up the Selenium WebDriver and navigate to the https://bstackdemo.com/ website
using System.Collections.Generic;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class Program
{
    static void Main()
    {
        // Set up ChromeDriver
        IWebDriver driver = new ChromeDriver();

        // Navigate to the demo website
        driver.Navigate().GoToUrl("https://bstackdemo.com/");

        // Create a list to store the item details
        List<string[]> items = new List<string[]>();

        // Add your scraping logic here

        // Close the browser
        driver.Quit();
    }
}
Step 2: Identify the elements you want to scrape using their HTML structure, attributes, or XPath. For example, to scrape the name and price of each product, you can use code like the following, which adds the details to the list created above.
// Find the elements that contain the product details
IReadOnlyCollection<IWebElement> productElements = driver.FindElements(By.ClassName("shelf-item"));

// Loop through the product elements and extract the desired information
foreach (IWebElement productElement in productElements)
{
    // Extract the name and price of the product
    string name = productElement.FindElement(By.ClassName("shelf-item__title")).Text;
    string price = productElement.FindElement(By.ClassName("val")).Text;

    // Add the item details to the list
    items.Add(new string[] { name, price });
}
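The same elements can also be located with CSS selectors, one of the locator strategies mentioned earlier. This fragment is an alternative to the class-name lookups above, assuming the same page markup, and runs in the context of the `driver` and `items` variables already defined:

```csharp
// Equivalent lookups using CSS selectors instead of By.ClassName
IReadOnlyCollection<IWebElement> productElements =
    driver.FindElements(By.CssSelector(".shelf-item"));

foreach (IWebElement productElement in productElements)
{
    string name = productElement.FindElement(By.CssSelector(".shelf-item__title")).Text;
    string price = productElement.FindElement(By.CssSelector(".val")).Text;
    items.Add(new string[] { name, price });
}
```

CSS selectors are handy when an element has no dedicated class or ID and must be matched by attribute or hierarchy.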
Step 3: Finally, we will save all the extracted data in the CSV file.
// Saving extracted data in CSV file
string csvFilePath = "\\webscraping\\items.csv";
using (StreamWriter writer = new StreamWriter(csvFilePath))
{
    // Write the CSV header
    writer.WriteLine("Name,Price");

    // Write the item details
    foreach (string[] item in items)
    {
        writer.WriteLine(string.Join(",", item));
    }
}
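One caveat with `string.Join(",", item)`: a product name that itself contains a comma or a quote would corrupt the CSV row. A small helper (hypothetical, not part of the original example) can escape each field first, following the RFC 4180 quoting convention:

```csharp
static class Csv
{
    // Quote a field if it contains a comma, quote, or newline,
    // doubling any embedded quotes per RFC 4180
    public static string Escape(string field)
    {
        if (field.Contains(",") || field.Contains("\"") || field.Contains("\n"))
        {
            return "\"" + field.Replace("\"", "\"\"") + "\"";
        }
        return field;
    }
}
```

With `using System.Linq;`, the write line then becomes `writer.WriteLine(string.Join(",", item.Select(Csv.Escape)));`.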
Note: Because our program includes its own static void Main method, we must disable the auto-generation of the program file. Add the following element to your test project's .csproj file, inside a <PropertyGroup> element:
<GenerateProgramFile>false</GenerateProgramFile>
Adding screenshot for better understanding:
Code for Web Scraping using Selenium C#: Example
Using the code below, you can implement web scraping in C#. The code opens bstackdemo.com, extracts the name and price of each product, and saves them in a CSV file.
using System.Collections.Generic;
using System.IO;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class Program
{
    static void Main()
    {
        // Launch Chrome and open the demo website
        IWebDriver driver = new ChromeDriver();
        driver.Navigate().GoToUrl("https://bstackdemo.com/");

        // Collect the name and price of every product on the page
        List<string[]> items = new List<string[]>();
        IReadOnlyCollection<IWebElement> productElements = driver.FindElements(By.ClassName("shelf-item"));
        foreach (IWebElement productElement in productElements)
        {
            string name = productElement.FindElement(By.ClassName("shelf-item__title")).Text;
            string price = productElement.FindElement(By.ClassName("val")).Text;
            items.Add(new string[] { name, price });
        }

        // Write all collected items to the CSV file once, after the loop
        string csvFilePath = "\\webscraping\\items.csv";
        using (StreamWriter writer = new StreamWriter(csvFilePath))
        {
            writer.WriteLine("Name,Price");
            foreach (string[] item in items)
            {
                writer.WriteLine(string.Join(",", item));
            }
        }

        driver.Quit();
    }
}
To run the program, press Ctrl+F5 or select the green Run button from the top menu.
After execution, you will find the items.csv file in the project folder (shown below).
On opening that file, you will see all the items listed along with their prices.
Conclusion
In this article, you have learned the fundamentals of web scraping using Selenium C#, including how to scrape specific elements using locators. As you can see, this requires only a few lines of code. Just remember to comply with the website's terms of service, be mindful of any rate limits or scraping restrictions, and follow ethical scraping practices.
Using BrowserStack is recommended, as it helps ensure that your scraping code works correctly across different real browsers and devices. It allows you to test your scraping scripts on various browser configurations without setting up multiple local environments, which helps ensure the compatibility and reliability of your code across browser platforms.