Extract URLs from a sitemap
Learn how to extract URLs from a sitemap and download them to a CSV file with the Extract sitemap utility.
With Website Scanner, you can scan specific URLs, a large list of URLs, or even entire websites. Manually adding all URLs for an entire website can be impractical and time-consuming.
You can use the Extract URLs from sitemap option to extract URLs separately from a sitemap, save them as a CSV file, and upload that file for a website scan. If you already have the list of URLs in a CSV file, you can upload the file directly.
Website Scanner also has a built-in feature to extract URLs from a sitemap. You can select the Add via sitemap option to add your sitemap directly in the website scan creation process. This is the most convenient method.
How does Extract sitemap work?
You can access the Extract sitemap utility when setting up a website scan. It opens in a new tab within Accessibility Testing. To use the utility, provide a valid URL, and then click Fetch sitemap to extract the URLs.
At present, the Extract sitemap utility extracts up to 10,000 URLs from a sitemap. If your sitemap contains more than 10,000 URLs, the URLs beyond that limit are ignored.
What is a valid URL?
Any one of the following is a valid URL:
- XML sitemap URL (example.com/sitemap.xml)
- Domain URL (example.com)
- Subdomain URL (subdomain.example.com)
- File path (example.com/subfolder)
- Any page in the domain or subdomain (example.com/page.html)
When you provide the sitemap URL, the utility directly extracts all URLs in the sitemap.
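As a rough illustration of what extracting URLs from a sitemap involves, the following Python sketch reads the `<loc>` value of each `<url>` entry in a standard XML sitemap. It is not the utility's actual implementation: the function name `extract_locs` is hypothetical, and applying the 10,000-URL cap as "keep the first entries in document order" is an assumption. Sitemap index files (which list other sitemaps) are not handled here.

```python
import xml.etree.ElementTree as ET

MAX_URLS = 10_000  # limit stated for the utility; how it is applied is assumed


def _local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://...}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def extract_locs(sitemap_xml: str, limit: int = MAX_URLS) -> list[str]:
    """Return the <loc> of each <url> entry, in document order, up to `limit`."""
    root = ET.fromstring(sitemap_xml)
    urls: list[str] = []
    for entry in root.iter():
        if _local_name(entry.tag) != "url":
            continue
        for child in entry:
            if _local_name(child.tag) == "loc" and child.text:
                urls.append(child.text.strip())
                break  # one <loc> per <url> entry
        if len(urls) >= limit:
            break  # assumption: entries past the cap are simply skipped
    return urls
```

Calling `extract_locs(xml_text)` on the body of `example.com/sitemap.xml` would return the page URLs as a plain list, which is the same kind of list the utility lets you review and download.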
When you provide any of the other valid URLs, the utility:
- Identifies the domain or the subdomain URL.
- For the given domain or subdomain, locates the robots.txt file.
- Uses the sitemap details in robots.txt to navigate to the XML sitemap URL.
- Extracts all URLs in the sitemap.
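The lookup described above can be sketched in Python as two small steps: derive the robots.txt location from whatever URL was entered, then read the `Sitemap:` lines from the robots.txt body. This is a minimal sketch and not the utility's own code. The function names are hypothetical, defaulting to `https://` when no scheme is given is an assumption, and fetching the files over the network is left out.

```python
from urllib.parse import urlsplit


def robots_url_for(url: str) -> str:
    """Map a domain, subdomain, file path, or page URL to its robots.txt URL."""
    if "://" not in url:
        url = "https://" + url  # assumption: use HTTPS when no scheme is given
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Collect the value of every 'Sitemap:' line in a robots.txt body."""
    found: list[str] = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found
```

For example, `robots_url_for("example.com/subfolder")` resolves to the robots.txt file at the root of `example.com`, and `sitemaps_from_robots` then returns the sitemap URLs listed there. If robots.txt has no `Sitemap:` line, the list is empty, and this sketch does not attempt any fallback.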
Steps to extract URLs
Follow these steps to extract URLs from a sitemap using Extract sitemap:
In the Set up a website scan window, click Extract URLs from sitemap.
The Extract sitemap utility opens in a new browser tab.
Enter the domain URL or the sitemap URL, and then click Fetch sitemap.
All the pages in your website are displayed in a hierarchical, tree-like format under Filter pages by path name. Next to it, all the pages are shown under n of N pages selected. By default, all the pages are selected.
To exclude some pages or subfolders from the scan, deselect them.
The count of selected pages decreases accordingly.
When ready, click Download List.
The URLs you selected are downloaded to a CSV file.
Go to the tab that has the Set up a website scan window open.
Upload the CSV file you downloaded.
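If you prefer to prepare the CSV file yourself, the filtering and export steps above can be approximated with a short Python sketch. It drops URLs under excluded subfolders and writes the rest to CSV text. The function names are hypothetical, and the one-URL-per-row layout with no header row is an assumption; the exact file layout the utility produces is not described here, so compare against a file downloaded from the utility before relying on it.

```python
import csv
import io
from urllib.parse import urlsplit


def exclude_paths(urls: list[str], excluded_prefixes: list[str]) -> list[str]:
    """Keep URLs whose path is not under any excluded subfolder."""
    kept: list[str] = []
    for url in urls:
        path = urlsplit(url).path or "/"
        under_excluded = any(
            path == prefix or path.startswith(prefix.rstrip("/") + "/")
            for prefix in excluded_prefixes
        )
        if not under_excluded:
            kept.append(url)
    return kept


def to_csv(urls: list[str]) -> str:
    """Serialize URLs as CSV text, one URL per row (assumed layout, no header)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for url in urls:
        writer.writerow([url])
    return buffer.getvalue()
```

Matching on whole path segments means that excluding `/blog` removes `/blog` and `/blog/post-1` but keeps `/blogroll`, which mirrors deselecting a subfolder in the tree rather than doing a plain text match.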