Verify the status of each URL in a (hosted) sitemap XML file by crawling through the XML and fetching each URL to see if it returns a 200 OK. A free alternative to Screaming Frog SEO Spider's paid sitemap crawler feature.
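To make the idea concrete, here is a minimal sketch of the two steps described above: pulling the URLs out of the sitemap XML and checking whether each one returns 200. It is not the code used in this repository. The names `extractLocs`, `checkUrl`, `Fetcher`, and `CheckResult` are hypothetical, and a regular expression stands in for a real XML parser (it does not handle CDATA sections or decode entities).

```typescript
// Minimal shape of a fetch-like function; only the status code is needed.
// (Hypothetical type, introduced for this sketch.)
type Fetcher = (url: string) => Promise<{ status: number }>;

interface CheckResult {
  url: string;
  status: number | null; // null when the request itself failed
  ok: boolean;           // true only for a 200 response
}

// Pull every <loc> value out of a sitemap XML string.
// A regex is enough for a sketch; a real XML parser is more robust.
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(match[1]);
  }
  return locs;
}

// Request one URL and record whether it answered with 200 OK.
// Network errors are reported as a failed check instead of being thrown.
async function checkUrl(url: string, fetcher: Fetcher): Promise<CheckResult> {
  try {
    const { status } = await fetcher(url);
    return { url, status, ok: status === 200 };
  } catch {
    return { url, status: null, ok: false };
  }
}
```

On Node 18 or later, the global `fetch` can be passed in as the fetcher, for example `checkUrl(url, (u) => fetch(u))`. Injecting the fetcher keeps the sketch easy to try without network access.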
- Clone the repository:
      git clone https://github.com/dylancl/sitemap-scraper.git
- Install the dependencies:
      pnpm install
- Run the script:
      pnpm start
- Enter the URL of the sitemap XML file you want to check.
- The script will ask you for configuration options:
  - Concurrency limit: The maximum number of requests that can be made at the same time. Default is 5. Must be a number between 1 and 15.
  - Request delay: The delay between each request. Default is 1000. Must be a number of at least 250.
  - Traversal order: The order in which the URLs will be checked. Default is sequential. Options are sequential and random.
- The script will start checking the URLs and display the progress in the console.
- When the script is done, it will ask you if you want to save the results (ok & not ok URLs) to a file.
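
The configuration options above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual implementation: the names `CrawlOptions`, `validateOptions`, `orderUrls`, and `runPool` are hypothetical, the delay is assumed to be in milliseconds, and each worker is assumed to pause for the delay after its own request (the script may schedule the delay differently).

```typescript
type TraversalOrder = "sequential" | "random";

interface CrawlOptions {
  concurrency: number; // 1 to 15, default 5
  delayMs: number;     // at least 250, default 1000 (milliseconds is an assumption)
  order: TraversalOrder;
}

const DEFAULTS: CrawlOptions = { concurrency: 5, delayMs: 1000, order: "sequential" };

// Fill in defaults and enforce the documented ranges.
function validateOptions(input: Partial<CrawlOptions>): CrawlOptions {
  const opts: CrawlOptions = { ...DEFAULTS, ...input };
  if (!Number.isInteger(opts.concurrency) || opts.concurrency < 1 || opts.concurrency > 15) {
    throw new RangeError("Concurrency limit must be a number between 1 and 15");
  }
  if (!Number.isFinite(opts.delayMs) || opts.delayMs < 250) {
    throw new RangeError("Request delay must be at least 250");
  }
  if (opts.order !== "sequential" && opts.order !== "random") {
    throw new RangeError("Traversal order must be sequential or random");
  }
  return opts;
}

// Return a copy of the URLs in the requested order (Fisher-Yates shuffle for "random").
function orderUrls(urls: string[], order: TraversalOrder): string[] {
  const result = [...urls];
  if (order === "random") {
    for (let i = result.length - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1));
      [result[i], result[j]] = [result[j], result[i]];
    }
  }
  return result;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `task` over the URLs with at most `concurrency` requests in flight.
// Results are returned in traversal order.
async function runPool<T>(
  urls: string[],
  opts: CrawlOptions,
  task: (url: string) => Promise<T>,
): Promise<T[]> {
  const queue = orderUrls(urls, opts.order);
  const results: T[] = new Array(queue.length);
  let next = 0; // index of the next URL to hand out

  async function worker(): Promise<void> {
    while (next < queue.length) {
      const index = next++;
      results[index] = await task(queue[index]);
      // Pause before this worker picks up its next URL, if any remain.
      if (next < queue.length) await sleep(opts.delayMs);
    }
  }

  const workerCount = Math.min(opts.concurrency, queue.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A fixed pool of workers pulling from a shared index is one simple way to cap the number of simultaneous requests; with a concurrency limit of 1 it degrades to checking the URLs one by one with the delay in between.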