This sample demonstrates how to crawl a website in Python to find pages that return a 404 error.
pip install beautifulsoup4
Run 404crawler.py with the target page, the crawl depth, and the link filter. For example, to crawl https://www.dynamsoft.com with a depth of 1, run the following command:
python 404crawler.py -l https://www.dynamsoft.com -d 1 -f dynamsoft.com
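For readers curious how the three options in that command might be read, here is a minimal sketch using the standard argparse module. Only the short flags -l, -d and -f and the default depth of 0 come from this README; the function name parse_args, the attribute names link, depth and link_filter, and the help strings are hypothetical, and the real 404crawler.py may parse its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the three options shown in the README command (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Crawl a website and report pages that return 404."
    )
    # -l: the target page to start crawling from.
    parser.add_argument("-l", dest="link", required=True,
                        help="target page URL")
    # -d: crawl depth; 0 checks only the target page, -1 means no limit.
    parser.add_argument("-d", dest="depth", type=int, default=0,
                        help="crawl depth (default: 0)")
    # -f: link filter; assumed here to be a plain string, empty by default.
    parser.add_argument("-f", dest="link_filter", default="",
                        help="only follow links containing this text")
    return parser.parse_args(argv)
```

Passing a list such as ["-l", "https://www.dynamsoft.com", "-d", "1", "-f", "dynamsoft.com"] to parse_args makes the sketch easy to try without touching the real command line.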
The default depth is 0, which means only the target page is checked. If the depth is -1, it crawls all the pages on the website.
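The depth rule can be illustrated with a small breadth-first crawl. This is a sketch under stated assumptions, not the code in 404crawler.py: it parses links with the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, it assumes the link filter is a simple substring match, and the names LinkParser, http_fetch and find_broken_pages are hypothetical.

```python
import urllib.error
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url):
    """Return (status_code, html); status is None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""
    except urllib.error.URLError:
        return None, ""


def find_broken_pages(start_url, fetch=http_fetch, max_depth=0, link_filter=""):
    """Breadth-first crawl that returns the URLs answering with 404.

    max_depth=0 checks only start_url; max_depth=-1 removes the limit.
    link_filter is assumed to be a substring that followed links must contain.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    broken = []
    while queue:
        url, depth = queue.popleft()
        status, html = fetch(url)
        if status == 404:
            broken.append(url)
            continue
        if max_depth != -1 and depth >= max_depth:
            continue  # the page was checked, but its links are not followed
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))
            if not link.startswith(("http://", "https://")):
                continue  # skip mailto:, javascript: and similar links
            if link_filter in link and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return broken
```

Because the fetch function is a parameter, the crawl logic can be tried against an in-memory dictionary of pages before pointing it at a real site, for example find_broken_pages("https://www.dynamsoft.com", max_depth=1, link_filter="dynamsoft.com").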
Press Ctrl+C to stop the program.