Web Crawling Software Setup SOP
Objective
This SOP outlines the steps to install and set up the web crawling software
for extracting emails from specified websites.
Key Steps
1. Download the Web Crawler Zip File 0:15
Download the zip file to your computer.
2. Install Python 0:44
Go to Google and search for 'download Python', or open python.org/downloads
directly.
Click the yellow download button to download Python 3.13.
Run the installer and follow the installation prompts.
Important: During installation, make sure to check the box that adds Python
to PATH (labeled 'Add python.exe to PATH' in recent Windows installers).
3. Extract the Web Crawler Files 2:04
Navigate to your Downloads folder.
Right-click on the web crawler zip file and select 'Extract All'.
Click 'Extract' to create a folder with the extracted files.
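If you prefer to script this step, or 'Extract All' is unavailable, Python's standard zipfile module can do the same extraction once Python is installed. The sketch below is an illustration, not part of the crawler: the helper name extract_crawler and the zip name web_crawler.zip are hypothetical, so substitute the actual file you downloaded.

```python
import zipfile
from pathlib import Path
from typing import Optional


def extract_crawler(zip_path: Path, dest: Optional[Path] = None) -> Path:
    """Extract zip_path and return the folder the files were written to.

    Hypothetical helper. By default it extracts into a folder named after
    the zip file (minus '.zip'), next to the zip, similar to 'Extract All'.
    """
    if dest is None:
        dest = zip_path.with_suffix("")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)  # creates dest and any subfolders as needed
    return dest


if __name__ == "__main__":
    # Hypothetical file name: replace with the zip you actually downloaded.
    zip_file = Path.home() / "Downloads" / "web_crawler.zip"
    if zip_file.is_file():
        print("Extracted to", extract_crawler(zip_file))
    else:
        print("Zip file not found:", zip_file)
```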
4. Open Command Prompt 3:11
In File Explorer, open the extracted folder, click the address bar, and copy
the folder path.
Then click the search box on the taskbar, type 'cmd', and open Command
Prompt.
5. Navigate to the Web Crawler Directory 3:23
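In Command Prompt, type 'cd ' followed by the folder path you copied (paste
it with Ctrl+V or a right-click) and press Enter.
If the folder is on a different drive than the one shown in the prompt, type
'cd /d ' before the path instead.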
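To confirm Command Prompt is now in the right folder, you can check that the files this SOP relies on are present (running 'dir' and looking for them works too). A minimal Python sketch, assuming main.py and requirements.txt sit at the top level of the extracted folder; the helper name missing_crawler_files is hypothetical.

```python
from pathlib import Path

# Files this SOP expects at the top level of the web crawler folder.
EXPECTED_FILES = ("main.py", "requirements.txt")


def missing_crawler_files(folder: Path) -> list:
    """Return the expected file names that are not present in folder."""
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    here = Path.cwd()
    missing = missing_crawler_files(here)
    if missing:
        print(f"{here} does not look like the crawler folder; missing: {', '.join(missing)}")
    else:
        print(f"OK: {here} looks like the web crawler folder")
```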
6. Install Required Modules 3:44
In Command Prompt, type 'pip install -r requirements.txt' and press
Enter.
Wait for all modules to finish installing.
If 'pip' is not recognized, try 'python -m pip install -r requirements.txt'
instead.
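Since installing the modules is a one-time task, it can help to check later whether they are already present. A sketch using the standard importlib.metadata module; it only understands simple lines such as 'requests>=2.31' (it skips comments and pip options like '-r'), and the helper names are hypothetical. The actual contents of the crawler's requirements.txt are not shown in this SOP.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

# Matches the project name at the start of a requirement line,
# e.g. "requests" in "requests>=2.31".
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(text: str) -> list:
    """Return package names from requirements.txt text (simple lines only)."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names) -> list:
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.is_file():
        missing = missing_packages(requirement_names(req.read_text()))
        print("Missing modules:", ", ".join(missing) if missing else "none")
    else:
        print("requirements.txt not found; run this from the web crawler folder")
```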
7. Verify Python Installation 4:44
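Type 'python --version' in Command Prompt to check that Python is
installed correctly.
It should print a version number, such as 'Python 3.13.0'.
If you get an error saying 'python' is not recognized, Python was not added
to PATH; re-run the installer and check the PATH box.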
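The same check can be done from inside Python, which also reports whether the 'python' command is reachable on PATH. A minimal sketch; MIN_VERSION is set to 3.13 only because that is the version this SOP installs, and the crawler may well run on older versions.

```python
import shutil
import sys

# The version this SOP installs; the crawler itself may accept older versions.
MIN_VERSION = (3, 13)


def version_ok(info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """True if the (major, minor) part of info is at least minimum."""
    return tuple(info[:2]) >= minimum


def python_on_path():
    """Full path of the first 'python' command found on PATH, or None."""
    return shutil.which("python")


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    print("Meets", ".".join(map(str, MIN_VERSION)), "or newer:", version_ok())
    print("'python' on PATH:", python_on_path() or "not found")
```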
8. Run the Web Crawler 5:21
In Command Prompt, type 'python main.py' and press Enter. When prompted,
type the websites you want to crawl, separated by commas.
Press Enter to start crawling.
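This SOP does not show how main.py reads the website list. As a hypothetical illustration, comma-separated input could be turned into a clean list of URLs as below; parse_sites is not a function from the crawler, and adding 'https://' to bare domains is an assumption of this sketch.

```python
from urllib.parse import urlparse


def parse_sites(raw: str) -> list:
    """Split comma-separated input into normalized, de-duplicated URLs.

    Hypothetical helper: the real main.py may parse input differently.
    """
    sites = []
    for part in raw.split(","):
        site = part.strip()
        if not site:
            continue  # ignore empty entries such as a trailing comma
        if "://" not in site:
            site = "https://" + site  # assumption: default to HTTPS
        if urlparse(site).netloc and site not in sites:
            sites.append(site)
    return sites


if __name__ == "__main__":
    print(parse_sites("example.com, https://example.org,"))
    # prints ['https://example.com', 'https://example.org']
```

In this sketch, spaces after commas and a trailing comma are harmless; whether the real crawler tolerates them the same way is not stated in the SOP.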
9. Access the Results 6:21
When crawling has finished, find the generated Excel file in the web
crawler folder.
Open the Excel file to view the crawled websites and their corresponding
email addresses.
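If the folder holds several Excel files from earlier runs, the newest one is usually the latest result. A sketch that finds it, assuming the crawler writes .xlsx files into the folder it runs from (the SOP does not state the file name or extension); newest_excel is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def newest_excel(folder: Path) -> Optional[Path]:
    """Return the most recently modified .xlsx file in folder, or None."""
    files = [
        p
        for p in folder.glob("*.xlsx")
        # Skip Excel lock files ("~$name.xlsx") created while a workbook is open.
        if p.is_file() and not p.name.startswith("~$")
    ]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    found = newest_excel(Path.cwd())
    print(found if found else "No Excel file found yet")
```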
Tips for Efficiency
Keep your web crawler files organized in a dedicated folder for easy
access.
Regularly update Python and the required modules to avoid
compatibility issues.
Link to Loom
https://loom.com/share/16f7f6a58c25422eb1d034bc003b96f7
Important Points to Note:
1. You can always open the web crawler folder to access the Python
files.
2. Running 'pip install -r requirements.txt' is a one-time task. On later
runs, you can go straight to 'python main.py'.
3. Always make sure Command Prompt is in the web crawler folder (use 'cd'
as in step 5) before running any commands.
HAPPY CRAWLING!!