Web Scraping Code Assignments
Code 1: Static News Headlines
Objective: Learn basic HTML parsing and CSS selectors
Task: Scrape the latest headlines from a news website like BBC News or CNN
Skills Covered:
• Setting up the requests library
• Parsing HTML with BeautifulSoup
• Using CSS selectors
• Handling basic text extraction
Requirements:
• Extract the 10 latest headlines
• Save to a text file
• Handle potential encoding issues
• Add timestamps to each headline
Expected Output: Text file with timestamped headlines
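The output step for this assignment can be sketched as follows. This is only one possible approach: it assumes the headlines have already been extracted into a list of strings (for example with BeautifulSoup's select), and the function names format_headlines and save_headlines are hypothetical.

```python
from datetime import datetime, timezone


def format_headlines(headlines, now=None, limit=10):
    """Return one 'ISO timestamp<TAB>headline' line per non-empty headline."""
    # A single scrape time is shared by all headlines from the same run.
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    # Collapse runs of whitespace left over from the HTML and drop blanks.
    cleaned = [" ".join(h.split()) for h in headlines if h.strip()]
    return [f"{stamp}\t{text}" for text in cleaned[:limit]]


def save_headlines(lines, path):
    """Write the lines as UTF-8 so non-ASCII headlines survive on any platform."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

Passing encoding="utf-8" explicitly is one way to address the encoding requirement, since the default file encoding differs between operating systems.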
Code 2: E-commerce Product Information
Objective: Extract structured data from product listings
Task: Scrape product information from an e-commerce site ([Link] is ideal for practice)
Skills Covered:
• Extracting multiple data points per item
• Working with prices and ratings
• Creating structured data output
Requirements:
• Extract: product name, price, rating, availability
• Handle at least 20 products
• Save data to CSV format
• Implement basic error handling
Expected Output: CSV file with product data
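A minimal sketch of the cleaning and CSV-writing step, assuming each scraped product arrives as a dictionary of raw strings. The helper names (parse_price, clean_row, write_csv) and the dictionary keys are illustrative choices, not part of the assignment.

```python
import csv
import re

FIELDS = ["name", "price", "rating", "availability"]


def parse_price(text):
    """Pull the first decimal number out of a price string such as '£51.77'."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def clean_row(raw):
    """Normalize one scraped record; missing fields become None instead of raising."""
    return {
        "name": (raw.get("name") or "").strip() or None,
        "price": parse_price(raw.get("price") or ""),
        "rating": raw.get("rating"),
        "availability": (raw.get("availability") or "").strip() or None,
    }


def write_csv(rows, stream):
    """Write a header plus one cleaned row per product to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(clean_row(row))
```

When writing to a real file, open it with newline="" as the csv module documentation recommends, for example open("products.csv", "w", newline="", encoding="utf-8").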
Code 3: Weather Data Collection
Objective: Work with tables and periodic data collection
Task: Scrape weather forecasts and historical data
Skills Covered:
• Parsing HTML tables
• Handling date/time data
• Data cleaning and formatting
Requirements:
• Extract 7-day weather forecast
• Include temperature, humidity, precipitation
• Convert units if necessary
• Create a summary report
Expected Output: JSON file with weather data and summary statistics
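The unit conversion and summary report could look roughly like this. The sketch assumes the forecast table has already been parsed into a list of per-day dictionaries; the keys temp_c and precip_mm are hypothetical names chosen for illustration.

```python
import json
from statistics import mean


def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit reading to Celsius, rounded to one decimal place."""
    return round((temp_f - 32) * 5 / 9, 1)


def summarize(forecast):
    """Compute summary statistics over a list of per-day forecast records."""
    temps = [day["temp_c"] for day in forecast]
    return {
        "days": len(forecast),
        "min_temp_c": min(temps),
        "max_temp_c": max(temps),
        "mean_temp_c": round(mean(temps), 1),
        "total_precip_mm": round(sum(day["precip_mm"] for day in forecast), 1),
    }


def build_report(forecast):
    """Serialize the raw forecast together with its summary as a JSON string."""
    return json.dumps({"forecast": forecast, "summary": summarize(forecast)}, indent=2)
```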
Code 4: Social Media Post Analysis
Objective: Handle dynamic content and rate limiting
Task: Scrape public posts from Reddit or Twitter (using official APIs where required)
Skills Covered:
• API integration vs web scraping
• Rate limiting and delays
• Text processing and sentiment analysis
Requirements:
• Collect 100 posts from a specific subreddit/hashtag
• Extract post text, author, timestamp, engagement metrics
• Implement respectful delay between requests
• Basic sentiment classification (positive/negative/neutral)
Expected Output: Database or JSON with posts and sentiment analysis
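As a rough illustration of the delay and sentiment requirements, the sketch below pauses between requests and labels text with a tiny word list. The word lists, the two-second default delay, and the fetch callback are all assumptions made for the example; a real solution would likely use the platform's official client and a proper sentiment library.

```python
import time

# Deliberately tiny, hypothetical lexicons; real ones are much larger.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}


def classify(text):
    """Label text positive, negative, or neutral by counting lexicon hits."""
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def fetch_all(page_ids, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(page_id) for each id, sleeping between consecutive calls."""
    results = []
    for index, page_id in enumerate(page_ids):
        if index > 0:
            sleep(delay_seconds)  # no pause is needed before the first request
        results.append(fetch(page_id))
    return results
```

Taking sleep as a parameter keeps the function easy to test, because a test can substitute a recorder for the real pause.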
Code 5: Job Listings Aggregator
Objective: Multi-page scraping and data normalization
Task: Create a job listings aggregator from multiple job sites
Skills Covered:
• Handling pagination
• Normalizing data from different sources
• Advanced error handling and retry logic
Requirements:
• Scrape from at least 2 different job sites
• Extract: job title, company, location, salary (if available), description
• Handle pagination for at least 5 pages per site
• Normalize location data and salary formats
• Detect and remove duplicate listings
Expected Output: Unified database of job listings with deduplication
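Salary normalization and duplicate detection might be approached as in the sketch below. It assumes listings are dictionaries with title, company, and location keys, and that two listings are duplicates when those three fields match after lowercasing and stripping punctuation. That matching rule is a simplification chosen for the example.

```python
import re


def normalize_salary(text):
    """Return (low, high) from strings like '$50k - $70k' or '60,000', else None."""
    amounts = []
    pattern = r"(\d+(?:\.\d+)?)\s*([kK]?)"
    for number, suffix in re.findall(pattern, text.replace(",", "")):
        # A trailing 'k' means thousands.
        amounts.append(int(float(number) * (1000 if suffix else 1)))
    if not amounts:
        return None
    return (min(amounts), max(amounts))


def listing_key(job):
    """Build a comparison key that ignores case, punctuation, and extra spaces."""
    def squash(value):
        return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()

    return (squash(job["title"]), squash(job["company"]), squash(job["location"]))


def deduplicate(jobs):
    """Keep the first listing seen for each normalized key."""
    seen = set()
    unique = []
    for job in jobs:
        key = listing_key(job)
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique
```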
Code 6: Stock Market Data Tracker
Objective: Real-time data collection and visualization
Task: Build a stock price monitoring system
Skills Covered:
• Handling JavaScript-rendered content (Selenium)
• Time-series data collection
• Data visualization
• Scheduling and automation
Requirements:
• Track 5-10 stocks over time
• Collect data every 15 minutes during market hours
• Handle dynamic content loading
• Create basic charts showing price trends
• Implement alert system for significant price changes
Expected Output: Time-series database with visualization dashboard
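The market-hours check and the alert rule can be sketched without any scraping code. The 09:30 to 16:00 weekday window and the 2% threshold are assumptions for illustration only; actual hours depend on the exchange, and holidays are not handled here.

```python
from datetime import datetime, time

# Assumed trading window in the exchange's local time.
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def in_market_hours(moment):
    """True on weekdays between the assumed open and close times."""
    return moment.weekday() < 5 and MARKET_OPEN <= moment.time() <= MARKET_CLOSE


def percent_change(previous, current):
    """Percentage change from the previous price to the current one."""
    return (current - previous) / previous * 100


def check_alerts(previous_prices, current_prices, threshold_pct=2.0):
    """Return (symbol, change) pairs whose absolute move meets the threshold."""
    alerts = []
    for symbol, current in current_prices.items():
        previous = previous_prices.get(symbol)
        if not previous:
            continue  # no baseline yet (or a zero price), so nothing to compare
        change = percent_change(previous, current)
        if abs(change) >= threshold_pct:
            alerts.append((symbol, round(change, 2)))
    return alerts
```

A scheduler would call in_market_hours before each 15-minute collection and pass the last two snapshots to check_alerts.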
Code 7: Academic Paper Metadata Extractor
Objective: Complex text processing and academic data handling
Task: Scrape academic paper information from arXiv or Google Scholar
Skills Covered:
• PDF text extraction
• Handling academic formatting
• Citation parsing
• Advanced text processing
Requirements:
• Extract paper titles, authors, abstracts, publication dates
• Parse citation counts and references
• Handle various document formats
• Create author collaboration networks
• Implement search functionality by keywords
Expected Output: Academic database with search and network analysis features
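One simple way to represent the author collaboration network is a count of how often each pair of authors appears on the same paper. The sketch below assumes paper records are dictionaries with title, abstract, and authors keys. Both that layout and the substring-based keyword search are illustrative simplifications.

```python
from collections import Counter
from itertools import combinations


def collaboration_edges(papers):
    """Count co-authored papers for every unordered pair of authors."""
    edges = Counter()
    for paper in papers:
        # Sorting gives each pair a stable (a, b) order regardless of byline order.
        authors = sorted(set(paper["authors"]))
        for pair in combinations(authors, 2):
            edges[pair] += 1
    return edges


def search(papers, keyword):
    """Return papers whose title or abstract contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        paper
        for paper in papers
        if needle in paper["title"].lower() or needle in paper["abstract"].lower()
    ]
```

The resulting edge counts can be loaded into a graph library as weighted edges if network metrics are needed.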
Code 8: Real Estate Market Analysis
Objective: Geographic data and advanced analytics
Task: Create a comprehensive real estate market analyzer
Skills Covered:
• Geographic data handling
• Image processing (property photos)
• Advanced data analysis
• Map integration
Requirements:
• Scrape property listings from real estate sites
• Extract: price, location, size, amenities, photos
• Calculate price per square foot
• Create geographic heat maps
• Analyze market trends by neighborhood
• Handle anti-scraping measures (delays, user agents)
Expected Output: Interactive map-based real estate dashboard
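The price-per-square-foot calculation and a per-neighborhood comparison might look like this. The sketch assumes listings are dictionaries with price, size_sqft, and neighborhood keys (hypothetical names), and it uses the median as one reasonable summary because a few luxury listings can skew a mean.

```python
from collections import defaultdict
from statistics import median


def price_per_sqft(price, size_sqft):
    """Return price divided by area, or None when either value is missing or zero."""
    if not price or not size_sqft:
        return None
    return round(price / size_sqft, 2)


def median_by_neighborhood(listings):
    """Map each neighborhood to the median price per square foot of its listings."""
    groups = defaultdict(list)
    for listing in listings:
        value = price_per_sqft(listing.get("price"), listing.get("size_sqft"))
        if value is not None:  # skip listings with incomplete data
            groups[listing["neighborhood"]].append(value)
    return {name: median(values) for name, values in groups.items()}
```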
Code 9: Multi-lingual News Sentiment Monitor
Objective: Advanced text processing and language handling
Task: Monitor global news sentiment across multiple languages
Skills Covered:
• Multi-language text processing
• Advanced sentiment analysis
• Cross-site data aggregation
• Cultural context consideration
Requirements:
• Scrape news from sites in at least 3 different languages
• Implement language detection
• Perform sentiment analysis per language
• Track sentiment trends over time
• Create comparative analysis between regions
• Handle character encoding issues
Expected Output: Multi-language sentiment dashboard with trend analysis
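Two of the requirements above, encoding handling and trend tracking, can be sketched with the standard library alone. The example assumes each article has already been assigned a language code and a numeric sentiment score by some other component; the record keys and the function names decode_body and daily_trend are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def decode_body(raw, declared=None):
    """Decode bytes using the declared charset, then UTF-8, then a lossy fallback."""
    for encoding in (declared, "utf-8"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset; try the next candidate
    # Last resort: keep what can be read and mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


def daily_trend(records):
    """Average sentiment score per (language, date) pair, sorted by key."""
    buckets = defaultdict(list)
    for record in records:
        buckets[(record["language"], record["date"])].append(record["score"])
    return {key: round(mean(scores), 3) for key, scores in sorted(buckets.items())}
```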
Code 10: E-learning Course Catalog Analyzer
Objective: Complex data relationships and recommendation systems
Task: Build an intelligent course recommendation system
Skills Covered:
• Complex relationship mapping
• Machine learning integration
• Advanced data modeling
• Recommendation algorithms
Requirements:
• Scrape course catalogs from multiple online learning platforms
• Extract: course titles, descriptions, instructors, ratings, prerequisites
• Map course relationships and learning paths
• Implement skill extraction from course descriptions
• Build recommendation engine based on user interests
• Handle dynamic loading and infinite scroll
• Create course comparison features
Expected Output: Intelligent course recommendation platform
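A very small baseline for skill extraction and recommendation is sketched below. It matches a fixed, hypothetical skill vocabulary against course descriptions by substring and ranks courses by Jaccard overlap with the user's interests. This is a stand-in for the machine learning approach the assignment has in mind, and substring matching will produce false hits on longer words.

```python
# Hypothetical vocabulary; a real system would build or learn a much larger one.
SKILL_VOCABULARY = {"python", "sql", "statistics", "machine learning", "html", "css"}


def extract_skills(description, vocabulary=SKILL_VOCABULARY):
    """Return the vocabulary terms that occur in the description (case-insensitive)."""
    text = description.lower()
    return {skill for skill in vocabulary if skill in text}


def jaccard(a, b):
    """Size of the intersection divided by size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(courses, interests, top_n=3):
    """Rank course titles by skill overlap with the interests, best first."""
    wanted = {interest.lower() for interest in interests}
    scored = [
        (jaccard(extract_skills(course["description"]), wanted), course["title"])
        for course in courses
    ]
    # Drop courses with no overlap; break score ties alphabetically by title.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [title for _, title in ranked[:top_n]]
```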