Module 01 (Python):
Class 01:
● Topics: Basic Syntax, Variables, Data Types, Operators,
Lists, Tuples, Sets, Dictionaries.
Class 02:
● Topics: Conditional Statements (if-else), Loops,
Try-Except.
Class 03:
● Topics: Reading/Writing Files, Functions, Lambda
Functions, Working with Dates.
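As a rough preview of the Class 03 topics, the sketch below combines a plain function, a lambda used as a sort key, a file write and read, and basic date arithmetic. The names `days_between`, `write_and_read`, and `notes.txt` are illustrative assumptions and not part of the course material.

```python
import tempfile
from datetime import date, datetime, timedelta
from pathlib import Path


def days_between(start: str, end: str, fmt: str = "%Y-%m-%d") -> int:
    """Return the number of days from start to end, both given as strings."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.days


def write_and_read(path: Path, lines: list[str]) -> list[str]:
    """Write lines to a text file, then read them back without newlines."""
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path.read_text(encoding="utf-8").splitlines()


# A lambda used as a sort key: order (name, price) pairs by price.
items = [("pen", 1.5), ("book", 12.0), ("cup", 4.25)]
by_price = sorted(items, key=lambda item: item[1])

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    round_trip = write_and_read(Path(tmp) / "notes.txt", ["first", "second"])

# Date arithmetic with timedelta.
next_week = date(2024, 1, 1) + timedelta(days=7)
```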
Module 02 (Web Scraping):
Class 04:
● Topics: Introduction to Web Scraping, Fundamentals,
How Web Scraping Works.
Class 05:
● Topics: Extracting Data from APIs, Storing Data into
CSV Files.
● Project: Hands-on Project 01 - eCommerce Website Data
Scraping with Front-End.
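The Class 05 flow of taking API data and storing it as CSV could look roughly like the sketch below. It uses only the standard library and a hard-coded JSON string in place of a real HTTP response. The payload shape, the field names, and the helper `records_to_csv` are assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical API response body; a real scraper would fetch this over HTTP.
SAMPLE_RESPONSE = """
{"products": [
  {"id": 1, "name": "Laptop", "price": 999.0},
  {"id": 2, "name": "Mouse", "price": 19.5}
]}
"""


def records_to_csv(records, fieldnames, out):
    """Write a list of dicts to a file-like object as CSV with a header row."""
    # extrasaction="ignore" drops any keys that are not listed in fieldnames.
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)


products = json.loads(SAMPLE_RESPONSE)["products"]

# An in-memory buffer stands in for a file here.
buffer = io.StringIO()
records_to_csv(products, ["id", "name", "price"], buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open("products.csv", "w", newline="", encoding="utf-8")`; the `newline=""` argument is what the `csv` module documentation recommends to avoid blank lines on Windows.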
Class 06:
● Topics: Locating Elements with XPath, Browser
Automation with Selenium.
Class 07:
● Project: Hands-on Project 02 - Scraping Data and
Images from a Website using Selenium.
Class 08:
● Topics: BeautifulSoup Crash Course.
● Project: Hands-on Project 03 - Scraping Data from a
News Website.
Class 09:
● Topics: Scrapy Fundamentals.
● Project: Hands-on Project 04 - Scraping Data from a
Sports Website.
Class 10:
● Project: Hands-on Project 05 - Scraping Data from an
Android App using Appium.
Class 11:
● Topics: Creating Web Scraper Scripts using Multiple
Libraries.
Class 12:
● Topics: Pandas Crash Course for Data Cleaning.
Module 03 (Docker & PostgreSQL):
Class 13 (Docker 01):
● Topics: Introduction to Docker, Installing Docker, Basic
Commands, Dockerizing a Python Web Scraping Script.
Class 14 (Docker 02):
● Topics: Working with Dockerfiles, Creating and
Managing Containers, Volumes, Docker Networks.
Class 15 (Docker 03):
● Topics: Docker Compose, Multi-Container Applications
(Integrating PostgreSQL and Python Scraper).
Class 16 (PostgreSQL 01):
● Topics: Introduction to PostgreSQL, Installing
PostgreSQL, Basic SQL Queries, Data Types.
Class 17 (PostgreSQL 02):
● Topics: Creating and Managing Databases, Tables,
CRUD Operations with Python (Using psycopg2).
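The CRUD pattern from Class 17 can be sketched without a running database server. The example below deliberately swaps in the standard-library `sqlite3` module as a stand-in, because it follows the same Python DB-API shape (connect, cursor, execute, commit) that psycopg2 uses. With psycopg2 the connection would come from `psycopg2.connect(...)` and the placeholders would be `%s` rather than `?`. The `products` table and its columns are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a PostgreSQL connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL)"
)

# Create: insert rows with parameterized queries (never string formatting).
cur.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("Laptop", 999.0), ("Mouse", 19.5)],
)

# Update: change the price of one row.
cur.execute("UPDATE products SET price = ? WHERE name = ?", (17.0, "Mouse"))

# Delete: remove one row.
cur.execute("DELETE FROM products WHERE name = ?", ("Laptop",))
conn.commit()

# Read: fetch what remains.
cur.execute("SELECT name, price FROM products ORDER BY id")
rows = cur.fetchall()

conn.close()
```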
Class 18 (PostgreSQL 03):
● Topics: Advanced SQL Queries, Indexing, Joins, Using
PostgreSQL with Pandas for Data Analysis.
Class 19 (Integration):
● Topics: Building a Full Web Scraper Pipeline: Scraping
Data and Storing It in PostgreSQL in a Dockerized
Environment.
Module 04 (Airflow & Data Pipelines):
Class 20 (Airflow 01):
● Topics: Introduction to Apache Airflow, Installing
Airflow, Core Concepts (DAGs, Operators, Tasks, and
Workflows).
Class 21 (Airflow 02):
● Topics: Creating DAGs for Scheduling Python Scripts
(Web Scrapers), Task Dependencies, Parallelism.
Class 22 (Airflow 03):
● Topics: Using Airflow with Docker for Containerized
Pipelines, Trigger Rules, Task Execution, Managing
Failures.
Class 23 (Airflow 04):
● Topics: Building and Scheduling Full ETL Pipelines
using Airflow: Scraping, Data Processing, Loading into
PostgreSQL.
Class 24 (Project):
● Project: Create a Full Data Pipeline using Airflow to
Scrape, Clean, and Load Data into PostgreSQL.
Module 05 (Kafka & Real-Time Data Streaming):
Class 25 (Kafka 01):
● Topics: Introduction to Apache Kafka, Installing Kafka,
Core Concepts (Producers, Consumers, Topics,
Partitions).
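The core concepts listed for Class 25 can be pictured with a toy in-memory model. The sketch below is not Kafka and does not use a Kafka client; `ToyTopic` and `choose_partition` are hypothetical names, and the CRC32 hash is a stand-in for the hash a real client applies to keys. It only illustrates the idea that a topic is split into partitions, that records with the same key land in the same partition, and that a consumer reads a partition from an offset.

```python
import zlib

NUM_PARTITIONS = 3


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition index with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key) % num_partitions


class ToyTopic:
    """A topic modeled as a fixed number of append-only lists (partitions)."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> int:
        """Append a record to the partition chosen by its key; return the index."""
        index = choose_partition(key.encode("utf-8"), len(self.partitions))
        self.partitions[index].append((key, value))
        return index

    def consume(self, partition: int, offset: int = 0):
        """Return the records in one partition starting at the given offset."""
        return self.partitions[partition][offset:]


topic = ToyTopic()
first = topic.produce("site-a", "page-1")
second = topic.produce("site-a", "page-2")
```

Because both records share the key `site-a`, they go to the same partition and keep their relative order there, which is the ordering guarantee Kafka gives within a single partition.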
Class 26 (Kafka 02):
● Topics: Creating Kafka Producers and Consumers with
Python, Setting up Kafka Clusters, Basic Operations.
Class 27 (Kafka 03):
● Topics: Using Kafka for Real-Time Data Streaming from
Web Scrapers, Handling Large Volumes of Scraped
Data.
Class 28 (Kafka 04):
● Topics: Kafka Streams for Data Processing, Integrating
Kafka with PostgreSQL for Real-Time Ingestion.
Class 29 (Kafka 05):
● Topics: Kafka with Docker: Running Kafka in
Containers, Scaling Kafka with Multiple Brokers.
Class 30 (Project):
● Project: Build a Real-Time Data Streaming Pipeline
using Kafka, Docker, and PostgreSQL for Streaming
Web-Scraped Data.
Final Project (Capstone):
Class 31 (Capstone Project):
● Topics: Integrating All Components (Web Scraping,
Docker, PostgreSQL, Airflow, Kafka) into a Full Data
Engineering Pipeline.
● Project: Build a Complete End-to-End Data Pipeline:
○ Scrape data using Selenium/BeautifulSoup.
○ Use Kafka for real-time streaming.
○ Store data in PostgreSQL.
○ Schedule and orchestrate tasks using Airflow.
○ Dockerize the entire pipeline.