Web Scraping

The document outlines a comprehensive curriculum for a data engineering course, divided into five modules covering Python, Web Scraping, Docker & PostgreSQL, Airflow & Data Pipelines, and Kafka & Real-Time Data Streaming. Each module consists of multiple classes that teach various topics and include hands-on projects to apply the learned skills. The final capstone project integrates all components into a complete end-to-end data pipeline.


Module 01 (Python):

Class 01:

● Topics: Basic Syntax, Variables, Data Types, Operators, Lists, Tuples, Sets, Dictionaries.

Class 02:

● Topics: Conditional Statements (if-else), Loops, Try-Except.

Class 03:

● Topics: Reading/Writing Files, Functions, Lambda Functions, Working with Dates.
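
A minimal sketch of the kind of Python covered in this class (the file name and sample data are made up for illustration):

    from datetime import datetime

    # A small function with a default argument
    def days_between(start, end=None):
        """Return the number of days between two dates (end defaults to today)."""
        end = end or datetime.today()
        return (end - start).days

    # Lambda functions are handy as one-off sort keys
    names = ["charlie", "Alice", "bob"]
    names.sort(key=lambda n: n.lower())

    # Writing and reading back a text file
    with open("names.txt", "w") as f:
        f.write("\n".join(names))

    with open("names.txt") as f:
        print(f.read().splitlines(), days_between(datetime(2024, 1, 1)))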

Module 02 (Web Scraping):

Class 04:

● Topics: Introduction to Web Scraping, Fundamentals, How Web Scraping Works.

Class 05:

● Topics: Extracting Data from APIs, Storing Data into CSV Files.
● Project: Hands-on Project 01 - eCommerce Website Data Scraping with Front-End.
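
A minimal sketch of this class's flow, pulling records from an API with requests and writing them to a CSV file (the endpoint URL and field names are hypothetical):

    import csv
    import requests

    # Hypothetical JSON endpoint returning a list of product records
    url = "https://example.com/api/products"
    products = requests.get(url, timeout=10).json()

    # Persist the extracted records into a CSV file
    with open("products.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name", "price"])
        writer.writeheader()
        for item in products:
            writer.writerow({"id": item.get("id"),
                             "name": item.get("name"),
                             "price": item.get("price")})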

Class 06:
● Topics: Finding XPath, Browser Automation with Selenium.
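
A minimal Selenium sketch for locating elements by XPath (assumes Selenium 4 with a local Chrome setup; the URL and XPath expression are hypothetical):

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://example.com/products")

    # Find all product titles via an XPath expression and print their text
    titles = driver.find_elements(By.XPATH, "//h2[@class='product-title']")
    for t in titles:
        print(t.text)

    driver.quit()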

Class 07:

● Project: Hands-on Project 02 - Scraping Data and Images from a Website using Selenium.

Class 08:

● Topics: BeautifulSoup Crash Course.
● Project: Hands-on Project 03 - Scraping Data from a News Website.
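
A minimal BeautifulSoup sketch in the spirit of Project 03 (the URL and CSS selectors are hypothetical and depend on the real site's markup):

    import requests
    from bs4 import BeautifulSoup

    html = requests.get("https://example.com/news", timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # Collect headline text and links
    for link in soup.select("h3.headline a"):
        print(link.get_text(strip=True), link.get("href"))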

Class 09:

● Topics: Scrapy Fundamentals.
● Project: Hands-on Project 04 - Scraping Data from a Sports Website.
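
A minimal Scrapy spider sketch in the spirit of Project 04 (spider name, URL, and selectors are hypothetical):

    import scrapy

    class ScoresSpider(scrapy.Spider):
        name = "scores"
        start_urls = ["https://example.com/scores"]

        def parse(self, response):
            # Yield one item per result row; selectors depend on the real markup
            for row in response.css("table.results tr"):
                yield {
                    "home": row.css("td.home::text").get(),
                    "away": row.css("td.away::text").get(),
                    "score": row.css("td.score::text").get(),
                }

Saved as scores_spider.py, it could be run with "scrapy runspider scores_spider.py -o scores.json".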

Class 10:

● Project: Hands-on Project 05 - Scraping Data from an Android App using Appium.

Class 11:

● Topics: Creating Web Scraper Scripts using Multiple Libraries.

Class 12:

● Topics: Pandas Crash Course for Data Cleaning.
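
A minimal sketch of typical Pandas cleaning steps on scraped data (the file and column names are assumptions carried over from the earlier examples):

    import pandas as pd

    df = pd.read_csv("products.csv")

    # Typical cleaning: drop duplicates, fix types, trim whitespace, handle missing values
    df = df.drop_duplicates(subset="id")
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    df["name"] = df["name"].str.strip()
    df = df.dropna(subset=["price"])

    df.to_csv("products_clean.csv", index=False)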


Module 03 (Docker & PostgreSQL):

Class 13 (Docker 01):

● Topics: Introduction to Docker, Installing Docker, Basic Commands, Dockerizing a Python Web Scraping Script.
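
A minimal Dockerfile sketch for containerizing a Python scraping script (requirements.txt and scraper.py are hypothetical file names):

    FROM python:3.11-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY scraper.py .
    CMD ["python", "scraper.py"]

Built and run with "docker build -t scraper ." followed by "docker run scraper".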

Class 14 (Docker 02):

● Topics: Working with Dockerfiles, Creating and Managing Containers, Volumes, Docker Networks.

Class 15 (Docker 03):

● Topics: Docker Compose, Multi-Container Applications (Integrating PostgreSQL and Python Scraper).
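
A minimal docker-compose.yml sketch wiring a PostgreSQL container to the scraper (service names, credentials, and database name are assumptions):

    services:
      db:
        image: postgres:16
        environment:
          POSTGRES_USER: scraper
          POSTGRES_PASSWORD: scraper
          POSTGRES_DB: scraped_data
        volumes:
          - pgdata:/var/lib/postgresql/data
      scraper:
        build: .
        depends_on:
          - db
        environment:
          DATABASE_URL: postgresql://scraper:scraper@db:5432/scraped_data

    volumes:
      pgdata:

Both containers start together with "docker compose up".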

Class 16 (PostgreSQL 01):

● Topics: Introduction to PostgreSQL, Installing PostgreSQL, Basic SQL Queries, Data Types.

Class 17 (PostgreSQL 02):

● Topics: Creating and Managing Databases, Tables, CRUD Operations with Python (Using Psycopg2).
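
A minimal psycopg2 CRUD sketch (connection parameters and the table schema are assumptions matching a local or Dockerized PostgreSQL):

    import psycopg2

    conn = psycopg2.connect(host="localhost", dbname="scraped_data",
                            user="scraper", password="scraper")

    with conn, conn.cursor() as cur:
        # Create a table and insert a row (Create)
        cur.execute("""
            CREATE TABLE IF NOT EXISTS products (
                id SERIAL PRIMARY KEY,
                name TEXT NOT NULL,
                price NUMERIC
            )
        """)
        cur.execute("INSERT INTO products (name, price) VALUES (%s, %s) RETURNING id",
                    ("Sample product", 19.99))
        product_id = cur.fetchone()[0]

        # Read, Update, Delete
        cur.execute("SELECT name, price FROM products WHERE id = %s", (product_id,))
        print(cur.fetchone())
        cur.execute("UPDATE products SET price = %s WHERE id = %s", (17.49, product_id))
        cur.execute("DELETE FROM products WHERE id = %s", (product_id,))

    conn.close()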

Class 18 (PostgreSQL 03):

● Topics: Advanced SQL Queries, Indexing, Joins, Using PostgreSQL with Pandas for Data Analysis.
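
A minimal sketch of pushing a join down to PostgreSQL and analysing the result in Pandas (uses SQLAlchemy for the connection; table and column names are hypothetical):

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(
        "postgresql+psycopg2://scraper:scraper@localhost:5432/scraped_data")

    # Aggregate in SQL, then continue the analysis in Pandas
    query = """
        SELECT c.name AS category, AVG(p.price) AS avg_price
        FROM products p
        JOIN categories c ON c.id = p.category_id
        GROUP BY c.name
    """
    df = pd.read_sql_query(query, engine)
    print(df.sort_values("avg_price", ascending=False))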

Class 19 (Integration):

● Topics: Building a Full Web Scraper Pipeline: Scraping Data, Storing in PostgreSQL in a Dockerized Environment.

Module 04 (Airflow & Data Pipelines):

Class 20 (Airflow 01):

● Topics: Introduction to Apache Airflow, Installing Airflow, Core Concepts (DAGs, Operators, Tasks, and Workflows).

Class 21 (Airflow 02):

● Topics: Creating DAGs for Scheduling Python Scripts (Web Scrapers), Task Dependencies, Parallelism.
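
A minimal Airflow DAG sketch with a simple task dependency (assumes Airflow 2.4+; the scrape and load callables stand in for the course's scraper and loader scripts):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def scrape():
        print("scraping...")

    def load():
        print("loading into PostgreSQL...")

    with DAG(
        dag_id="daily_scraper",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        scrape_task = PythonOperator(task_id="scrape", python_callable=scrape)
        load_task = PythonOperator(task_id="load", python_callable=load)

        # Dependency: scrape must finish before load starts
        scrape_task >> load_task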

Class 22 (Airflow 03):

● Topics: Using Airflow with Docker for Containerized Pipelines, Trigger Rules, Task Execution, Managing Failures.

Class 23 (Airflow 04):

● Topics: Building and Scheduling Full ETL Pipelines using Airflow: Scraping, Data Processing, Loading into PostgreSQL.

Class 24 (Project):

● Project: Create a Full Data Pipeline using Airflow to Scrape Data, Clean, and Load into PostgreSQL.

Module 05 (Kafka & Real-Time Data Streaming):

Class 25 (Kafka 01):

● Topics: Introduction to Apache Kafka, Installing Kafka, Core Concepts (Producers, Consumers, Topics, Partitions).

Class 26 (Kafka 02):

● Topics: Creating Kafka Producers and Consumers with Python, Setting up Kafka Clusters, Basic Operations.
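
A minimal producer/consumer sketch using the kafka-python package (the course's client library is not specified; the broker address and topic name are assumptions):

    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Producer: publish a scraped record as JSON
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("scraped_products", {"name": "Sample product", "price": 19.99})
    producer.flush()

    # Consumer: read records back from the same topic
    consumer = KafkaConsumer(
        "scraped_products",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)
        break  # stop after one message in this sketch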

Class 27 (Kafka 03):

● Topics: Using Kafka for Real-Time Data Streaming from Web Scrapers, Handling Large Volumes of Scraped Data.

Class 28 (Kafka 04):

● Topics: Kafka Streams for Data Processing, Integrating Kafka with PostgreSQL for Real-Time Ingestion.

Class 29 (Kafka 05):

● Topics: Kafka with Docker: Running Kafka in Containers, Scaling Kafka with Multiple Brokers.

Class 30 (Project):

● Project: Build a Real-Time Data Streaming Pipeline using Kafka, Docker, and PostgreSQL for Streaming Web Scraped Data.

Final Project (Capstone):

Class 31 (Capstone Project):

● Topics: Integrating All Components (Web Scraping, Docker, PostgreSQL, Airflow, Kafka) into a Full Data Engineering Pipeline.
● Project: Build a Complete End-to-End Data Pipeline:
○ Scrape data using Selenium/BeautifulSoup.
○ Use Kafka for real-time streaming.
○ Store data in PostgreSQL.
○ Schedule and orchestrate tasks using Airflow.
○ Dockerize the entire pipeline.
