Subjects and objectives

Week 1: Introduction and Prerequisites
- Running Postgres locally with Docker (see the sketch below)
- Setting up Airflow locally
- Setting up Snowflake Cloud Data Warehouse
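
For the "Running Postgres locally with Docker" item, a minimal smoke-test sketch: it assumes a Postgres container started with example credentials (the container name, user, password and `demo` database are placeholders, not part of the course material) and checks that the instance is reachable with psycopg2.

```python
# Assumes a local Postgres started roughly like this (image tag and credentials are examples):
#   docker run -d --name pg-local -p 5432:5432 \
#     -e POSTGRES_USER=demo -e POSTGRES_PASSWORD=demo -e POSTGRES_DB=demo postgres:16
import psycopg2

# Connect to the containerized Postgres from the host.
conn = psycopg2.connect(
    host="localhost", port=5432, dbname="demo", user="demo", password="demo"
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")  # trivial query to prove the instance is reachable
    print(cur.fetchone()[0])
conn.close()
```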

Week 2: Data Ingestion
- Ingesting data to AWS with Airflow
- Ingesting data to local Postgres with Airflow (see the sketch below)
- Partitioning and Clustering
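
For "Ingesting data to local Postgres with Airflow", a minimal sketch of a time-scheduled DAG, assuming a recent Airflow 2.x install and the week-1 local Postgres; the CSV URL, the `trips` table and its columns are hypothetical placeholders for whatever the lab uses.

```python
# Minimal Airflow 2.x DAG: pull a small CSV and load it into the local Postgres.
import csv
import io
import urllib.request

import pendulum
import psycopg2
from airflow import DAG
from airflow.operators.python import PythonOperator

CSV_URL = "https://example.com/trips.csv"  # hypothetical source file with "id" and "amount" columns


def load_csv_to_postgres():
    # Stream the CSV and insert the rows into a small staging table.
    rows = list(csv.DictReader(io.TextIOWrapper(urllib.request.urlopen(CSV_URL), encoding="utf-8")))
    conn = psycopg2.connect(host="localhost", port=5432, dbname="demo", user="demo", password="demo")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS trips (id TEXT, amount NUMERIC)")
        cur.executemany("INSERT INTO trips (id, amount) VALUES (%(id)s, %(amount)s)", rows)
    conn.close()


with DAG(
    dag_id="ingest_to_local_postgres",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule="@daily",  # time-based trigger; week 8 revisits event-based scheduling
    catchup=False,
) as dag:
    PythonOperator(task_id="load_csv", python_callable=load_csv_to_postgres)
```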

Week 3: Data Warehouse
- Postgres and dbt
- Best practices

Week 4: Analytics Engineering
- dbt models (see the sketch below)
- Testing and documenting
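
For the dbt bullets, a small sketch of driving the run / test / docs cycle from Python; the `./analytics` project directory and the `staging` selector are assumptions, and a configured dbt profile is taken for granted.

```python
# Sketch: running the week-4 dbt workflow from Python via the dbt CLI.
import subprocess

PROJECT_DIR = "./analytics"  # hypothetical dbt project location

for args in (
    ["dbt", "run", "--select", "staging"],   # build the selected models
    ["dbt", "test", "--select", "staging"],  # run the tests defined alongside the models
    ["dbt", "docs", "generate"],             # build the documentation artifacts
):
    subprocess.run(args, cwd=PROJECT_DIR, check=True)
```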

Week 5: Batch Processing
- What is Spark
- Spark DataFrames (see the sketch below)
- Spark SQL
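
For the Spark bullets, a short PySpark sketch showing the same aggregation through the DataFrame API and through Spark SQL; the column names and values are made up.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-demo").getOrCreate()

df = spark.createDataFrame(
    [("2024-01-01", "A", 10.0), ("2024-01-01", "B", 7.5), ("2024-01-02", "A", 3.0)],
    ["day", "zone", "amount"],
)

# DataFrame API
df.groupBy("day").sum("amount").show()

# Same aggregation via Spark SQL
df.createOrReplaceTempView("trips")
spark.sql("SELECT day, SUM(amount) AS total FROM trips GROUP BY day").show()

spark.stop()
```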

Week 6: Streaming Processing
- Kafka Streams
- Schemas (avro); see the sketch below
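
For "Schemas (avro)", a sketch of an Avro round trip using fastavro (the choice of fastavro and the `Ride` schema are assumptions; the course may use Confluent's schema-registry tooling instead). The defaulted `tip` field is the kind of change that keeps schema evolution backward compatible.

```python
import io
import fastavro

schema = {
    "type": "record",
    "name": "Ride",
    "fields": [
        {"name": "ride_id", "type": "string"},
        {"name": "amount", "type": "double"},
        # A default lets readers with this schema still decode data from older producers.
        {"name": "tip", "type": "double", "default": 0.0},
    ],
}
parsed = fastavro.parse_schema(schema)

# Serialize one record to an in-memory Avro container and read it back.
buf = io.BytesIO()
fastavro.writer(buf, parsed, [{"ride_id": "r1", "amount": 12.5, "tip": 1.0}])
buf.seek(0)
print(list(fastavro.reader(buf)))
```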

Week 7: Data Quality
- Data validation with Great Expectations and Deequ (see the sketch below)
- Anomaly detection and incremental validation with Deequ
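
For the data-validation bullet, the kind of column-level expectations that Great Expectations or Deequ express, hand-rolled in pandas so the idea is visible without either library's API; the dataframe and the thresholds are made up.

```python
import pandas as pd

df = pd.DataFrame({"ride_id": ["r1", "r2", "r3"], "amount": [12.5, None, 7.0]})

# Each entry mirrors an "expectation": a named, boolean check over a column.
checks = {
    "ride_id is never null": df["ride_id"].notna().all(),
    "ride_id is unique": df["ride_id"].is_unique,
    "amount is never null": df["amount"].notna().all(),
    "amount within [0, 500]": df["amount"].dropna().between(0, 500).all(),
}

for name, passed in checks.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```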

Week 8: Orchestration and Automation
- Pipeline orchestration benefits
- Event-based vs time-based; business-driven vs data-driven (see the sketch below)
- Creating Data Lineage
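
For "Event-based vs time-based", a sketch contrasting the two trigger styles with Airflow Datasets (Airflow 2.4+ is an assumption; the DAG ids and dataset URI are placeholders).

```python
import pendulum
from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.python import PythonOperator

orders = Dataset("s3://datalake/orders")  # hypothetical dataset URI

# Time-based: runs on a clock and declares the dataset it produces.
with DAG(dag_id="load_orders", start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
         schedule="@hourly", catchup=False):
    PythonOperator(task_id="load", python_callable=lambda: print("loaded"), outlets=[orders])

# Event-based: runs whenever the dataset above is updated, not on a schedule.
with DAG(dag_id="transform_orders", start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
         schedule=[orders], catchup=False):
    PythonOperator(task_id="transform", python_callable=lambda: print("transformed"))
```

The consumer never specifies a cron expression; it runs because the producer marked the dataset as updated.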

Week 9: Capstone Project
- Week 9: working on your project
- Week 10 (extra): reviewing your peers

Labs
- Python function and 3 DDL statements for third normal form (3NF) tables
- Forward and backward data formats
- Sample end-to-end data pipeline
- Setup Docker
- Setup MinIO for the datalake
- Collect data from an API and a database
- Build a pipeline to load data from the datalake to the data warehouse with an idempotent pattern (see the sketch after this list)
- Schedule the dbt pipeline with Airflow (Astronomer)
- Processing large data with Spark
- Connect a BI tool (Google Data Studio / Metabase) with the data
- Trigger and schedule a Spark job
- Set up schema registry and ML validation pipeline
- Apply a Spark job to process data
- Analyze real-time data
- Implement DataOps with dbt and scheduling with Airflow
- Data quality with Great Expectations
- Research data lineage
- Design a data model for logging and lineage
- To be defined
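
For the "idempotent pattern" lab item above, a sketch of an upsert keyed on a natural key, so re-running the same load leaves the warehouse unchanged; the `fact_rides` table, its columns and the connection details are placeholders for the local Postgres from week 1.

```python
import psycopg2

rows = [("r1", 12.5), ("r2", 7.0)]  # records staged from the data lake (placeholder data)

conn = psycopg2.connect(host="localhost", port=5432, dbname="demo", user="demo", password="demo")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS fact_rides (
            ride_id TEXT PRIMARY KEY,
            amount  NUMERIC
        )
    """)
    # Keyed upsert: inserting the same rows twice produces the same table state.
    cur.executemany(
        """
        INSERT INTO fact_rides (ride_id, amount)
        VALUES (%s, %s)
        ON CONFLICT (ride_id) DO UPDATE SET amount = EXCLUDED.amount
        """,
        rows,
    )
conn.close()
```

Alternatives with the same property include delete-and-insert by partition or a MERGE statement; the point is that the write is keyed rather than append-only.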