Data Engineer
Find the interactive version of this roadmap and more roadmaps at roadmap.sh
Pre-requisites: Python Roadmap, SQL Roadmap
What is Data Engineering?
Related Roadmaps
Data Engineering vs Data Science
Data Analyst Roadmap
AI & Data Scientist Roadmap Introduction Skills and Responsibilities
Data Engineering Lifecycle
Python is recommended
Choosing the Right Technologies
Python Java Scala Go
Programming Skills Learn the Basics
Understand Different Steps
Data Structures and Algorithms Data Generation 1
Git and GitHub Data Storage 2
Data Engineering Lifecycle
Linux Basics Data Ingestion 3
Networking Fundamentals Data Serving 4
Distributed Systems Basics Data Generation
Database
Data Normalization
Sources of Data APIs Logs
Data Modelling Techniques
Data Collection Considerations Mobile Apps IoT
CAP Theorem
OLTP vs OLAP
Learn SQL Indexing
Database Fundamentals Data Storage Transactions
Relational Databases
Slowly Changing Dimension - SCD
MySQL PostgreSQL
Horizontal vs Vertical Scaling Relational Databases
MariaDB Aurora DB
Star vs Snowflake Schema
Oracle MS SQL
Document Column NoSQL Databases Graph Key-Value
MongoDB Cassandra Neo4j Redis
ElasticSearch BigTable Neptune Memcached
Data Warehousing
CosmosDB HBase DynamoDB
CouchDB
What is Data Warehouse?
Data Warehouse Data Warehousing Architectures Data Mart Data Mesh
Google BigQuery Other Data Architectures
Snowflake Data Fabric Data Hub
Amazon Redshift Metadata-first Architecture
Serverless Options
Data Lake
Databricks Delta Lake
Cloud Computing
Cloud Architectures
Snowflake
Onehouse Amazon EC2 (Compute)
Data Ingestion S3 (Storage)
Amazon RDS (Database)
Batch
Hybrid Types of Data Ingestion
AWS
Streaming
Azure Virtual Machines
Realtime
Data Pipelines
Azure Blob Storage
ETL Process
Azure SQL Database
Cluster Computing Basics Extract Data
Data Factory (ETL)
Transform Data
Azure
What is Cluster Computing
Load Data
Compute Engine (Compute)
Distributed File Systems
Data Pipeline Tools
Google Cloud Storage
HDFS Apache Airflow
Cloud SQL (Database)
dbt Luigi
Job Scheduling Dataflow
Prefect
Cluster Management Tools Google Cloud
Cloud Providers
Kubernetes
Apache Hadoop YARN Big Data Tools Apache Spark
Hadoop Ecosystem
Docker Kubernetes
HDFS YARN
Google Cloud GKE Containers & Orchestration
MapReduce
AWS EKS
Prometheus CI/CD GitHub Actions Circle CI
Datadog Sentry Monitoring GitLab CI ArgoCD
New Relic
Unit Testing
Integration Testing
What and why use them? Testing End-to-End Testing
Async vs Sync Communication Functional Testing
Messages vs Streams A/B Testing
Best Practices Messaging Systems Load Testing
Smoke Testing
Common Tools
Apache Kafka
RabbitMQ Infrastructure as Code - IaC Declarative vs Imperative
AWS SQS Idempotency
AWS SNS Reusability
Environmental Management
Visit the Data Analyst Roadmap Common Tools
Data Terraform
Data Analytics
Serving
OpenTofu
Business Intelligence
AWS CDK
BI Tools
Authentication vs Authorization
Google Deployment Mgr.
Microsoft Power BI
Encryption
Streamlit
Tokenization
Tableau
Data Masking
Looker
Data Obfuscation
Data Quality
Data Lineage
Reverse ETL Security
Metadata Management
ETL vs Reverse ETL Data Interoperability
Reverse ETL Use Cases Data Quality
Data Governance
Tools
Hightouch
Census GDPR
Segment Privacy ECPA
EU AI Act
Data and AI Regulations
Machine Learning
MLOps
Also visit the following related roadmaps
Python AI & Data Scientist SQL Data Analyst MLOps