Data Engineer Path Career

The document outlines various roadmaps for data-related fields, including Data Engineering, Data Analysis, and Machine Learning. It highlights essential skills, tools, and technologies needed for each role, such as programming languages, database management, data warehousing, and cloud computing. Additionally, it provides links to interactive versions of the roadmaps and related resources for further learning.

Uploaded by

anhnt23413b

Data Engineer

Find the interactive version of this roadmap and more roadmaps at roadmap.sh

Pre-requisites: Python Roadmap, SQL Roadmap

Related Roadmaps: Data Analyst Roadmap, AI & Data Scientist Roadmap

Introduction
What is Data Engineering?
Data Engineering vs Data Science
Skills and Responsibilities
Data Engineering Lifecycle


Choosing the Right Technologies

Programming Skills: Python, Java, Scala, Go (Python is recommended)

Learn the Basics: Data Structures and Algorithms, Git and GitHub, Linux Basics, Networking Fundamentals, Distributed Systems Basics

Data Engineering Lifecycle (Understand Different Steps)
1. Data Generation
2. Data Storage
3. Data Ingestion
4. Data Serving

Data Generation
Sources of Data: Database, APIs, Logs, Mobile Apps, IoT
Data Collection Considerations

Data Storage
Database Fundamentals: Data Normalization, Data Modelling Techniques, CAP Theorem, OLTP vs OLAP, Slowly Changing Dimension - SCD, Horizontal vs Vertical Scaling, Star vs Snowflake Schema

Relational Databases
Learn SQL, Indexing, Transactions
MySQL, PostgreSQL, MariaDB, Aurora DB, Oracle, MS SQL

NoSQL Databases
Document: MongoDB, ElasticSearch, CosmosDB, CouchDB
Column: Cassandra, BigTable, HBase
Graph: Neo4j, Neptune
Key-Value: Redis, Memcached, DynamoDB

Data Warehousing

What is a Data Warehouse?

Data Warehousing Architectures
Data Warehouse: Google BigQuery, Snowflake, Amazon Redshift
Data Mart
Data Lake: Databricks Delta Lake, Snowflake, Onehouse

Other Data Architectures: Data Mesh, Data Fabric, Data Hub, Metadata-first Architecture, Serverless Options

Cloud Computing
Cloud Architectures
Cloud Providers
AWS: Amazon EC2 (Compute), S3 (Storage), Amazon RDS (Database)
Azure: Azure Virtual Machines, Azure Blob Storage, Azure SQL Database, Data Factory (ETL)
Google Cloud: Compute Engine (Compute), Google Cloud Storage, Cloud SQL (Database), Dataflow

Data Ingestion
Types of Data Ingestion: Batch, Streaming, Realtime, Hybrid
Data Pipelines
ETL Process: Extract Data, Transform Data, Load Data
Data Pipeline Tools: Apache Airflow, dbt, Luigi, Prefect

Cluster Computing Basics
What is Cluster Computing
Distributed File Systems: HDFS
Job Scheduling
Cluster Management Tools: Kubernetes, Apache Hadoop YARN

Big Data Tools
Apache Spark
Hadoop Ecosystem: HDFS, YARN, MapReduce

Containers & Orchestration: Docker, Kubernetes, Google Cloud GKE, AWS EKS

CI/CD: GitHub Actions, Circle CI, GitLab CI, ArgoCD

Monitoring: Prometheus, Datadog, Sentry, New Relic

Testing: Unit Testing, Integration Testing, End-to-End Testing, Functional Testing, A/B Testing, Load Testing, Smoke Testing

Messaging Systems
What and why use them?
Async vs Sync Communication
Messages vs Streams
Best Practices
Common Tools: Apache Kafka, RabbitMQ, AWS SQS, AWS SNS

Infrastructure as Code - IaC
Declarative vs Imperative
Idempotency
Reusability
Environmental Management
Common Tools: Terraform, OpenTofu, AWS CDK, Google Deployment Mgr.

Data Serving
Data Analytics (Visit the Data Analyst Roadmap)
Business Intelligence
BI Tools: Microsoft Power BI, Streamlit, Tableau, Looker

Reverse ETL
ETL vs Reverse ETL
Reverse ETL Use Cases
Tools: Hightouch, Census, Segment

Security: Authentication vs Authorization, Encryption, Tokenization, Data Masking, Data Obfuscation

Data Governance: Data Quality, Data Lineage, Metadata Management, Data Interoperability

Privacy
Data and AI Regulations: GDPR, ECPA, EU AI Act


Machine Learning

MLOps

Also visit the following related roadmaps

Python, AI & Data Scientist, SQL, Data Analyst, MLOps
