Contents
● What is Data Engineering
● Data Engineering Project Architecture
● Role of Data Engineer
● OLTP vs. OLAP
● Database vs. Data Lake vs. Data Warehouse
What is Data Engineering?
● Data engineering is the process of collecting, preparing, and organizing data so that it
can be used for analysis.
Data Engineering Project Architecture
Explanation:
● Data Collection: Data is extracted from various sources like databases, files, and APIs.
● Data Ingestion: The collected data is loaded into the data lake (ADLS, AWS S3, or HDFS)
for storage.
● Data Processing: Data is transformed and cleaned using tools like Databricks, Python,
Apache Spark, and Hive.
● Data Warehousing: The processed data is loaded into the data warehouse (Snowflake or
Azure Synapse Analytics) for efficient querying and analysis.
● Target: The data is used for various purposes like reporting, visualization, machine
learning, and data science.
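The five stages above can be sketched end to end in a few lines. This is only an illustrative sketch using the Python standard library: the in-memory list stands in for a data lake (ADLS, AWS S3, or HDFS), the in-memory SQLite database stands in for a warehouse (Snowflake or Azure Synapse Analytics), and the sample data and the function names collect, ingest, process, and load are hypothetical.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source data: a CSV export and an API-style JSON payload.
CSV_SOURCE = "order_id,customer,amount\n1,alice,20.50\n2,bob,\n3,carol,7.25\n"
API_SOURCE = '[{"order_id": 4, "customer": "dave", "amount": "12.00"}]'


def collect():
    """Data Collection: extract records from each source."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(json.loads(API_SOURCE))
    return rows


def ingest(rows, lake):
    """Data Ingestion: land the records unchanged in the 'data lake'."""
    lake.extend(json.dumps(row) for row in rows)


def process(lake):
    """Data Processing: parse, drop incomplete rows, and fix types."""
    cleaned = []
    for line in lake:
        row = json.loads(line)
        if row["amount"] in ("", None):
            continue  # skip records with a missing amount
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return cleaned


def load(cleaned, warehouse):
    """Data Warehousing: load cleaned rows into a queryable table."""
    warehouse.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)


lake = []                                # stand-in for ADLS / S3 / HDFS
warehouse = sqlite3.connect(":memory:")  # stand-in for Snowflake / Synapse

ingest(collect(), lake)
load(process(lake), warehouse)

# Target: a reporting query over the warehouse table.
total = warehouse.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(total)  # (3, 39.75)
```

Note that the lake keeps all four raw records, including the incomplete one, while the warehouse only receives the three rows that survive cleaning.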
Role of Data Engineer
● Understand data sources
● Data Collection and Data Ingestion
● Data Processing
● Create, modify, and monitor data pipelines
● Performance Optimization
OLTP vs. OLAP
● OLTP (Online Transaction Processing)
o Supports day-to-day transactional activities, such as inserting or updating individual
records.
● OLAP (Online Analytical Processing)
o Supports complex analysis of historical data, such as aggregating many records for a
report.
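The contrast between the two workloads can be illustrated with a small sketch. It uses an in-memory SQLite database purely for convenience; the sales table and its columns are hypothetical, and a real OLAP system would normally run on a separate analytical store rather than on the transactional database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)

# OLTP-style work: many small writes, each touching a single row.
insert = "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)"
with conn:  # commits on success, rolls back on error
    conn.execute(insert, ("north", 2023, 100.0))
    conn.execute(insert, ("south", 2023, 50.0))
    conn.execute(insert, ("north", 2024, 150.0))
    conn.execute("UPDATE sales SET amount = ? WHERE id = ?", (75.0, 2))

# OLAP-style work: one read that scans and aggregates historical rows.
report = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(report)  # [('north', 250.0), ('south', 75.0)]
```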
Database vs. Data Lake vs. Data Warehouse
● Database
o Stores structured data.
o Used for OLTP workloads.
● Data Lake
o Stores structured, semi-structured, and unstructured data.
o Used for OLAP workloads.
● Data Warehouse
o Stores structured, cleaned data.
o Used for OLAP workloads.
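The difference in what each store accepts can be shown with a short sketch. The assumptions here are loud ones: a plain Python list stands in for a data lake, an in-memory SQLite table stands in for a data warehouse, and the sample records and the events table are hypothetical.

```python
import json
import sqlite3

# Data lake stand-in: stores records as-is, whatever their shape.
lake = [
    json.dumps({"user": "alice", "event": "login", "device": {"os": "linux"}}),
    json.dumps({"user": "bob", "event": "purchase", "amount": 9.99}),
    "free-text support note: customer could not log in",  # unstructured
]

# Data warehouse stand-in: a fixed schema that only cleaned rows fit into.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE events (user_name TEXT NOT NULL, event TEXT NOT NULL)"
)

loaded, skipped = 0, 0
for raw in lake:
    try:
        record = json.loads(raw)
        warehouse.execute(
            "INSERT INTO events VALUES (?, ?)", (record["user"], record["event"])
        )
        loaded += 1
    except (json.JSONDecodeError, KeyError):
        skipped += 1  # the record stays in the lake only

print(loaded, skipped)  # 2 1
```

The lake holds all three items regardless of structure, while the warehouse table only receives the two records that match its schema.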