Data Engineering Overview

Data engineering involves collecting, preparing, and organizing data for analysis, with a project architecture that includes data collection, ingestion, processing, and warehousing. Data engineers are responsible for understanding data sources, managing data pipelines, and optimizing performance. The document also distinguishes between OLTP and OLAP systems, and compares databases, data lakes, and data warehouses based on their data storage capabilities.

Uploaded by

pandekrishna723

Contents

● What is Data Engineering

● Data Engineering Project Architecture

● Roles of Data Engineer

● OLTP vs. OLAP

● Database vs. Data Lake vs. Data Warehouse

What is Data Engineering?

● Data engineering is the process of collecting, preparing, and organizing data so that it can
be used for analysis.

Data Engineering Project Architecture

Explanation:

● Data Collection: Data is extracted from various sources such as databases, files, and APIs.

● Data Ingestion: The collected data is loaded into a data lake (Azure Data Lake Storage
(ADLS), Amazon S3, or HDFS) for storage.

● Data Processing: Data is cleaned and transformed using tools such as Databricks, Python,
Apache Spark, and Hive.

● Data Warehousing: The processed data is loaded into a data warehouse (Snowflake or
Azure Synapse Analytics) for efficient querying and analysis.

● Target: The data is used for purposes such as reporting, visualization, machine
learning, and data science.
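
The stages above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: a hard-coded CSV string stands in for a real source, a temporary directory stands in for the data lake, and an in-memory SQLite database stands in for the data warehouse. All function names and the sample data are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical sample data standing in for a database export, file, or API response.
RAW_CSV = "order_id,customer,amount\n1, alice ,10.50\n2,bob,\n3,alice,4.50\n"


def collect() -> str:
    """Data collection: fetch raw data from a source (here, a hard-coded string)."""
    return RAW_CSV


def ingest(raw: str, lake_dir: Path) -> Path:
    """Data ingestion: store the raw data unchanged in the stand-in data lake."""
    path = lake_dir / "orders_raw.csv"
    path.write_text(raw, encoding="utf-8")
    return path


def process(path: Path) -> list[tuple[int, str, float]]:
    """Data processing: trim whitespace, drop incomplete rows, cast types."""
    rows = []
    with path.open(newline="", encoding="utf-8") as f:
        for rec in csv.DictReader(f):
            amount = rec["amount"].strip()
            if not amount:  # skip rows with a missing amount
                continue
            rows.append((int(rec["order_id"]), rec["customer"].strip(), float(amount)))
    return rows


def load(rows: list[tuple[int, str, float]], conn: sqlite3.Connection) -> None:
    """Data warehousing: load the cleaned rows into a structured table."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline() -> list[tuple[str, float]]:
    """Run all stages and return a small report (the 'target' use of the data)."""
    with tempfile.TemporaryDirectory() as tmp:
        raw_path = ingest(collect(), Path(tmp))
        rows = process(raw_path)
    conn = sqlite3.connect(":memory:")
    load(rows, conn)
    report = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return report


totals = run_pipeline()
print(totals)
```

In a real project each stand-in would be replaced by the corresponding service (for example, object storage for the lake and a warehouse engine for the final table), but the order of the stages stays the same.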

Roles of Data Engineer

● Understand data sources

● Data Collection and Data Ingestion

● Data Processing

● Create, Modify, and Monitor Data Pipelines

● Performance Optimization

OLTP vs. OLAP

● OLTP (Online Transaction Processing)
o Supports day-to-day transactional activities.
● OLAP (Online Analytical Processing)
o Supports complex analysis of historical data.
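
The difference between the two workloads can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely for convenience; SQLite is not an OLAP engine, and the table, column, and function names are assumptions made for this example. The point is the access pattern: many small writes on one side, one aggregating read over accumulated history on the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)


def record_sale(conn: sqlite3.Connection, region: str, year: int, amount: float) -> None:
    """OLTP-style operation: a short transaction that writes a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
            (region, year, amount),
        )


def sales_by_region_and_year(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """OLAP-style operation: a read-only query that aggregates historical rows."""
    return conn.execute(
        "SELECT region, year, SUM(amount) FROM sales "
        "GROUP BY region, year ORDER BY region, year"
    ).fetchall()


# Day-to-day activity: each sale is recorded as its own transaction.
for sale in [
    ("north", 2023, 100.0),
    ("south", 2023, 50.0),
    ("north", 2024, 25.0),
    ("north", 2024, 75.0),
]:
    record_sale(conn, *sale)

# Analysis: summarize the accumulated history in one query.
summary = sales_by_region_and_year(conn)
print(summary)
```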

Database vs. Data Lake vs. Data Warehouse

● Database
o Stores structured data.
o Used for transactional (OLTP) workloads.
● Data Lake
o Stores structured, semi-structured, and unstructured data.
o Used for analytical (OLAP) workloads.
● Data Warehouse
o Stores structured, cleaned data.
o Used for analytical (OLAP) workloads.
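
To make the structured versus semi-structured distinction concrete, here is a small sketch with hypothetical sample events. The raw JSON lines represent the kind of semi-structured data a data lake stores as-is, where each record may carry different fields. The `to_row` function (an illustrative name, not from any library) flattens them into the fixed set of columns that a database or data warehouse table would expect.

```python
import json

# Semi-structured records as they might sit in a data lake:
# every line is valid JSON, but the fields vary from record to record.
RAW_EVENTS = [
    '{"user": "alice", "event": "login", "device": {"os": "linux"}}',
    '{"user": "bob", "event": "purchase", "amount": 19.99}',
    '{"user": "alice", "event": "logout"}',
]

# Fixed schema that a structured table would use.
COLUMNS = ("user", "event", "os", "amount")


def to_row(line: str) -> tuple:
    """Flatten one JSON record into the fixed schema; absent fields become None."""
    rec = json.loads(line)
    return (
        rec["user"],
        rec["event"],
        rec.get("device", {}).get("os"),
        rec.get("amount"),
    )


rows = [to_row(line) for line in RAW_EVENTS]
for row in rows:
    print(dict(zip(COLUMNS, row)))
```

Every resulting row has the same four columns, which is what allows a warehouse to query the data efficiently.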
