
SQL Part 2

The document outlines a data pipeline architecture known as the Medallion Architecture, which consists of three layers: Bronze, Silver, and Gold. Each layer serves a specific purpose, starting from raw data collection to processed and aggregated data for analytics. It also describes the differences between OLTP and OLAP systems, highlighting their respective use cases in managing transactional data and performing complex data analysis.


Data Pipeline

A typical pipeline (for example, one fed by the BigQuery Data Transfer Service) moves data through three layers: Bronze holds raw, integrated data; Silver holds cleaned and transformed data; Gold holds aggregated data and measures.

Source
The source is the starting point in the data pipeline where raw data is collected.
Data can come from various sources such as:
* Relational Databases (MySQL, PostgreSQL)
* File Storage (CSV, Excel, JSON)
* APIs (REST, SOAP)
* Others

Destination
The destination is the end point where processed data is stored or used.
Destinations can include:
* Data Warehouses (BigQuery, Snowflake, Redshift)
* Data Lakes (Amazon S3, Google Cloud Storage)
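To make the source-to-destination hop concrete, here is a minimal sketch of loading a CSV "source" into a warehouse-style "destination" table. The table name and columns are illustrative, and sqlite3 stands in for a real warehouse like BigQuery or Snowflake.

```python
import csv
import io
import sqlite3

# A CSV "source" (in-memory here; in practice a file or object-store blob).
csv_source = io.StringIO("id,order_number,total\n1,ORD-001,99.90\n2,ORD-002,15.50\n")

# A warehouse-style "destination" (sqlite3 stands in for BigQuery et al.).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_orders (id INT PRIMARY KEY, order_number TEXT, total REAL)")

# Parse the source rows and append them to the destination table.
rows = [(int(r["id"]), r["order_number"], float(r["total"])) for r in csv.DictReader(csv_source)]
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", rows)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM staging_orders").fetchone()[0])  # 2
```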

OLTP vs OLAP

OLTP (Online Transaction Processing)

OLTP systems are designed to manage transactional data. They support day-to-day operations and are optimized for many short online transactions.
Key Features:
* Transaction-Oriented: OLTP systems manage transactions that involve insert, update, and delete operations.
* Real-Time Processing: Data is processed in real time, ensuring that the system reflects the current state of business operations.
* High Throughput: Optimized for handling a large number of small transactions per second.
* Use Case: Ideal for applications like order processing, inventory management, and customer relationship management (CRM).
Example
If your project involves managing real-time sales transactions, customer
orders, or inventory updates, OLTP would be used. For example, a MySQL
database could handle incoming sales transactions, ensuring that each order is
processed quickly and accurately.
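The order-processing pattern above can be sketched as one short transaction that records an order and decrements stock together, committing both changes or neither. This is a hedged illustration with made-up table names, using sqlite3 rather than MySQL for a self-contained example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product_id INT PRIMARY KEY, stock INT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INT, quantity INT)")
conn.execute("INSERT INTO inventory VALUES (1, 10)")

# One short OLTP transaction: record the order and update stock atomically.
# The context manager commits on success and rolls back on error.
with conn:
    conn.execute("INSERT INTO orders (product_id, quantity) VALUES (1, 3)")
    conn.execute("UPDATE inventory SET stock = stock - 3 WHERE product_id = 1")

print(conn.execute("SELECT stock FROM inventory WHERE product_id = 1").fetchone()[0])  # 7
```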

OLAP (Online Analytical Processing)

OLAP systems are designed for complex queries and data analysis. They are used to support business decision-making and provide insights into the data through various analytical operations.
Key Features:
* Complex Queries: OLAP systems handle complex queries that involve aggregations, joins, and multi-dimensional analysis.
* Data Warehousing: OLAP is often associated with data warehousing, where large volumes of historical data are stored for analysis.
* Performance: Optimized for read-heavy operations; can handle large volumes of data efficiently.
* Use Case: Ideal for data warehousing and business intelligence.

Example
If your project involves analyzing sales data to identify trends, forecast
future sales, or generate detailed reports, OLAP would be used. For instance,
using a data warehouse like BigQuery, you could run complex SQL queries to
aggregate sales data across different regions and time periods to gain
insights.
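An OLAP-style aggregation like the one described can be sketched as a GROUP BY over sales facts. The schema and data here are illustrative, and sqlite3 stands in for a warehouse like BigQuery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, total REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("EMEA", "2024-01", 100.0),
    ("EMEA", "2024-02", 150.0),
    ("APAC", "2024-01", 80.0),
])

# Aggregate sales across regions -- the classic read-heavy OLAP shape.
rows = conn.execute(
    "SELECT region, SUM(total) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('APAC', 80.0), ('EMEA', 250.0)]
```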
Medallion Architecture:
The Three-Layer Design for Data Management

The Medallion Architecture is a data architecture design pattern used in data warehousing. It organizes data processing and storage into multiple layers, each serving a specific purpose. A medallion architecture consists of three layers: Bronze, Silver, and Gold. Data flows from one layer to the next, gradually moving from raw, unstructured data to high-quality, curated data.


Bronze
Bronze data is the initial stage of a data pipeline. Think of Bronze data as raw-format data coming from many different sources, streams, or batch jobs. In many cases, this data is ingested in various forms, including JSON, and is untransformed and unchanged from whatever sources produced it. Often, services need access to this raw data and do not need any additional transformations. However, to increase its value further, this data needs cleansing and filtering to transform it into something more consumable.
The Bronze layer contains unvalidated data. Data ingested in the Bronze layer typically:
* Maintains the raw state of the data source.
* Is appended incrementally and grows over time.

Example query:

CREATE TABLE staging_orders (
  id INT PRIMARY KEY,
  user_id INT,
  order_number VARCHAR(50),
  total DECIMAL(10, 2),
  payment_method VARCHAR(50),
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);
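The append-only, grows-over-time behavior described above can be sketched as follows. The table and payloads are illustrative; the point is that Bronze ingestion only ever inserts raw records, batch after batch, never updating or deleting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bronze_events (ingested_batch INT, raw_payload TEXT)")

batch_1 = ['{"event":"click"}', '{"event":"view"}']
batch_2 = ['{"event":"click"}']  # duplicates are acceptable at this layer

# Each load appends raw payloads as-is; the table only grows over time.
for batch_no, batch in enumerate([batch_1, batch_2], start=1):
    conn.executemany("INSERT INTO bronze_events VALUES (?, ?)",
                     [(batch_no, payload) for payload in batch])

print(conn.execute("SELECT COUNT(*) FROM bronze_events").fetchone()[0])  # 3
```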

Silver
Silver data is considered cleansed and processed to make it more accessible. Data can be normalized at this stage to make it easier to query. Silver data is more sanitized, cleaner, and filtered to give a more refined view of the data. Values are better understood, tables are joined, and constraints are added to create better data integrity, adding additional value. This results in staged, accurate datasets and useful structures that can be queried by analytical services and serve a wider purpose for an enterprise.
Recall that while the Bronze layer contains the entire data history in a nearly raw state, the Silver layer represents a validated, enriched version of the data that can be trusted for downstream analytics.
* Data is cleansed and transformed to ensure consistency and quality.
* This layer serves as a preparation stage for final analysis and reporting.

Example query:

CREATE TABLE intermediate_orders AS
SELECT
  id,
  user_id,
  order_number,
  total,
  payment_method,
  created_at,
  updated_at
FROM staging_orders;
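The CTAS above copies rows as-is; in practice the Silver step usually adds the cleansing and filtering the text describes. Here is a hedged sketch with illustrative columns: deduplicating ingested rows and dropping invalid ones, runnable via sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_orders (id INT, order_number TEXT, total REAL)")
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", [
    (1, "ORD-001", 99.9),
    (1, "ORD-001", 99.9),   # same row ingested twice at the Bronze layer
    (2, "ORD-002", -5.0),   # invalid negative total
    (3, "ORD-003", 15.5),
])

# Silver: deduplicate and filter out rows that fail a validity check.
conn.execute("""
    CREATE TABLE intermediate_orders AS
    SELECT DISTINCT id, order_number, total
    FROM staging_orders
    WHERE total >= 0
""")

print(conn.execute("SELECT COUNT(*) FROM intermediate_orders").fetchone()[0])  # 2
```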

Gold
Gold data summarizes data and adds business-level aggregations. It is the most useful layer for analytics, as it is presented in a well-constructed way and is ready to be visualized through Business Intelligence and Analytics dashboards or used to train Machine Learning models for predictive analytics solutions.

Example query:

CREATE TABLE mart_inventory_summary AS
SELECT
  i.id AS inventory_id,
  i.product_id,
  p.product_name,
  p.brand_name,
  p.department,
  i.stock,
  i.product_cost,
  d.name AS distribution_center_name,
  d.city AS distribution_center_city,
  d.state AS distribution_center_state,
  i.created_at AS inventory_created_at,
  i.updated_at AS inventory_updated_at
FROM intermediate_inventory_items i
INNER JOIN intermediate_products p ON i.product_id = p.id
INNER JOIN intermediate_distribution_centers d ON i.distribution_center_id = d.id;
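The mart query above denormalizes via joins; Gold marts frequently add true business-level aggregations on top as well. A hedged, runnable sketch (schemas trimmed to only the columns needed, with made-up data) summing stock per distribution center:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE intermediate_inventory_items "
             "(id INT, distribution_center_id INT, stock INT)")
conn.execute("CREATE TABLE intermediate_distribution_centers (id INT, name TEXT)")
conn.executemany("INSERT INTO intermediate_inventory_items VALUES (?, ?, ?)",
                 [(1, 1, 5), (2, 1, 7), (3, 2, 4)])
conn.executemany("INSERT INTO intermediate_distribution_centers VALUES (?, ?)",
                 [(1, "Memphis"), (2, "Chicago")])

# Gold: join Silver tables, then aggregate to a business-level measure.
rows = conn.execute("""
    SELECT d.name, SUM(i.stock) AS total_stock
    FROM intermediate_inventory_items i
    JOIN intermediate_distribution_centers d ON i.distribution_center_id = d.id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()
print(rows)  # [('Chicago', 4), ('Memphis', 12)]
```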
Create Table Staging (BRONZE)

Staging Table Distribution Centers:
CREATE TABLE staging_distribution_centers (
  id INT PRIMARY KEY,
  name VARCHAR(255),
  latitude DECIMAL(9, 6),
  longitude DECIMAL(9, 6)
);

Staging Table Events:
CREATE TABLE staging_events (
  id INT PRIMARY KEY,
  user_id INT,
  sequence_number INT,
  session_id VARCHAR(255),
  created_at TIMESTAMP,
  ip_address VARCHAR(255),
  city VARCHAR(255),
  state VARCHAR(255),
  postal_code VARCHAR(10),
  browser VARCHAR(50),
  traffic_source VARCHAR(50),
  uri VARCHAR(255),
  event_type VARCHAR(50)
);

Staging Table Inventory Items:
CREATE TABLE staging_inventory_items (
  id INT PRIMARY KEY,
  product_id INT,
  created_at TIMESTAMP,
  updated_at TIMESTAMP,
  stock INT,
  product_cost DECIMAL(10, 2),
  product_code VARCHAR(50),
  product_name VARCHAR(255),
  brand_name VARCHAR(255),
  department VARCHAR(50),
  sku VARCHAR(50),
  distribution_center_id INT
);

Staging Table Order Items:
CREATE TABLE staging_order_items (
  id INT PRIMARY KEY,
  order_id INT,
  product_id INT,
  quantity INT,
  subtotal DECIMAL(10, 2)
);

Staging Table Orders:
CREATE TABLE staging_orders (
  id INT PRIMARY KEY,
  user_id INT,
  order_number VARCHAR(50),
  total DECIMAL(10, 2),
  payment_method VARCHAR(50),
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

Staging Table Users:
CREATE TABLE staging_users (
  id INT PRIMARY KEY,
  first_name VARCHAR(50),
  last_name VARCHAR(50),
  email VARCHAR(255),
  age INT,
  gender VARCHAR(1),
  state VARCHAR(50),
  street_address VARCHAR(255),
  postal_code VARCHAR(10),
  city VARCHAR(255),
  country VARCHAR(50),
  latitude DECIMAL(9, 6),
  longitude DECIMAL(9, 6),
  traffic_source VARCHAR(50),
  created_at TIMESTAMP
);

Staging Table Products:
CREATE TABLE staging_products (
  id INT PRIMARY KEY,
  product_code VARCHAR(50),
  product_name VARCHAR(255),
  brand_name VARCHAR(255),
  department VARCHAR(50),
  price DECIMAL(10, 2),
  sku VARCHAR(50),
  distribution_center_id INT
);
