SQL Part 2
SQL Part 2
BigQuery Data
Transfer Service
Source
The source is the starting point in the data pipeline where raw data is collected.
Data can come from various sources such as:
* Relational Databases (MySQL, PostgreSQL)
* File Storage (CSV, Excel, JSON)
* APIs (REST, SOAP)
* Others
Destination
The destination is the end point where processed data is stored or used.
Destinations can include:
* Data Warehouses (BigQuery, Snowflake, Redshift)
* Data Lakes (Amazon S3, Google Cloud Storage)
OLTP OLAP
Example
If your project involves analyzing sales data to identify trends, forecast
future sales, or generate detailed reports, OLAP would be used. For instance,
using a data warehouse like BigQuery, you could run complex SQL queries to
aggregate sales data across different regions and time periods to gain
insights.
Medalli n Architecture:
The Three-Layer Design for Data Management
Bronze
Bronze data is the initial stage of a data pipeline. One way to think of Bronze
data is in raw formats, coming from many different sources, streams, or
batch jobs. In many cases, this data is ingested in various forms, including json,
and is untransformed and unchanged from whatever sources produced it.
Often, services need access to this raw data and do not need any additional
transformations. However, to increase its value further, this data needs
cleansing and filtering to transform it into something more consumable.
The Bronze layer contains unvalidated data. Data ingested in the bronze layer
typically :
Maintains the raw state of the data source.
It is appended incrementally and grows over time.
Silver
Silver data is considered cleansed and processed to make it more accessible.
Data can be normalized at this stage to make it easier to query. Silver data is
more sanitized, cleaner, and filtered to give a more refined view of the data.
Values are better understood, tables are joined, and constraints are added to
create better data integrity, adding additional value. This results in staged,
accurate datasets and useful structures that can be queried by analytical
services and serve a wider purpose for an enterprise.
Silver
Recall that while the Bronze layer contains the entire data history in a nearly
raw state, the silver layer represents a validated, enriched version of our data
that can be trusted for downstream analytics.
Data is cleansed and transformed to ensure consistency and quality.
This layer serves as a preparation stage for final analysis and reporting.
Gold
Gold data is summarising data and adding business-level aggregations. It is most
useful for analytics as it is presented in a well-constructed way and is ready to
be visualized through Business Intelligence and Analytics dashboards or trained by
Machine Learning models for predictive analytics solutions.
Examlple query :
CREATE TABLE mart_inventory_summary AS
SELECT
i.id AS inventory_id,
i.product_id,
p.product_name,
p.brand_name,
p.department,
i.stock,
i.product_cost,
d.name AS distribution_center_name,
d.city AS distribution_center_city,
d.state AS distribution_center_state,
i.created_at AS inventory_created_at,
i.updated_at AS inventory_updated_at
FROM
intermediate_inventory_items i
INNER JOIN
intermediate_products p ON i.product_id = p.id
INNER JOIN
intermediate_distribution_centers d ON i.distribution_center_id = d.id;
Create Table Staging (BRONZE)
CREATE TABLE
staging_distribution_centers (
Staging Table id INT PRIMARY KEY,
Distribution Centers name VARCHAR(255),
latitude DECIMAL(9, 6),
longitude DECIMAL(9, 6)
);
CREATE TABLE
staging_inventory_items (
Staging Table Inventory Items id INT PRIMARY KEY,
product_id INT,
created_at TIMESTAMP,
updated_at TIMESTAMP,
stock INT,
product_cost DECIMAL(10, 2),
product_code VARCHAR(50),
product_name VARCHAR(255),
brand_name VARCHAR(255),
department VARCHAR(50),
sku VARCHAR(50),
distribution_center_id INT
);
CREATE TABLE staging_order_items
(
Staging Table Orders_items id INT PRIMARY KEY,
order_id INT,
product_id INT,
quantity INT,
subtotal DECIMAL(10, 2)
);
BigQuery Data
Transfer Service
Source
The source is the starting point in the data pipeline where raw data is collected.
Data can come from various sources such as:
* Relational Databases (MySQL, PostgreSQL)
* File Storage (CSV, Excel, JSON)
* APIs (REST, SOAP)
* Others
Destination
The destination is the end point where processed data is stored or used.
Destinations can include:
* Data Warehouses (BigQuery, Snowflake, Redshift)
* Data Lakes (Amazon S3, Google Cloud Storage)
OLTP OLAP
Example
If your project involves analyzing sales data to identify trends, forecast
future sales, or generate detailed reports, OLAP would be used. For instance,
using a data warehouse like BigQuery, you could run complex SQL queries to
aggregate sales data across different regions and time periods to gain
insights.
Medalli n Architecture:
The Three-Layer Design for Data Management
Bronze
Bronze data is the initial stage of a data pipeline. One way to think of Bronze
data is in raw formats, coming from many different sources, streams, or
batch jobs. In many cases, this data is ingested in various forms, including json,
and is untransformed and unchanged from whatever sources produced it.
Often, services need access to this raw data and do not need any additional
transformations. However, to increase its value further, this data needs
cleansing and filtering to transform it into something more consumable.
The Bronze layer contains unvalidated data. Data ingested in the bronze layer
typically :
Maintains the raw state of the data source.
It is appended incrementally and grows over time.
Silver
Silver data is considered cleansed and processed to make it more accessible.
Data can be normalized at this stage to make it easier to query. Silver data is
more sanitized, cleaner, and filtered to give a more refined view of the data.
Values are better understood, tables are joined, and constraints are added to
create better data integrity, adding additional value. This results in staged,
accurate datasets and useful structures that can be queried by analytical
services and serve a wider purpose for an enterprise.
Silver
Recall that while the Bronze layer contains the entire data history in a nearly
raw state, the silver layer represents a validated, enriched version of our data
that can be trusted for downstream analytics.
Data is cleansed and transformed to ensure consistency and quality.
This layer serves as a preparation stage for final analysis and reporting.
Gold
Gold data is summarising data and adding business-level aggregations. It is most
useful for analytics as it is presented in a well-constructed way and is ready to
be visualized through Business Intelligence and Analytics dashboards or trained by
Machine Learning models for predictive analytics solutions.
Examlple query :
CREATE TABLE mart_inventory_summary AS
SELECT
i.id AS inventory_id,
i.product_id,
p.product_name,
p.brand_name,
p.department,
i.stock,
i.product_cost,
d.name AS distribution_center_name,
d.city AS distribution_center_city,
d.state AS distribution_center_state,
i.created_at AS inventory_created_at,
i.updated_at AS inventory_updated_at
FROM
intermediate_inventory_items i
INNER JOIN
intermediate_products p ON i.product_id = p.id
INNER JOIN
intermediate_distribution_centers d ON i.distribution_center_id = d.id;
Create Table Staging (BRONZE)
CREATE TABLE
staging_distribution_centers (
Staging Table id INT PRIMARY KEY,
Distribution Centers name VARCHAR(255),
latitude DECIMAL(9, 6),
longitude DECIMAL(9, 6)
);
CREATE TABLE
staging_inventory_items (
Staging Table Inventory Items id INT PRIMARY KEY,
product_id INT,
created_at TIMESTAMP,
updated_at TIMESTAMP,
stock INT,
product_cost DECIMAL(10, 2),
product_code VARCHAR(50),
product_name VARCHAR(255),
brand_name VARCHAR(255),
department VARCHAR(50),
sku VARCHAR(50),
distribution_center_id INT
);
CREATE TABLE staging_order_items
(
Staging Table Orders_items id INT PRIMARY KEY,
order_id INT,
product_id INT,
quantity INT,
subtotal DECIMAL(10, 2)
);
BigQuery Data
Transfer Service
Source
The source is the starting point in the data pipeline where raw data is collected.
Data can come from various sources such as:
* Relational Databases (MySQL, PostgreSQL)
* File Storage (CSV, Excel, JSON)
* APIs (REST, SOAP)
* Others
Destination
The destination is the end point where processed data is stored or used.
Destinations can include:
* Data Warehouses (BigQuery, Snowflake, Redshift)
* Data Lakes (Amazon S3, Google Cloud Storage)
OLTP OLAP
Example
If your project involves analyzing sales data to identify trends, forecast
future sales, or generate detailed reports, OLAP would be used. For instance,
using a data warehouse like BigQuery, you could run complex SQL queries to
aggregate sales data across different regions and time periods to gain
insights.
Medalli n Architecture:
The Three-Layer Design for Data Management
Bronze
Bronze data is the initial stage of a data pipeline. One way to think of Bronze
data is in raw formats, coming from many different sources, streams, or
batch jobs. In many cases, this data is ingested in various forms, including json,
and is untransformed and unchanged from whatever sources produced it.
Often, services need access to this raw data and do not need any additional
transformations. However, to increase its value further, this data needs
cleansing and filtering to transform it into something more consumable.
The Bronze layer contains unvalidated data. Data ingested in the bronze layer
typically :
Maintains the raw state of the data source.
It is appended incrementally and grows over time.
Silver
Silver data is considered cleansed and processed to make it more accessible.
Data can be normalized at this stage to make it easier to query. Silver data is
more sanitized, cleaner, and filtered to give a more refined view of the data.
Values are better understood, tables are joined, and constraints are added to
create better data integrity, adding additional value. This results in staged,
accurate datasets and useful structures that can be queried by analytical
services and serve a wider purpose for an enterprise.
Silver
Recall that while the Bronze layer contains the entire data history in a nearly
raw state, the silver layer represents a validated, enriched version of our data
that can be trusted for downstream analytics.
Data is cleansed and transformed to ensure consistency and quality.
This layer serves as a preparation stage for final analysis and reporting.
Gold
Gold data is summarising data and adding business-level aggregations. It is most
useful for analytics as it is presented in a well-constructed way and is ready to
be visualized through Business Intelligence and Analytics dashboards or trained by
Machine Learning models for predictive analytics solutions.
Examlple query :
CREATE TABLE mart_inventory_summary AS
SELECT
i.id AS inventory_id,
i.product_id,
p.product_name,
p.brand_name,
p.department,
i.stock,
i.product_cost,
d.name AS distribution_center_name,
d.city AS distribution_center_city,
d.state AS distribution_center_state,
i.created_at AS inventory_created_at,
i.updated_at AS inventory_updated_at
FROM
intermediate_inventory_items i
INNER JOIN
intermediate_products p ON i.product_id = p.id
INNER JOIN
intermediate_distribution_centers d ON i.distribution_center_id = d.id;
Create Table Staging (BRONZE)
CREATE TABLE
staging_distribution_centers (
Staging Table id INT PRIMARY KEY,
Distribution Centers name VARCHAR(255),
latitude DECIMAL(9, 6),
longitude DECIMAL(9, 6)
);
CREATE TABLE
staging_inventory_items (
Staging Table Inventory Items id INT PRIMARY KEY,
product_id INT,
created_at TIMESTAMP,
updated_at TIMESTAMP,
stock INT,
product_cost DECIMAL(10, 2),
product_code VARCHAR(50),
product_name VARCHAR(255),
brand_name VARCHAR(255),
department VARCHAR(50),
sku VARCHAR(50),
distribution_center_id INT
);
CREATE TABLE staging_order_items
(
Staging Table Orders_items id INT PRIMARY KEY,
order_id INT,
product_id INT,
quantity INT,
subtotal DECIMAL(10, 2)
);