ADF - Dataflow Mapping

Mapping Data Flows in Azure Data Factory allow for no-code data transformation and movement, supporting various activities such as joins, aggregations, and filtering. Key transformations include Source, Derived Column, Join, and Sink, each serving specific use cases for data processing. While scalable and integrated with Azure services, data flows may present a learning curve and cost considerations for large datasets.

Mapping DataFlows

Mapping Data Flows in Azure Data Factory (ADF) allow you to transform and move data in a no-code environment. Data flows support a variety of transformation activities to achieve tasks like aggregations, filtering, joins, and more.

Here is a detailed overview of the common mapping data flow transformations, along with examples and use cases:
1. Source Transformation
The Source transformation is where data flows begin. It reads data from a specified
dataset, such as Azure Blob Storage, Azure SQL Database, or Data Lake, into the data flow
for further processing.

Use Case: Ingest data from different sources, such as importing customer data from a
CSV file in Azure Blob Storage.

Example:
Scenario: You want to load a CSV file named SalesData.csv from Azure Blob Storage.
Objective: Read the file into the data flow to start the transformation process.
Configuration:
a.Add a Source transformation to the data flow.
b.Select the source dataset, e.g., the SalesData CSV file.
c.Configure Projection to define the schema (e.g., OrderID, ProductID, Quantity,
SalesAmount).
d.Optionally, enable Schema Drift to handle dynamic column changes in the dataset.

Outcome: The data flow reads data from SalesData.csv and makes it available for
subsequent transformations.

Benefit: Provides a flexible way to ingest data from various sources and shapes it for
processing within the data flow.
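
Mapping data flows execute on scaled-out Spark clusters, so the equivalent logic of each transformation can be sketched in PySpark for readers who want to see it as code. These sketches are illustrations only, not ADF data flow script; all file paths, table names, and sample values in them are hypothetical. For the Source step, a minimal sketch reads the projected schema from a local SalesData.csv standing in for the Blob Storage dataset:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Projection defined up front, mirroring the schema set in the Source transformation.
sales = (spark.read
         .option("header", "true")
         .schema("OrderID string, ProductID string, Quantity int, SalesAmount decimal(18,2)")
         .csv("SalesData.csv"))  # hypothetical local copy of the Blob Storage file

sales.show()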

2. Derived Column Transformation

The Derived Column transformation allows you to create new columns or modify existing
ones using expressions. It’s commonly used for data transformation tasks, such as
calculations, string formatting, or data type conversions.

Use Case: Add a new column that calculates the total price or formats dates in a
specific way.

Example:
Scenario: You have a dataset with columns Quantity and UnitPrice and want to create a
new column TotalPrice.
Objective: Multiply Quantity by UnitPrice to calculate TotalPrice.
Configuration:
a.Add a Derived Column transformation to the data flow.
b.Create a new column named TotalPrice.
c.Use the expression builder to define the logic: TotalPrice = Quantity * UnitPrice.
d.You can also modify existing columns. For example, format a column OrderDate to
MM-dd-yyyy using toString(OrderDate, 'MM-dd-yyyy').

Outcome: The data flow outputs the dataset with a new column, TotalPrice, containing
the computed values.

Benefit: Enhances the dataset by creating new columns or transforming existing ones,
enabling more flexible and insightful data analysis.
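
A minimal PySpark sketch of the same derived columns, using a small hypothetical dataset; withColumn plays the role of the Derived Column expressions:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.createDataFrame(
    [(1, 2, 10.50, "2024-01-15"), (2, 4, 3.25, "2024-02-20")],
    ["OrderID", "Quantity", "UnitPrice", "OrderDate"],
)

derived = (orders
           # TotalPrice = Quantity * UnitPrice
           .withColumn("TotalPrice", F.col("Quantity") * F.col("UnitPrice"))
           # Reformat OrderDate to MM-dd-yyyy, like toString(OrderDate, 'MM-dd-yyyy')
           .withColumn("OrderDate", F.date_format(F.to_date("OrderDate"), "MM-dd-yyyy")))

derived.show()
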
4. Join
The Join transformation combines data from two different streams based on a matching
key in specified columns. It supports various join types: inner, left, right, and full outer
joins.

Use Case: Combine sales data with product information to enrich the dataset for
analysis.

Example:
Scenario: You have two datasets—Sales with columns ProductID, Quantity, and
OrderDate, and Products with columns ProductID and ProductName.
Objective: Merge these datasets to include ProductName in the sales data using
ProductID.

Configuration:
Add a Join transformation to your data flow.
Select the join type (e.g., Inner Join).
Set the join condition as Sales@ProductID == Products@ProductID.
Select the columns to include in the output, such as ProductName, Quantity, and OrderDate.

Outcome: The result is a dataset that includes sales details along with the product
names for each order.

Benefit: Creates richer datasets by merging relevant information, making it easier to analyze data in context, such as viewing product names alongside sales figures.
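
The same inner join sketched in PySpark, with two small hypothetical datasets standing in for the Sales and Products streams:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sales = spark.createDataFrame(
    [("P1", 3, "2024-01-05"), ("P2", 1, "2024-01-06")],
    ["ProductID", "Quantity", "OrderDate"],
)
products = spark.createDataFrame(
    [("P1", "Keyboard"), ("P2", "Mouse")],
    ["ProductID", "ProductName"],
)

# Inner join on the matching key, equivalent to Sales@ProductID == Products@ProductID.
enriched = (sales.join(products, on="ProductID", how="inner")
                 .select("ProductName", "Quantity", "OrderDate"))

enriched.show()
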
6. Conditional Split
The Conditional Split transformation divides the data stream into multiple streams based
on conditions. Each condition results in a separate output, similar to a SQL CASE
statement.

Use Case: Separate customers into "High Value" and "Low Value" groups based on
total spending.

Example:
Scenario: You have a dataset with a TotalSales column and want to split customers into "VIP" and "Regular" based on their total sales.
Objective: Create two streams: one for customers with TotalSales > 5000 (VIP) and another for the rest (Regular).

Configuration:
a.Add a Conditional Split transformation to your data flow.
b.Define conditions for the split:
Condition 1: TotalSales > 5000 for "VIP" customers.
Condition 2 (Default): All other records go to "Regular."
c.Each condition creates a different output stream that you can further transform or
write to different sinks.

Outcome: The data flow splits the input dataset into two streams: one for "VIP"
customers and one for "Regular" customers.

Benefit: Enables dynamic data routing and processing based on specified conditions,
providing flexibility in handling multiple data processing scenarios within a single data
flow.
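
A rough PySpark sketch of the split: each output stream is just a filtered view of the input, with the second filter acting as the default condition (sample values are hypothetical):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

customers = spark.createDataFrame(
    [("C1", 7200.0), ("C2", 1500.0), ("C3", 5300.0)],
    ["CustomerID", "TotalSales"],
)

# Condition 1: the VIP stream.
vip = customers.filter(F.col("TotalSales") > 5000)
# Default condition: all remaining records form the Regular stream.
regular = customers.filter(F.col("TotalSales") <= 5000)

vip.show()
regular.show()
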
8. Union
The Union transformation combines data from multiple streams into a single stream. It is
similar to the SQL UNION ALL operation, where data from different sources or partitions are
appended together.

Use Case: Merge sales data from multiple regions into a consolidated dataset or
combine customer information from different sources.

Example:
Scenario: You have two datasets, Sales_Region1 and Sales_Region2, both with columns OrderID, ProductID, and SalesAmount, and want to combine them into a single stream.
Objective: Create a unified dataset containing sales data from both regions.

Configuration:
i.Add a Union transformation to your data flow.
ii.Connect the streams (Sales_Region1 and Sales_Region2) to the Union transformation.
iii.Ensure the columns from each stream align. If there are mismatched columns, use a Select transformation beforehand to rename or map them.

Outcome: The data flow outputs a single dataset combining records from both
Sales_Region1 and Sales_Region2.

Benefit: Provides a simple way to consolidate data from multiple sources, facilitating
further processing or analysis in a unified dataset.
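
A minimal PySpark sketch of the union; unionByName matches columns by name, much like aligning the streams before the Union transformation (sample rows are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

region1 = spark.createDataFrame([(1, "P1", 100.0)], ["OrderID", "ProductID", "SalesAmount"])
region2 = spark.createDataFrame([(2, "P2", 250.0)], ["OrderID", "ProductID", "SalesAmount"])

# UNION ALL semantics: rows are appended, duplicates are kept.
combined = region1.unionByName(region2)

combined.show()
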
9. Pivot
The Pivot transformation rotates rows into columns, allowing you to transform categorical
data into a more summarized format. This is useful when you want to create a cross-
tabular report or need data in a specific shape for analysis.

Use Case: Convert monthly sales data into a wide format with separate columns for each
month.

Example:

Scenario: You have a dataset with columns Year, Month, and SalesAmount and want to pivot the data so that each month becomes its own column.
Objective: Create a table with Year as rows and separate columns for each month (Jan, Feb, Mar, etc.).

Configuration:
Add a Pivot transformation to your data flow.
Select the Group By column, such as Year.
Set the Pivot Key to Month and Aggregate as sum(SalesAmount).
Optionally, specify the column names for the pivoted data (e.g., Jan, Feb, Mar).

Outcome: The data flow outputs a dataset where each year has columns for each month’s
total sales (e.g., Jan, Feb, Mar).

Benefit: Converts long-format data into a wide format, making it easier to generate
reports and perform comparisons across categories.
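
The same pivot sketched in PySpark: group by Year, pivot on Month, and aggregate with sum (the year and amount values are hypothetical):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

sales = spark.createDataFrame(
    [(2023, "Jan", 100.0), (2023, "Feb", 150.0), (2024, "Jan", 120.0)],
    ["Year", "Month", "SalesAmount"],
)

# Listing the pivot values up front fixes the output column names (Jan, Feb, Mar).
wide = (sales.groupBy("Year")
             .pivot("Month", ["Jan", "Feb", "Mar"])
             .agg(F.sum("SalesAmount")))

wide.show()
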
10. Unpivot
The Unpivot transformation performs the reverse of a pivot, transforming columns into
rows. This is helpful when you need to normalize wide datasets into a more row-based,
analyzable format.

Use Case: Convert a wide table with separate columns for each month into a long
format with Month and SalesAmount columns.

Example:
Scenario: You have a dataset with columns Year, Jan_Sales, Feb_Sales, and
Mar_Sales, and you want to unpivot these monthly columns into rows.
Objective: Create a table with columns Year, Month, and SalesAmount.

Configuration:
i.Add an Unpivot transformation to your data flow.
ii.Select the columns to unpivot (e.g., Jan_Sales, Feb_Sales, Mar_Sales).
iii.Define the Unpivot Key (e.g., Month) to indicate the new column names and
the Unpivoted Column (e.g., SalesAmount) to hold the values.

Outcome: The data flow outputs a dataset with columns Year, Month, and
SalesAmount, where each month’s sales are now individual rows.

Benefit: Normalizes wide datasets into a long format, making it easier to filter,
aggregate, and analyze the data based on different categories.
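
A PySpark sketch of the unpivot using the stack() SQL function, which turns the three monthly columns back into Month and SalesAmount rows (sample values are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

wide = spark.createDataFrame(
    [(2023, 100.0, 150.0, 90.0)],
    ["Year", "Jan_Sales", "Feb_Sales", "Mar_Sales"],
)

# stack(3, ...) emits one (Month, SalesAmount) row per monthly column.
unpivoted = wide.selectExpr(
    "Year",
    "stack(3, 'Jan', Jan_Sales, 'Feb', Feb_Sales, 'Mar', Mar_Sales) as (Month, SalesAmount)",
)

unpivoted.show()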

11. Alter Row
The Alter Row transformation is used to define row-level policies for data operations like
insert, update, delete, or upsert when writing to a database sink. It allows you to control
how each row is processed in the destination.
Use Case: Perform conditional updates, inserts, or deletes in a database based on
specific criteria, such as updating customer information where Status = 'Active'.

Example:
Scenario: You have a dataset of customer records with a Status column. You want
to update rows where Status is "Updated" and delete rows where Status is
"Inactive".
Objective: Define row policies for update and delete operations.

Configuration:
i.Add an Alter Row transformation to your data flow.
ii.Use the expression builder to set conditions:
isUpdate() for rows where Status == 'Updated'.
isDelete() for rows where Status == 'Inactive'.
iii.Select the target policy (Insert, Update, Delete, Upsert) based on the condition.

Outcome: The data flow marks rows for updates or deletions based on the
specified conditions, which will be executed when writing to the sink.

Benefit: Provides fine-grained control over row-level data operations in sinks, facilitating effective data management and synchronization.
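
Spark itself has no direct Alter Row equivalent, since the insert/update/delete policies are applied by the sink, but the marking logic can be sketched by tagging each row with the intended action. RowPolicy below is a hypothetical helper column, not an ADF feature:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

customers = spark.createDataFrame(
    [("C1", "Updated"), ("C2", "Inactive"), ("C3", "Active")],
    ["CustomerID", "Status"],
)

# Mirror isUpdate()/isDelete(): each row is labelled with the action the sink should take.
marked = customers.withColumn(
    "RowPolicy",
    F.when(F.col("Status") == "Updated", "update")
     .when(F.col("Status") == "Inactive", "delete")
     .otherwise("insert"),
)

marked.show()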

12. Lookup
The Lookup transformation searches for matching records in a secondary dataset (reference
table) and retrieves additional information. It’s commonly used to enrich data streams by
adding details from a reference dataset.

Use Case: Enrich sales data with product details from a product catalog based on
ProductID.
Example:
Scenario: You have a sales dataset with ProductID and want to add the
ProductName and Category from a product reference table.
Objective: Look up additional product details to enrich the sales data.

Configuration:
i.Add a Lookup transformation to your data flow.
ii.Select the lookup dataset (e.g., Products table) and define the join condition, such as Sales@ProductID == Products@ProductID.
iii.Choose the columns to retrieve from the lookup dataset (e.g., ProductName, Category).

Outcome: The data flow outputs the original sales dataset, now enriched with ProductName and Category columns from the lookup table.

Benefit: Enhances datasets by pulling in additional details from reference tables, reducing the need for complex joins or nested queries in downstream processes.
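
A PySpark sketch of the lookup: it behaves like a left outer join against the reference table, so unmatched sales rows are kept with nulls in the reference columns (sample rows are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sales = spark.createDataFrame([("P1", 3), ("P9", 1)], ["ProductID", "Quantity"])
products = spark.createDataFrame(
    [("P1", "Keyboard", "Accessories")],
    ["ProductID", "ProductName", "Category"],
)

# Retrieve only the needed reference columns, then join without dropping unmatched rows.
looked_up = sales.join(
    products.select("ProductID", "ProductName", "Category"),
    on="ProductID",
    how="left",
)

looked_up.show()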

13. Flatten
The Flatten transformation converts complex hierarchical data structures like arrays or
nested JSON objects into a flat, tabular format. This is essential for processing data from
sources that have nested or repeated fields.

Use Case: Extract individual items from an array in JSON data, such as getting order
items from an order record.

Example:
Scenario: You have a dataset with an Orders column containing nested arrays of OrderItems (e.g., each Order has multiple OrderItems with ItemName and Quantity).
Objective: Flatten the OrderItems array to create individual rows for each item in every order.

Configuration:
Add a Flatten transformation to your data flow.
Select the array column to flatten (e.g., Orders.OrderItems).
Define the new columns for the flattened data (e.g., ItemName, Quantity).

Outcome: The data flow outputs a flat table where each row represents an individual OrderItem with columns ItemName and Quantity.

Benefit: Makes it easier to analyze and process hierarchical data by transforming it into a simple, flat structure suitable for reporting, aggregation, or loading into relational databases.
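
A minimal PySpark sketch of the flatten, where explode() produces one row per element of the OrderItems array (the order data is hypothetical):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.createDataFrame(
    [("O1", [("Pen", 2), ("Notebook", 1)])],
    "OrderID string, OrderItems array<struct<ItemName:string, Quantity:int>>",
)

# One output row per array element, with the struct fields promoted to columns.
flat = (orders
        .withColumn("Item", F.explode("OrderItems"))
        .select("OrderID", "Item.ItemName", "Item.Quantity"))

flat.show()
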
14. Cast Transformation
The Cast transformation changes the data type of columns in the data flow. It is commonly
used to ensure that columns have the correct data type for downstream processing or to
meet the schema requirements of a destination.

Use Case: Convert a string date to a date type or a numeric column from a string to a
decimal for calculations.

Example:
Scenario: You have a dataset with columns Price as a string and OrderDate as a
string. You want to cast Price to a decimal and OrderDate to a date format.
Objective: Ensure Price and OrderDate have the correct data types for
calculations and filtering.

Configuration:
i.Add a Cast transformation to your data flow.
ii.Select the columns to cast:
For Price, use the expression toDecimal(Price).
For OrderDate, use the expression toDate(OrderDate, 'yyyy-MM-dd').
iii.Apply these expressions to change the data types accordingly.

Outcome: The data flow outputs a dataset with Price as a decimal and OrderDate as a date, ready for further transformations or loading into a target sink.

Benefit: Ensures data type compatibility, allowing for accurate calculations, filtering, and proper schema alignment in data sinks.
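
The equivalent casts sketched in PySpark (sample strings are hypothetical):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.createDataFrame(
    [("19.99", "2024-03-01"), ("5.50", "2024-03-02")],
    ["Price", "OrderDate"],
)

typed = (raw
         # Like toDecimal(Price)
         .withColumn("Price", F.col("Price").cast("decimal(18,2)"))
         # Like toDate(OrderDate, 'yyyy-MM-dd')
         .withColumn("OrderDate", F.to_date("OrderDate", "yyyy-MM-dd")))

typed.printSchema()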

15. Surrogate Key Transformation


The Surrogate Key transformation generates a unique, sequential identifier for each row in
the dataset. It is commonly used when there is no natural key in the source data and you
need a primary key in the destination.

Use Case: Add a unique identifier to each customer record in a table for loading into a
database that requires a primary key.

Example:
Scenario: You have a dataset with customer records but no unique identifiers. You
want to add a CustomerID to each row.
Objective: Generate a new column called CustomerID that provides a unique,
sequential number for each row.

Configuration:
i.Add a Surrogate Key transformation to your data flow.
ii.Name the new column (e.g., CustomerID).
iii.Set the starting value (default is 1) and specify the increment if needed.

Outcome: The data flow outputs the dataset with an additional CustomerID column, containing unique sequential numbers for each row.

Benefit: Provides a simple way to generate unique identifiers for datasets, useful for
loading into databases or for deduplication tasks.
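
A PySpark sketch of a sequential surrogate key using row_number(); the constant ordering below is only for illustration, and a real pipeline would normally order by a meaningful column:

from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

customers = spark.createDataFrame(
    [("Alice", "alice@example.com"), ("Bob", "bob@example.com")],
    ["Name", "Email"],
)

# Sequential key starting at 1; add an offset to the expression to start elsewhere.
keyed = customers.withColumn(
    "CustomerID", F.row_number().over(Window.orderBy(F.lit(1)))
)

keyed.show()
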
16. Select Transformation
The Select transformation allows you to choose, rename, or drop columns in your dataset. It
helps shape the data by controlling which columns are passed to subsequent
transformations or sinks.

Use Case: Remove unnecessary columns or rename columns to align with a target
schema.

Example:
Scenario: You have a dataset with columns FirstName, LastName, Email, and
PhoneNumber, but only need FullName and Email for the next step.
Objective: Combine FirstName and LastName into FullName, drop PhoneNumber,
and pass only the necessary columns.

Configuration:
i.Add a Select transformation to your data flow.
ii.Rename columns as needed (e.g., combine FirstName and LastName using an expression to create FullName).
iii.Uncheck columns you wish to remove (e.g., PhoneNumber).

Outcome: The data flow outputs a dataset with only FullName and Email.

Benefit: Simplifies data processing by controlling column selection, renaming columns for clarity, and ensuring the output schema meets the requirements of downstream transformations or sinks.
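
The same column shaping sketched in PySpark: FullName is derived from the name columns, Email is kept, and PhoneNumber is simply not selected (the sample row is hypothetical):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

people = spark.createDataFrame(
    [("Ada", "Lovelace", "ada@example.com", "555-0100")],
    ["FirstName", "LastName", "Email", "PhoneNumber"],
)

# Only the columns the next step needs are passed through.
selected = people.select(
    F.concat_ws(" ", "FirstName", "LastName").alias("FullName"),
    "Email",
)

selected.show()
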
17. Sink Transformation
The Sink transformation is the endpoint in a data flow where transformed data is written to a target
destination. It supports various sinks like Azure Blob Storage, Azure SQL Database, Data Lake, Cosmos DB,
etc. This is where data is stored after all processing is completed.

Use Case: Write processed data to a SQL database, store transformed data in a CSV file in Blob
Storage, or load data into a Data Lake.

Example:
Scenario: You have processed a dataset and want to save the transformed data into an Azure SQL
Database table named ProcessedOrders.
Objective: Write the data to the target SQL table, with options for insert, update, or upsert
operations.

Configuration:
i.Add a Sink transformation to your data flow.
ii.Select the target dataset (e.g., an Azure SQL Database table).
iii.Choose the Write Behavior:
Insert: Insert all rows into the table.
Update: Update existing rows based on a key.
Upsert: Insert new rows and update existing ones.
Delete: Remove rows based on conditions.
iv.Map columns from the data flow to the sink's schema using Mapping.
v.(Optional) Enable additional settings like partitioning and batch size for optimized
performance.

Outcome: The data flow writes the transformed dataset into the specified table, following the
defined write behavior.

Benefit: Provides flexibility in writing data to various storage destinations, supporting different write
operations (insert, update, upsert, delete) to match your specific data integration needs.
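
A rough PySpark sketch of writing to an Azure SQL Database table over JDBC, covering only the plain Insert behaviour. The server, database, credentials, and table name are placeholders, the SQL Server JDBC driver must be available to Spark, and update/upsert/delete behaviour would need extra logic (for example, staging the data and running a MERGE) that is outside this sketch:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

processed = spark.createDataFrame(
    [(1, "P1", 100.0)],
    ["OrderID", "ProductID", "SalesAmount"],
)

# Append-only write into a hypothetical ProcessedOrders table.
(processed.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
    .option("dbtable", "dbo.ProcessedOrders")
    .option("user", "sql_user")
    .option("password", "sql_password")
    .mode("append")
    .save())
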
Summary of Data Flows in Azure Data Factory

Data flows in Azure Data Factory provide a visual, no-code environment for designing and executing complex data transformations. They integrate seamlessly with Azure services, offering various built-in transformations like joins, aggregates, and pivots. While data flows are scalable and support schema drift, they may have a learning curve and can become costly for large data volumes. They are best suited for batch processing, data standardization, and ETL scenarios within the Azure ecosystem, providing a flexible and powerful tool for managing data pipelines.
