Azure Data Engineering
ADF Data Flow
Cheat Sheet
Part 1 - The Basics
1. Filter Transformation
- What It Does: Filters rows in your dataset based on
conditions you define.
- Use Case: Remove rows with null values or filter data
based on date ranges or specific conditions (e.g., only
select customers from a certain country).
- Pro Tip: Combine multiple conditions using AND/OR
logic to create complex filters.
2. Select Transformation
- What It Does: Chooses which columns to include in the
data flow output.
- Use Case: Streamline datasets by only keeping the
necessary columns for downstream activities.
- Pro Tip: You can also rename columns here, making it
easier to handle field mappings.
3. Aggregate Transformation-
- What It Does: Performs aggregation operations like sum,
average, min, max, count, and more on grouped data.
- Use Case: Summarize sales data by region or calculate
average transaction values.
- Pro Tip: Use group-by functions to calculate aggregated
values for different data categories.
Part 2 - Advanced
Techniques
1. Join Transformation
- What It Does: Combines two datasets based on a
common key, allowing you to merge records.
- Use Case: Join sales data with customer details using a
unique customer ID.
- Pro Tip: ADF supports various types of joins (inner, left,
right, full) to ensure you get the desired output.
2. Derived Column Transformation
- What It Does: Creates new columns by applying
expressions to existing data fields.
- Use Case: Generate calculated fields like “Total Price”
from individual unit prices and quantities.
- Pro Tip: Use ADF’s built-in functions like substring, date,
or math functions to create dynamic columns.
3. Sort Transformation
- What It Does: Sorts data by one or more columns, either
ascending or descending.
- Use Case: Organize your data for reports, or for
downstream systems that require sorted input.
- Pro Tip: Sorting can be memory-intensive, so only sort
when necessary and on relevant columns.
Part 3 - Specialized
Transformations
1. Union Transformation
- What It Does: Combines multiple datasets into a single
dataset by appending rows.
- Use Case: Union sales data from different regions or
months into one master dataset.
- Pro Tip: Ensure the column schemas of the datasets
match or use “Select” transformations to align them.
2. Conditional Split Transformation
- What It Does: Divides your dataset into multiple streams
based on conditions (think of it as an "if-else" for data).
- Use Case: Separate valid and invalid records or route
data to different processes based on status.
- Pro Tip: Use multiple conditions to split data in more
granular ways, improving the flexibility of your data
processing.
3. Lookup Transformation
- What It Does: Looks up values from another dataset and
adds them as new columns in your current dataset.
- Use Case: Enrich customer data by looking up additional
information like loyalty status or demographics.
- Pro Tip: Always ensure that the dataset being looked up
has unique keys for accurate results.
Part 4 - Advanced Control
1. Alter Row Transformation
- What It Does: Controls row-level operations like
inserting, updating, deleting, or skipping rows in sink
operations.
- Use Case: Use this transformation to perform UPSERT
operations into databases.
- Pro Tip: Combine with conditions to dynamically decide
how each row should be processed
(insert/update/delete).
2. Exists Transformation
- What It Does: Determines whether rows from one
stream exist in another stream, returning only rows that
match or don’t match.
- Use Case: Filter records that exist in one dataset but not
another (e.g., records that exist in source data but not in
destination).
- Pro Tip: Use for deduplication or filter out already
processed records from new data.
3. Pivot Transformation
- What It Does: Converts rows into columns, essentially
rotating your data for better analytics and reporting.
- Use Case: Create cross-tabular reports (e.g., sales per
product, per region).
- Pro Tip: Use aggregate functions in combination with
Pivot to summarize data effectively.