ADF - Intro and Components

Azure Data Factory (ADF) is a cloud-based ETL and data integration service that enables the creation of data-driven workflows for data movement and transformation. It consists of key components such as pipelines, activities, datasets, linked services, triggers, and integration runtimes, allowing users to orchestrate complex data processes. ADF is fully managed by Microsoft, scalable, and supports various programming languages and data stores for seamless data integration.



What is Azure Data Factory?
Azure Data Factory (ADF) is a cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores

• It is fully managed by Microsoft, with the ability to scale compute as required, and is a completely cloud-based data integration solution that requires no on-premises servers

• You can build complex ETL processes that transform your data visually with data flows, or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, Azure SQL Database, etc.

ETL – Extract Transform Load
ELT – Extract Load Transform

• Extract: During the extraction process, data engineers define the data and its source:
• Define the data source: Identify source details such as the resource group, subscription, and identity
information such as a key or secret.
• Define the data: Identify the data to be extracted. Define data by using a database query, a set of files, or
an Azure Blob storage name for blob storage.

• Transform
• Define the data transformation: Data transformation operations can include splitting, combining,
deriving, adding, removing, or pivoting columns. Map fields between the data source and the data
destination. You might also need to aggregate or merge data.

• Load
• Define the destination: During a load, many Azure destinations can accept data formatted as JavaScript Object Notation (JSON), a file, or a blob. You might need to write code to interact with application APIs.
• Azure Data Factory offers built-in support for Azure Functions. You'll also find support for many programming languages, including Node.js, .NET, Python, and Java. Although Extensible Markup Language (XML) was common in the past, most systems have migrated to JSON because of its flexibility as a semi-structured data type.
• Start the job: Test the ETL job in a development or test environment. Then migrate the job to a
production environment to load the production system.
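
As a rough illustration of the "start the job" step, the sketch below triggers a run of an existing pipeline and polls its status using the azure-mgmt-datafactory Python SDK. The subscription, resource group, factory, and pipeline names are placeholders, and the exact SDK surface may differ slightly between package versions.

# Minimal sketch: trigger an existing ADF pipeline run and check its status.
# Assumes azure-identity and azure-mgmt-datafactory are installed and that the
# resource group / factory / pipeline named below already exist (placeholders).
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"       # placeholder
rg_name = "my-rg"                           # placeholder resource group
df_name = "my-data-factory"                 # placeholder factory name
pipeline_name = "CopyAndTransformPipeline"  # placeholder pipeline name

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Kick off a run of the pipeline (optionally passing pipeline parameters).
run = adf_client.pipelines.create_run(rg_name, df_name, pipeline_name, parameters={})

# Poll the run status until it finishes.
while True:
    pipeline_run = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id)
    print(f"Pipeline run status: {pipeline_run.status}")
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(30)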
How to create an ADF resource
Step 1: (portal screenshot)
Step 2: (portal screenshot)
Step 3: Choose the ADF name as per naming standards and the region, then proceed to the next tab. Verify the configuration and then select 'Review + create'.
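
The portal steps above can also be performed programmatically. As a hedged sketch (names and region are placeholders; the call shape follows the azure-mgmt-datafactory quickstart), a data factory can be created like this:

# Minimal sketch: create an Azure Data Factory instance with the Python SDK.
# All names/regions below are placeholders; pick them per your naming standards.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<subscription-id>"
rg_name = "my-rg"                # existing resource group
df_name = "my-data-factory"      # factory name must be globally unique

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Create (or update) the factory in the chosen region.
df = adf_client.factories.create_or_update(rg_name, df_name, Factory(location="eastus"))
print(df.name, df.provisioning_state)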
Azure Data Factory Components
An Azure subscription might have one or more Azure Data Factory instances (or data factories). Azure Data
Factory is composed of the following key components:

1. Pipelines
2. Activities
3. Datasets
4. Linked services
5. Triggers
6. Data Flows
7. Integration Runtimes

These components work together to provide the platform on which you can compose data-driven
workflows with steps to move and transform data.
Pipelines
• Pipelines are logical groupings of activities that perform a unit of work. Together, the activities in a pipeline carry out a task

• They can be scheduled, parameterised and monitored, allowing for efficient and automated execution of data processes across various sources and destinations

• Example – a pipeline can contain an activity that ingests data from an on-premises SQL database and another activity that runs a query on Databricks to transform the data (sketched below)

• Benefit – the pipeline allows you to manage the activities as a set instead of managing each one individually.

• The activities in a pipeline can be chained together to operate sequentially, or they can operate
independently in parallel.
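
To make the chaining idea concrete, here is a hedged sketch of a pipeline containing a copy activity followed by a Databricks notebook activity that only runs after the copy succeeds. The dataset, linked service, and notebook names are assumed placeholders, not part of the original slides.

# Minimal sketch: a pipeline grouping two activities, chained sequentially.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ActivityDependency, BlobSink, BlobSource, CopyActivity, DatabricksNotebookActivity,
    DatasetReference, LinkedServiceReference, PipelineResource,
)

subscription_id = "<subscription-id>"
rg_name, df_name = "my-rg", "my-data-factory"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Activity 1: copy data between two (pre-existing) blob datasets.
copy = CopyActivity(
    name="IngestFromSource",
    inputs=[DatasetReference(type="DatasetReference", reference_name="InputDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="OutputDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

# Activity 2: run a Databricks notebook, but only after the copy succeeds.
transform = DatabricksNotebookActivity(
    name="TransformOnDatabricks",
    notebook_path="/etl/transform_logs",  # placeholder notebook path
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="DatabricksLinkedService"),
    depends_on=[ActivityDependency(activity="IngestFromSource",
                                   dependency_conditions=["Succeeded"])],
)

# The pipeline manages both activities as a single unit.
pipeline = PipelineResource(activities=[copy, transform])
adf_client.pipelines.create_or_update(rg_name, df_name, "CopyAndTransformPipeline", pipeline)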
Activity
• Activities represent a processing step in a pipeline. For example, you might use a copy activity to copy data
from one data store to another data store.

• Data Factory supports three types of activities: data movement activities, data transformation activities, and
control activities.

• Data Movement: Transfer data between sources and destinations.

• Data Transformation: Process and transform data (e.g., using Data Flow or Databricks).

• Control: Manage execution and flow of activities.
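
As an illustration of the control category, the hedged sketch below uses an Execute Pipeline activity to invoke another pipeline (assumed to already exist) from a parent pipeline and wait for it to finish; all names are placeholders.

# Minimal sketch: a control activity that invokes a child pipeline.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ExecutePipelineActivity, PipelineReference, PipelineResource,
)

subscription_id = "<subscription-id>"
rg_name, df_name = "my-rg", "my-data-factory"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Control activity: run the (existing) child pipeline and wait for completion.
invoke_child = ExecutePipelineActivity(
    name="RunChildPipeline",
    pipeline=PipelineReference(type="PipelineReference",
                               reference_name="CopyAndTransformPipeline"),
    wait_on_completion=True,
)

parent = PipelineResource(activities=[invoke_child])
adf_client.pipelines.create_or_update(rg_name, df_name, "ParentOrchestrator", parent)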


Datasets and Linked Services
• Datasets represent data structures within the data stores, which simply point to or reference the data you
want to use in your activities as inputs or outputs.

• Linked services are much like connection strings, which define the connection information that's needed for
Data Factory to connect to external resources

• Example - an Azure Storage-linked service specifies a connection string to connect to the Azure Storage
account. Additionally, an Azure blob dataset specifies the blob container and the folder that contains the data.

• Linked services are used for two purposes in Data Factory:

• To represent a data store that includes, but isn't limited to, a SQL Server database, Oracle database, file
share, or Azure blob storage account. For a list of supported data stores, see the copy activity article.

• To represent a compute resource that can host the execution of an activity. For example, the HDInsight
Hive activity runs on an HDInsight Hadoop cluster. For a list of transformation activities and supported
compute environments, see the transform data article.
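
As a hedged example of the blob storage case above, the sketch below creates an Azure Blob Storage linked service (the connection string is a placeholder) and a blob dataset that points at a specific container and folder; the shapes follow the azure-mgmt-datafactory quickstart and may vary slightly by SDK version.

# Minimal sketch: a linked service (connection info) plus a dataset (data reference).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobDataset, AzureBlobStorageLinkedService, DatasetResource,
    LinkedServiceReference, LinkedServiceResource, SecureString,
)

subscription_id = "<subscription-id>"
rg_name, df_name = "my-rg", "my-data-factory"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Linked service: connection-string-style definition for an Azure Storage account.
conn = SecureString(value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>")
ls = LinkedServiceResource(properties=AzureBlobStorageLinkedService(connection_string=conn))
adf_client.linked_services.create_or_update(rg_name, df_name, "BlobStorageLinkedService", ls)

# Dataset: points at the container/folder (and optionally file) inside that account.
ds = DatasetResource(properties=AzureBlobDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="BlobStorageLinkedService"),
    folder_path="mycontainer/input",   # placeholder container/folder
    file_name="input.txt",             # placeholder file
))
adf_client.datasets.create_or_update(rg_name, df_name, "InputDataset", ds)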
Integration runtime
• In Data Factory, an activity defines the action to be performed. A linked service defines a target data store or a compute service. An integration runtime provides the bridge between the activity and linked services. It's referenced by the linked service or activity, and provides the compute environment where the activity either runs on or gets dispatched from. This way, the activity can be performed in the region as close as possible to the target data store or compute service, in the most performant way, while meeting security and compliance needs.
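
To ground this, a hedged sketch: register a self-hosted integration runtime and point a linked service at it through its connect_via reference, so activities using that linked service are dispatched from that runtime. Names are placeholders and the model classes are assumed from the azure-mgmt-datafactory package.

# Minimal sketch: create a self-hosted integration runtime and reference it
# from a linked service via connect_via.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLinkedService, IntegrationRuntimeReference,
    IntegrationRuntimeResource, LinkedServiceResource, SecureString,
    SelfHostedIntegrationRuntime,
)

subscription_id = "<subscription-id>"
rg_name, df_name = "my-rg", "my-data-factory"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Register a self-hosted IR (the IR software itself is installed on a local machine separately).
ir = IntegrationRuntimeResource(properties=SelfHostedIntegrationRuntime(
    description="Runs close to the on-premises data sources"))
adf_client.integration_runtimes.create_or_update(rg_name, df_name, "SelfHostedIR", ir)

# A linked service whose activities are dispatched through that runtime.
ls = LinkedServiceResource(properties=AzureBlobStorageLinkedService(
    connection_string=SecureString(value="<connection-string>"),  # placeholder
    connect_via=IntegrationRuntimeReference(
        type="IntegrationRuntimeReference", reference_name="SelfHostedIR"),
))
adf_client.linked_services.create_or_update(rg_name, df_name, "BlobViaSelfHostedIR", ls)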
Triggers
• Automate the execution of pipelines based on specific conditions or events.

• Schedule Trigger: Run pipelines at specific times or intervals.

• Event-Based Trigger: Activate pipelines based on Azure events (e.g., file uploads).

• Tumbling Window Trigger: Process data in fixed time windows.

• Key Parameters: Define Start Time, End Time, Recurrence, and Window Size.

• Trigger Execution Flow: Triggers initiate pipelines automatically when conditions are met.

• Use Cases: Automate ETL tasks, schedule data loads, and trigger actions based on Azure events.
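
As a hedged sketch of a schedule trigger (the pipeline name, recurrence, and window below are placeholders), the key parameters above map onto the trigger definition roughly like this; note that triggers are created stopped and must be started, and the start method name may vary slightly by SDK version.

# Minimal sketch: a schedule trigger that runs a pipeline every 15 minutes
# within a start/end window, then is started so it becomes active.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineReference, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, TriggerResource,
)

subscription_id = "<subscription-id>"
rg_name, df_name = "my-rg", "my-data-factory"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

start = datetime.now(timezone.utc)
recurrence = ScheduleTriggerRecurrence(
    frequency="Minute", interval=15,  # recurrence: run every 15 minutes
    start_time=start, end_time=start + timedelta(days=1), time_zone="UTC")

trigger = TriggerResource(properties=ScheduleTrigger(
    description="Every-15-minutes schedule",
    recurrence=recurrence,
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(type="PipelineReference",
                                             reference_name="CopyAndTransformPipeline"),
        parameters={})],
))
adf_client.triggers.create_or_update(rg_name, df_name, "Every15MinTrigger", trigger)

# Triggers are created in a stopped state; start it so it begins firing.
adf_client.triggers.begin_start(rg_name, df_name, "Every15MinTrigger").result()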
Overview

• Pipelines:
• Definition: Collections of activities grouped to perform a specific task.
• Purpose: Streamline the management, deployment, and scheduling of related activities.
• Example: Pipeline for ingesting, cleaning, and analysing log data.

• Activities:
• Function: Perform specific actions on data within pipelines.
• Types
• Data Movement: Transfer data between sources and destinations.
• Data Transformation: Process and transform data (e.g., using Data Flow or Databricks).
• Control: Manage execution and flow of activities.

• Datasets : Input and output data used by activities.


References

• Introduction to Azure Data Factory - Azure Data Factory | Microsoft Learn

• Create a Linked service:


https://learn.microsoft.com/en-us/training/modules/data-integration-azure-data-factory/8-create-linked-services

• Create a Dataset in ADF:


https://learn.microsoft.com/en-us/azure/data-factory/concepts-datasets-linked-services?tabs=data-factory
