Data Integration
using SSIS
Lecture Two
Ejada Internal Use Only By Eng. Ahmed Abdelhakim
Agenda
1) Lecture Objectives
2) Data Integration
3) Extract-Transform-Load
4) What is SSDT
5) SSIS Overview
6) SSIS Breakdown
7) Best Practices & Focus Points
8) Summary & Resources
2
Ejada Internal Use Only
Lecture Objectives
By the end of the lecture, you will be able to:
✓ Understand What is Data Integration, ETL, SSDT, and SSIS
✓ Identify the core components of SSIS, Tool Capability and Features
✓ Get Familiar with Integrating The Data From Source to Destination
✓ Learn Some Good Practices on doing ETL
3
Ejada Internal Use Only
Data Integration
Data integration is the process of combining data from various sources into a unified
and coherent format. This gives you a simplified view of data, which can be used by
downstream applications for analysis, reporting, and decision-making. Data
integration can be accomplished by implementing various methods.
*Common data integration techniques include ELT and ETL.
4
Ejada Internal Use Only
Extract-Transform-Load
Extract, transform, and load (ETL) is the process of combining required data from
multiple sources, apply transformations (Cleaning, Formatting, etc.) and then load the
result into a large, central repository called a data warehouse.
5
Ejada Internal Use Only
What is SSDT
Microsoft’s SQL Server Data Tools (SSDT) is a Visual Studio development solution for
Business Intelligence Projects, Covering all the BI Workflow.
Data Tools available in the SSDT are:
❖ Integration Services (SSIS) for Data Integration
❖ Analysis Services (SSAS) for Analysis Layer
❖ Reporting Services (SSRS) for Paginated reports
Ejada Internal Use Only
SSIS Overview
Microsoft SQL Server Integration Services (SSIS) is a platform for building high-
performance data integration solutions, including extraction, transformation, and load
(ETL) packages for data warehousing, Some of the key points are:
SSIS Key Features are:
❖ Ease of use Tool with Drag and Drop GUI
❖ Fast and Performant Tool
❖ Supports Various Data Sources
❖ Supports Error and Event Handling
❖ Can Easily Handle Different Load Modes and Slowly Changing Dimensions
Ejada Internal Use Only
SSIS Project Elements
• SSIS Packages:
the collection of tasks executed in an orderly fashion needed to merge data into a single dataset and load the
destination table in a single step. An SSIS package can use control flow, manager, tasks, variables, event
handlers, parameters, and more to achieve this.
• Connection Managers:
Used to set up a link between SSIS and an external data source. SSIS includes several distinct types of
connection managers
• Project Parameters:
Allow you to assign values to properties for packages at the time of package execution.
Ejada Internal Use Only
SSIS Package Components
Control Flow Data Flow
The Main flow, helps you arrange tasks Encapsulates the ETL Operations that moves
like data flow, SQL execution, containers, data between sources and destinations with
etc. for easier execution. transform, clean, and modify data as it is moved
Event Handlers
Ability of handling events in the run-time
such as before/pre-executing the task or
container, post-execution, error, etc.
❖ Parameters and Connection Managers are also available in Package Scope (Cannot be modified outside)
Ejada Internal Use Only
Some Control Flow Tasks
➢ Data Flow Task:
Create a Dataflow to Encapsulated Sequence of ETL Tasks that covers data journey from
source extraction then some transformations to destination insertion.
➢ Execute SQL Script:
Used To Execute SQL Statements within available data repository
➢ Execute Package:
Reference any available package to be executed.
➢ Container:
Logically Encapsulate Related Tasks and used to control the precedence of execution,
has three types (Sequence, For loop, and Foreach loop).
➢ Expression Task:
Simply an If-Else Conditional Split for different flows.
More Tasks:
- Analysis Services Processing Task: used to process tables in semantic layer
- Bulk Insert Task: used to insert enormous data volume
Ejada Internal Use Only
Some Data Flow Tasks
➢ Source/Destination Connect:
Connect to Available Data Repository in Connection Manager or other Types Like
Raw,Excel,and Flat Files
➢ Derived Column:
Adding a new column in the data, can be derived from existing columns or from
metadata of the execution like current timestamp.
➢ Data Conversion:
Convert Types and Names of Source Column.
➢ Merge Join:
Like inner and Outer Join in SQL that join different tables.
➢ Merge and Union All:
Combines rows from different sources into one output.
➢ Conditional Split:
Split The Source into many branches Based on Given Criteria.
More Tasks:
- Slowly Changing Dimension: handle dimensions that changes frequently
Ejada Internal Use Only
Best Practices
❖ Get rid of unused columns when fetching the data
❖ Don’t Pass no longer wanted data as soon as possible
❖ Prioritize Performance, identify bottlenecks and solve it if possible
❖ Prioritize Readability for better Collaboration and Debugging
❖ Logically Separation using Containers and Packages
❖ Make the work as flexible as possible and avoid hard-coded values
Ejada Internal Use Only
Focus Points
❖ Initial and Incremental Load
❖ Merge Join Types, Lookup, and Rowcount Tasks
❖ Parameters and Variables
❖ Error and Event Handling
❖ Redirect Rows for further processing
❖ SSIS Destination Loading Modes
❖ Deployment and Scheduling
Ejada Internal Use Only
Summary & Resources
❑ ETL is a Process occurring in many places i.e., Data Integration with the goal of Migrating
Data From Source/s To Destination
❑ SSDT is a Visual Studio Extension for BI Solutions including SSIS, SSAS, SSRS
❑ SSIS is High-Performance Drag and Drop Data Integrator and ETL Tool
Topic Resources
[[ 4 HOURS ]] SSIS Complete Tutorial - { End to End } Full Course
SSIS Tutorials SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS)
Free ETL Tutorial - Learn ETL using SSIS | Udemy
Ejada Internal Use Only
Thank You
Ejada Systems Company Limited شركة إجادة للنظم المحدودة
www.ejada.com
[email protected] www.ejada.com | [email protected]
Ejada Internal Use Only