Informatica 9.1 for Agile Data Integration
Best Practices Workshop
LEAN
1. Eliminate waste
2. Automate processes
3. Empower the team
4. Continuously improve
5. Build quality in
6. Plan for change
7. Optimize the whole
AGILE
1. Maximize business value
2. Empirical process control
3. Facilitate collaboration
4. Remove impediments
5. Frequent assessment
6. Adaptive response
7. Balance design & production
Agile Data Integration
1. Find data sources and targets
2. Profile data sources for data quality
3. Identify join conditions and data quality rules
4. Analyst defines mapping specification
5. Estimate project scope based on impact analysis
6. Build, Test, and Deploy
Cut project throughput times by 90%
Double productivity
Project
Request
Define
Requirements
Analyze &
Design
Build
Test
Deploy
Support &
Maintain
What Percentage of Projects Make it
Through the First Time?
Value Stream Map
DOIT Corporation: Value Stream Map (AS-IS) for Change Request Process
Monday, December 20, 2010
Data Warehouse
Team
GMNA Applications
Team
(Irina)
Change Request
Confirmation Request
Telephone Tag
Automated
Workflow/Tracking
(Cust satisfaction)
Status Request
Status Update
Status Request
Status Update
CR Review Committee
Data Dictionary
to clarify requirements
(13 days)
Semi-Weekly Review
Notify Customer
(Cust Satisfaction)
(5 days)
Status Request
Clarify Requirements
Status Update
Requirements Clarification
Approved
Changes
Add CR
To List
Assign Resource
Design Approval
Integration Team Manager
Architecture Review Council
Bypass Council
for simple CRs
(26 days)
Production CR
Submission
Daily ETL Batch Run
Data Warehouse Team
Bypass
Committee for
simple changes
(8 days)
CR Approval
Test Team Manager
Design
Document
Test Results
Distribution
Approved
Designs
Forward
CR Request
To Developer
CRs
P1x12
P2x35
P3x124
Test Scheduling
Approved CR
& Design Docs
Requirements
Review
Design &
Development
Development Team
Development Team
Testing Handoff
Design Docs
& Schedule
Automated
Regression
Testing
Test Execution
Test Team
Test Team
Change Management Board
CR Approval
& Schedule
(21 days - requires
investment)
Test Case
Development
Development Team
Change
Request
Production
Deployment
Production
Execution
Infrastructure Team
Automatic Daily ETL Batch
Run
Timeline (queue times in days, hands-on work times in minutes): 8.8 days, 13.3 days, 30 minutes, 26 days, 180 minutes, 1 day, 15 minutes, 12.8 days, 180 minutes, 8.5 days, 90 minutes, 0.3 days, 15 minutes
Lead Time = 75.6 Days
Work Time = 510 Minutes
(8.5 hrs or 0.35 Days)
Value Ratio: Work Time / Lead Time = 0.5%
Notes:
(1) Lead Time includes a 5-day delay in customer notification
(2) Lead Time could be reduced to 24 days with just process changes and using existing tools
(3) Lead Time could be reduced to 3 days with a capital investment for automated testing
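The value ratio above is simple arithmetic worth making explicit. A minimal sketch that recomputes it from the per-stage figures on the map (the computed lead time lands within rounding of the slide's 75.6-day figure):

```python
# Recompute the value ratio from the AS-IS value stream map figures.
# Queue (wait) times are in days; hands-on work times are in minutes.
wait_days = [8.8, 13.3, 26, 1, 12.8, 8.5, 0.3]
work_minutes = [30, 180, 15, 180, 90, 15]
notification_delay_days = 5  # note (1): delay in customer notification

work_time_days = sum(work_minutes) / 60 / 24        # 510 min ≈ 0.35 days
lead_time_days = sum(wait_days) + notification_delay_days + work_time_days

value_ratio = work_time_days / lead_time_days
print(f"Work time = {sum(work_minutes)} minutes ({work_time_days:.2f} days)")
print(f"Lead time ≈ {lead_time_days:.1f} days")    # ≈76, vs. 75.6 on the slide
print(f"Value ratio ≈ {value_ratio:.1%}")          # ≈0.5%
```

The point of the exercise: almost all elapsed time is queueing, not work, which is why process changes alone (note 2) can cut lead time far more than making the work itself faster.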
Agile Data Integration
Best Practices
As-Is Process
(As-Is timeline: each step takes days/weeks of elapsed time for minutes/hours of actual work)
1. Business & IT collaborate efficiently to fulfill requirements
2. Rapidly prototype and validate specifications & business rules
3. Leverage business glossary & metadata for search & impact analysis
4. Use data profiling continuously throughout design & development
5. Build data quality into the data integration process
6. Insulate applications from change through data virtualization
7. Minimize delivery time through flexible deployment options
8. Leverage test automation to increase coverage & reduce errors
To-Be Process
(To-Be timeline: each step takes hours of elapsed time for minutes of actual work)
How many of you use agile development
methodologies?
Use Agile today
Plan to use Agile within the next 12 months
No plans to adopt Agile in the near future
Business Request Scenario
Fulfill Requests in Days Instead of Weeks/Months
Agile Data Integration is ideally suited
for Data Warehouse & BI projects
because of the frequency of change
1.
Business requests new
information be added to
an existing BI report
2.
Analyst works with Developer
to add a new data source to a
data warehouse fact table
Profile data
sources for data
quality
Assign Business
Request
Business submits
request for new
information in report
Analyst receives
request from
business
Identify Data Source
Grant Data Access
App Developer
identifies table to meet
requirements
DBA grants analyst
access to data source
(e.g. table)
Analyst
requests
clarification
from business
Establish a data
governance framework
to ensure confidence,
integrity, transparency,
& security
Analyst defines
mapping
specification
Empower analyst
to find & preview
data on their own
Analyst describes
requirement to App
Developer
Define Requirements
Preview Data
Analyst creates
requirements definition
based on request
Analyst previews data
to confirm data
sources
Estimate project
scope based on
impact analysis
# Data Sources
Each data source in a
different application
requires another app
developer to get
involved
Build, Test, and
Deploy
Complexity
Find data
sources and
targets
As-Is Process
Which data sources contain the information I need for the report?
Identify join conditions and data quality rules
Find data
sources and
targets
Profile data
sources for data
quality
Which data sources contain the information I need for the report?
Identify join conditions and data quality rules
To-Be Process
Analyst defines
mapping
specification
View business term
and associated objects
Estimate project
scope based on
impact analysis
Build, Test, and
Deploy
Search for data
sources and targets
using business terms
Browse and
navigate data
lineage to find data
sources and targets
Govern The Information Value Chain
Build A Solid Data Governance Foundation
Deliver A Single Version
Of The Truth Of
Information Through
Master Data Management
INFORMATION
CONFIDENCE
MDM
Deliver Context
Rich Information
Through Metadata
Management
INFORMATION
TRANSPARENCY
Meta
Data
Mgt.
Data
Quality
Retention / Privacy
Deliver Consistent,
Correct, &
Complete
Information
Through Pervasive
Data Quality
INFORMATION
INTEGRITY
Find data
sources and
targets
Profile data
sources for data
quality
Identify join
conditions and
data quality rules
Can I trust this data? Are there any data quality issues?
Analyst
defines
mapping
specification
As-Is Process
Estimate project
scope based on
impact analysis
Create Data Dump
App Developer creates data
dump to spreadsheet or
sandbox environment
Build, Test, and
Deploy
Browser-based tool
to profile data
without the help of
a developer
Identify Data Quality Rules
Analyst works with business to
determine data quality rules to
fix issues per requirements
Profile Data
Analyst profiles in spreadsheet
or creates & runs SQL scripts to
profile data
Need to either find
another data source or
go back to business to
identify alternatives
Find data
sources and
targets
How do you profile data today?
Dump data to spreadsheet and profile in spreadsheet
Dump data to sandbox environment and use SQL
scripts
Use Informatica Profiling
Other
Find data
sources and
targets
Profile data
sources for data
quality
Identify join
conditions and
data quality rules
Can I trust this data? Are there any data quality issues?
Why is the Customer number not unique?
Identify null violation issues (e.g. zip code)
Analyst defines mapping specification
To-Be Process
Estimate project
scope based on
impact analysis
Build, Test, and
Deploy
Identify duplicate
records and
uniqueness violations
Drill down to see details
and change in state
Check column
formats (e.g. date)
Column Profiling
DEMO
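The column-profiling checks listed above (null violations, uniqueness violations and duplicates, column-format checks) can be sketched in a few lines. This is a toy stand-in for the browser-based profiling tool, not Informatica's implementation; the table and column names are illustrative:

```python
import re
from collections import Counter

# Hypothetical sample of a customer table.
rows = [
    {"customer_no": "C001", "zip": "10001", "signup": "2010-12-20"},
    {"customer_no": "C002", "zip": None,    "signup": "2010-13-01"},  # bad month
    {"customer_no": "C002", "zip": "94105", "signup": "2011-01-05"},  # duplicate key
]

def profile_column(rows, col, pattern=None):
    """Return null count, distinct count, duplicate values, and format violations."""
    values = [r[col] for r in rows]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "duplicates": [v for v, n in counts.items() if n > 1],
        "format_violations": [v for v in non_null
                              if pattern and not re.fullmatch(pattern, v)],
    }

print(profile_column(rows, "zip", pattern=r"\d{5}"))       # one null zip
print(profile_column(rows, "customer_no"))                  # C002 duplicated
print(profile_column(rows, "signup",
                     pattern=r"\d{4}-(0[1-9]|1[0-2])-\d{2}"))  # flags 2010-13-01
```

Drilling into which rows violate each rule is the same idea with the offending rows returned instead of counts.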
How do we join these tables together and fix these data quality issues?
Find data sources and targets
Profile data sources for data quality
As-Is Process
Identify join
conditions and
data quality rules
Analyst defines
mapping
specification
Identify Join Conditions
Developer identifies & validates PK-FK relationships using ad-hoc approach
Identify Join Transformations
Developer determines which lookups, filters, and transformations to apply in order to meet join conditions
Perform Join Analysis
Developer searches
documentation and creates
SQL scripts to profile tables
Use cross-table profiling and join analysis in DI developer tool
Build E-R Model
Developer builds E-R
model based on table
relationships
Estimate project
scope based on
impact analysis
Search common
metadata repository
for data quality rules
already created
Fix DQ Issues
Developer writes script to resolve
data quality issue (which has
probably been written before)
Request DQ Rules
Analyst requests data quality rules from developer to fix data quality issues per requirements
Verify DQ Rules
Analyst verifies data quality rules fix issues
Perform Join Analysis
Analyst verifies the E-R model meets business requirements
Complex DQ
issues that are
not easy to
resolve may
need to go back
to the business
Profile data
sources for data
quality
Build, Test, and
Deploy
How do we join these tables together and fix these data quality issues?
Find data sources and targets
Profile data sources for data quality
To-Be Process
Identify join
conditions and
data quality rules
Analyst defines
mapping
specification
Estimate project
scope based on
impact analysis
Build, Test, and
Deploy
Infer primary and
foreign keys
View PK-FK
relationships
Confirm join condition will
work, identify orphan rows,
and find redundant values in
other tables
Join Analysis
DEMO
Data Quality Rule
DEMO
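The join-analysis steps above (infer PK-FK candidates, confirm the join condition, find orphan rows) reduce to set checks. A minimal sketch with illustrative table and column names, standing in for the tool's cross-table profiling:

```python
# Toy join analysis: infer a candidate key, then find orphan child rows.
orders = [
    {"order_id": 1, "cust_id": "C001"},
    {"order_id": 2, "cust_id": "C002"},
    {"order_id": 3, "cust_id": "C999"},  # no matching customer
]
customers = [{"cust_id": "C001"}, {"cust_id": "C002"}, {"cust_id": "C003"}]

def is_candidate_key(rows, col):
    """A column is a candidate primary key if it is non-null and unique."""
    vals = [r[col] for r in rows]
    return None not in vals and len(set(vals)) == len(vals)

def orphan_rows(child, parent, fk, pk):
    """Child rows whose foreign-key value has no matching parent key."""
    parent_keys = {r[pk] for r in parent}
    return [r for r in child if r[fk] not in parent_keys]

print(is_candidate_key(customers, "cust_id"))          # True: usable as PK
print(orphan_rows(orders, customers, "cust_id", "cust_id"))
# order 3 is an orphan: it references C999
```

An orphan count of zero is what "confirm the join condition will work" means in practice; non-zero counts go back to the analyst as data quality findings.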
Find data
sources and
targets
As-Is Process
Which source attributes map to target attributes? What transformations and rules are required to map the sources to targets?
Profile data sources for data quality
Identify join conditions and data quality rules
Analyst defines
mapping
specification
Identify Field Mappings
Analyst describes transformations between source fields and target fields based on requirements
Create Mapping Document
Analyst uses spreadsheet to create mapping document
Estimate project
scope based on
impact analysis
Use intuitive purpose
built tool to define
specifications and
generate mapping logic
Developer requests
clarification from
Analyst
Add Sources & Targets
Analyst adds sources and targets to mapping document
Identify Field Mappings
Developer(s) recommend source-to-target field mappings
Verify Field Mappings
Analyst verifies source-to-target field mappings meet
requirements
Leverage common
metadata repository and
business glossary to
understand objects and
relationships
Build, Test, and
Deploy
How many iterations between the Business and
IT does it take to get specifications right?
Less than 10
10 to 24
25 to 49
50 or more
Find data
sources and
targets
To-Be Process
Which source attributes map to target attributes? What transformations and rules are required to map the sources to targets?
Profile data sources for data quality
Identify join conditions and data quality rules
Analyst defines
mapping
specification
Estimate project
scope based on
impact analysis
Build, Test, and
Deploy
Easily map data
sources to both
physical and virtual
data targets
Preview and validate
specification results
Include pre-built ETL
and data quality rules
as transformations in
mapping specification
Specify transformation
logic using reusable
expressions
Include transformation
descriptions to instruct
developer
Define Specification
DEMO
Mapping Validation & Generation
DEMO
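The core idea of this step is that a mapping specification is data — target fields mapped to source fields plus reusable transformation expressions — from which executable mapping logic is generated. A toy interpreter under illustrative field names (not Informatica's generated code) shows the shape:

```python
# A mapping specification as data: target field -> (source fields, transform).
spec = {
    "FULL_NAME": (("first", "last"), lambda f, l: f"{f} {l}".title()),
    "ZIP5":      (("zip",),          lambda z: z.strip()[:5]),
}

def apply_mapping(spec, source_row):
    """Generate a target row from a source row by evaluating the spec."""
    return {target: fn(*[source_row[c] for c in cols])
            for target, (cols, fn) in spec.items()}

row = {"first": "ada", "last": "lovelace", "zip": " 10001-1234 "}
print(apply_mapping(spec, row))
# {'FULL_NAME': 'Ada Lovelace', 'ZIP5': '10001'}
```

Because the spec is declarative, "preview and validate specification results" is just running it against sample rows before any deployment, and the same reusable expressions can appear in many specs.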
Find data
sources and
targets
Profile data
sources for data
quality
What is the scope of this change? What other reports and artifacts are impacted?
Identify join conditions and data quality rules
As-Is Process
Analyst defines
mapping
specification
Estimate project
scope based on
impact analysis
Build, Test, and
Deploy
Search SCCS
Developer searches source-code control for reports or other objects impacted
Code Review
Developer reviews code to determine what reports and objects are impacted
List Impacted Artifacts
Developer compiles list of reports and artifacts that need to be retested as a result of the change
Estimate Dev Time
Developer estimates level of effort to build and unit test objects
Estimate QA Time
QA provides estimate on how long to retest reports and other affected objects
Leverage metadata
management with data
lineage to perform
impact analysis
Find data
sources and
targets
Profile data
sources for data
quality
What is the scope of this change? What other reports and artifacts are impacted?
Identify join conditions and data quality rules
To-Be Process
Analyst defines
mapping
specification
Estimate project
scope based on
impact analysis
Build, Test, and
Deploy
View upstream
lineage
View downstream
impact
Lineage object
View
downstream
lineage
View upstream
impact
Lineage & Impact Analysis
DEMO
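Upstream lineage and downstream impact are both reachability queries over the same metadata graph, traversed in opposite directions. A minimal sketch with a hypothetical lineage graph (object names are illustrative):

```python
from collections import deque

# Hypothetical lineage graph: edges point from a source object to the
# objects built from it (table -> mapping -> warehouse table -> reports).
lineage = {
    "CUSTOMER_TBL": ["m_load_dw"],
    "m_load_dw": ["DW_FACT_SALES"],
    "DW_FACT_SALES": ["Sales Report", "Churn Dashboard"],
}

def downstream_impact(graph, start):
    """Everything reachable from `start`: change it and these may break."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(downstream_impact(lineage, "CUSTOMER_TBL"))
# {'m_load_dw', 'DW_FACT_SALES', 'Sales Report', 'Churn Dashboard'}
```

Upstream lineage is the same traversal over the reversed edges; the size of the downstream set is exactly the "scope of this change" the As-Is process estimated by hand.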
Find data
sources and
targets
Profile data
sources for data
quality
What is the fastest way to deploy this change to production?
Identify join conditions and data quality rules
Analyst defines mapping specification
As-Is Process
Estimate project
scope based on
impact analysis
Build, Test, and
Deploy
Specify Results
Clarify Specification
Analyst clarifies source-to-target mapping specification
Automatically generate mapping logic from specification
Developer needs clarification
Translate Specification
Developer translates
mapping specification into
mapping logic
Analyst verifies target
results based on
common repository
Verify Requirements
System Test
Deploy early and test often to speed up iterations
Analyst verifies that target data meets business requirements
QA tests all affected objects and reports
Use comparative profiling
Request analyst to preview target data in dev/test environment
Acceptance Test
Unit Test
Reuse DI & DQ rules in unified developer environment
Change Objects
Developer makes necessary
changes to affected objects
and reports
Analyst specifies
expected results
Developer performs
unit testing
Automatically
compare actual
results with
expected
results
Test results are inconsistent
with
business requirements. Go back
to specification step
Business user performs
user acceptance testing
Analyst defines
mapping
specification
How long does it take to deploy a change to the
data warehouse affecting > 1 BI report?
Less than 2 weeks
2 to 4 weeks
5 to 8 weeks
More than 8 weeks
Data Architecture for Agile DI
Data Virtualization & Adaptive Data Services
Data Integration
Master Data Management
Data Quality
Test Data Management & Archiving
B2B
PowerCenter
Consumers: ODBC/JDBC, Web Services, SOA/Composite Apps, Business Intelligence
Data service policies: Access, Quality, Retention, Privacy, Freshness
Support All Projects
Flexible Deployment
Implement All Policies
Business-Friendly
Data Abstraction
Access All Data
Mainframe
Databases
Unstructured and
Semi-structured
Applications
Cloud
Social Media
NoSQL
Find data
sources and
targets
Profile data
sources for data
quality
Extend the generated
mapping for rapid
development
Identify join conditions and data quality rules
Analyst defines mapping specification
To-Be Process
Estimate project
scope based on
impact analysis
Build, Test, and
Deploy
Mapping logic
automatically generated
from analyst defined
specification
Extend mapping to
dynamically mask
sensitive data
Consume WS
DEMO
Extend mapping to
consume web services
Dynamic Masking
DEMO
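Dynamic masking means redacting sensitive fields as rows flow out of a query, without changing stored data. A toy sketch of the idea — the field names, masking rules, and "keep the last 4 digits" convention are illustrative, not Informatica's implementation:

```python
import re

# Toy dynamic masking: per-field redaction rules applied to outgoing rows.
MASK_RULES = {
    "ssn":   lambda v: re.sub(r"\d", "*", v[:-4]) + v[-4:],   # keep last 4
    "email": lambda v: v[0] + "***@" + v.split("@", 1)[1],    # keep domain
}

def mask_row(row):
    """Apply the matching mask rule to each field; pass others through."""
    return {k: MASK_RULES.get(k, lambda v: v)(v) for k, v in row.items()}

print(mask_row({"ssn": "123-45-6789", "email": "ada@example.com", "city": "NYC"}))
# ssn -> '***-**-6789', email -> 'a***@example.com', city unchanged
```

Because the rules sit in the mapping rather than the source, the same table can be served masked to one audience and unmasked to another.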
Find data
sources and
targets
Profile data
sources for data
quality
What is the fastest way to deploy this change to production?
Identify join conditions and data quality rules
Analyst defines mapping specification
To-Be Process
Estimate project
scope based on
impact analysis
Build, Test, and
Deploy
Deploy to PowerCenter
Deploy as a SQL view
Deploy as a web service
Deploy SQL View
DEMO
Deploy Web Service
DEMO
Deploy PowerCenter
DEMO
Find data
sources and
targets
Profile data
sources for data
quality
What is the fastest way to deploy this change to production?
Identify join conditions and data quality rules
Analyst defines mapping specification
Compare profiling statistics during development to ensure data quality
To-Be Process
Estimate project
scope based on
impact analysis
Build, Test, and
Deploy
Validate actual target
data against expected
data
Actual Results
Expected Results
Comparative Profiling
Data Validation
Comparative Profiling
DEMO
Data Validation
DEMO
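"Validate actual target data against expected data" is a keyed diff: rows missing from the target, unexpected extra rows, and rows whose values disagree. A minimal sketch with illustrative data, standing in for the validation automation:

```python
# Toy data validation: diff actual target rows against expected rows by key.
def validate(actual, expected, key):
    a = {r[key]: r for r in actual}
    e = {r[key]: r for r in expected}
    return {
        "missing":    sorted(e.keys() - a.keys()),   # expected but absent
        "unexpected": sorted(a.keys() - e.keys()),   # present but not expected
        "mismatched": sorted(k for k in a.keys() & e.keys() if a[k] != e[k]),
    }

expected = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
actual   = [{"id": 1, "amt": 10}, {"id": 2, "amt": 25}, {"id": 3, "amt": 5}]
print(validate(actual, expected, "id"))
# {'missing': [], 'unexpected': [3], 'mismatched': [2]}
```

Comparative profiling is the coarser version of the same check: instead of diffing rows, diff the profile statistics (null counts, distinct counts, value distributions) between two runs to catch regressions cheaply.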
Agile Data Integration
1. Find data sources and targets
2. Profile data sources for data quality
3. Identify join conditions and data quality rules
4. Analyst defines mapping specification
5. Estimate project scope based on impact analysis
6. Build, Test, and Deploy
Cut project throughput times by 90%
Double productivity
Project
Request
Define
Requirements
Analyze &
Design
Build
Test
Deploy
Support &
Maintain
Agile Data Integration
1. Analyst quickly finds data sources by searching business glossary and data lineage
2. Analyst profiles data sources to identify data quality issues, sharing results with developer through common metadata repository
3. Developer rapidly infers PK-FK relationships, validates join conditions, and finds pre-built data quality rules
4. Analyst defines and validates mapping specification that leverages pre-built ETL and data quality rules
5. Developer extends automatically generated mapping logic to include other transformations and web services
6. Developer rapidly deploys SQL view, web service, and batch ETL process
7. Developer and QA test using comparative profiling and data validation automation
Project Request
Define Requirements
Analyze &
Design
Build
Test
Deploy
Support &
Maintain
How much faster could you deliver projects using the
Agile Data Integration methods described today?
2X
3X
5X
10X
Agile Data Integration
Other Project Types
MDM
Application Integration
Application Modernization
Data Migration
Agile Data Integration
How to Get Started
Identify a Sponsor and Change Agent
Identify Early Win Project
Know Your Customer
Create As-Is Value Stream
Look at one of your last projects
Create To-Be Value Stream
Leverage Informatica Sales and Professional Services
Q&A
Informatica Proprietary/Confidential. Informational Purposes Only. No Representation, Warranty or
Commitment regarding Future Functionality. Not to be Relied Upon in Making Purchasing Decision.