UBS OCF – IDQ Capabilities Review

IDQ – 1WMP Phoenix Data Migration Use Cases
1WMP – Data Migration Implementation Use Cases

Data Integration
• Disparate source data extracts
• Support for multi-byte characters
• Dynamic mapping

Data Profiling / Data Quality
• Out-of-the-box profiling – null check, min / max check, cardinality, etc.
• Business-rule-based profiling
• Data drill-down
• Scorecard generation and monitoring

Data Transformation
• Expression
• Aggregator
• Rank
• Joiner
• Filter
• Java
• Router
• Sorter
• Read / Write
• Lookup
• Union
• Merge

Business Verification
• Direct business user access
• Step-by-step data verification
• Easy interpretation of business logic / lineage

Reusability
• Reusable mappings through logical and customized data objects
• Reference data management

UBS NFR
• Single sign-on through UBS smart card
• Controlled user access via BBS
OCF – Use Cases & Tool Requirements
OCF – Use Cases

1. Random Sampling – From a transformed result set of around 100,000 to 500,000 records, a random sample of 10 to 20 records needs to be selected for reporting. IDQ capability: out of the box, through profiling. (A sketch of the sampling logic follows this item.)
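Purely as an illustration of the random-sampling step (not IDQ's internal implementation, which is available out of the box), reservoir sampling draws a uniform sample of k records in a single pass, so the 100K–500K result set never has to be held in memory:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: one-pass reservoir sampling of k records from a stream.
// IDQ provides this outcome out of the box; this just shows the logic.
public final class ReservoirSampler {

    public static <T> List<T> sample(Iterable<T> records, int k, long seed) {
        List<T> reservoir = new ArrayList<>(k);
        Random rnd = new Random(seed);
        long seen = 0;
        for (T record : records) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(record);            // fill the reservoir first
            } else {
                // Replace an existing element with probability k / seen,
                // which keeps every record equally likely to be sampled.
                long j = (long) (rnd.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, record);
                }
            }
        }
        return reservoir;
    }
}
```

A fixed seed makes the sample reproducible, which helps when the selection has to be evidenced for audit.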
2. Risk-Based Sampling – Scenario-based sampling: a large data set (500K to 1MM records) is ranked on requirement criteria (e.g. trades of the Client Advisors with the largest risk exposure), and from the top of that ranking random samples of 10 to 20 records are chosen. IDQ capability: out of the box, through profiling. (See the sketch after this item.)
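A hedged sketch of the rank-then-sample idea, reusing the ReservoirSampler above; the Trade record and its riskExposure field are invented stand-ins, not fields from the actual 1WMP data model:

```java
import java.util.Comparator;
import java.util.List;

// Illustrative rank-then-sample logic. Trade and riskExposure are
// hypothetical names for this sketch only.
record Trade(String clientAdvisor, double riskExposure) {}

final class RiskBasedSampler {

    // Rank by risk exposure, keep the top slice, then sample from it.
    static List<Trade> sample(List<Trade> trades, int topN, int k, long seed) {
        List<Trade> ranked = trades.stream()
                .sorted(Comparator.comparingDouble(Trade::riskExposure).reversed())
                .limit(topN)                      // highest-exposure trades only
                .toList();
        return ReservoirSampler.sample(ranked, k, seed);
    }
}
```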
3. CA Sweep – Based on the historical samples from use cases 1 and 2, the trades associated with Client Advisors already selected in those samples are excluded from the universe (the full data set), and from the remaining set 1 or 2 sample records are randomly selected for reporting. IDQ capability: through transformations.

4. Fuzzy Logic – String Comparison (Jaro-Winkler distance) – Address comparison to find matching addresses using fuzzy-logic comparison techniques. IDQ capability: out of the box, through the MATCH transformation, which supports Hamming, Bigram, Edit, and Jaro distance. (See the sketch after this item.)
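Outside IDQ, the same comparison can be illustrated with Apache Commons Text's JaroWinklerSimilarity; the normalization steps and the 0.90 threshold are assumptions for this sketch, not UBS-mandated values:

```java
import org.apache.commons.text.similarity.JaroWinklerSimilarity;

// Illustrative fuzzy address comparison; IDQ's MATCH transformation
// provides this out of the box. The 0.90 threshold is an assumption.
public final class AddressMatcher {

    private static final JaroWinklerSimilarity JW = new JaroWinklerSimilarity();

    static boolean sameAddress(String a, String b) {
        // Normalize before comparing so case and spacing don't dominate.
        String left = a.trim().toUpperCase().replaceAll("\\s+", " ");
        String right = b.trim().toUpperCase().replaceAll("\\s+", " ");
        return JW.apply(left, right) >= 0.90;
    }

    public static void main(String[] args) {
        System.out.println(sameAddress("12 High  Street, Zurich",
                                       "12 high street Zurich")); // likely true
    }
}
```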
5. Large Volumes of Data (~10 million, time-series data) – Process large volumes of data sets (~10MM records) across multiple scenarios and combine them based on logic (joins, filters, sorts) to generate a result of roughly 12 to 15 MM records in a reasonable timeframe.

6. Email Scheduling of Output – An emailing interface to Outlook that sends scheduled emails to configured recipients, with results embedded or attached (Excel / CSV file or table format) from ETL (e.g. an email to Desk Heads on the CAs under them whose clients had high-value trades for the day). IDQ capability: no out-of-the-box solution; however, this can be done through a Java transformation. (A sketch follows this item.)
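As a hedged sketch of what such a Java transformation could contain, the snippet below emails a result file over SMTP with the Jakarta Mail API; the host, addresses, and file path are placeholders, and in IDQ this logic would sit inside a Java transformation rather than a main method:

```java
import jakarta.mail.*;
import jakarta.mail.internet.*;
import java.util.Properties;

// Illustrative only: the kind of logic a Java transformation could host
// to email an ETL result file. Host, sender, recipient, and path are
// placeholder values, not actual UBS configuration.
public final class ResultMailer {

    public static void main(String[] args) throws MessagingException {
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.example.com");   // placeholder host

        Session session = Session.getInstance(props);
        MimeMessage message = new MimeMessage(session);
        message.setFrom(new InternetAddress("etl@example.com"));
        message.addRecipient(Message.RecipientType.TO,
                new InternetAddress("desk.head@example.com"));
        message.setSubject("Daily high-value trade sample");

        // Body text plus the CSV result set as an attachment.
        MimeBodyPart body = new MimeBodyPart();
        body.setText("Attached: today's sampled trades.");
        MimeBodyPart attachment = new MimeBodyPart();
        try {
            attachment.attachFile("trades_sample.csv");    // placeholder path
        } catch (java.io.IOException e) {
            throw new MessagingException("Could not attach result file", e);
        }

        Multipart multipart = new MimeMultipart();
        multipart.addBodyPart(body);
        multipart.addBodyPart(attachment);
        message.setContent(multipart);

        Transport.send(message);
    }
}
```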
OCF – Tool Requirements

1. Enterprise and Server Based – Need for an enterprise, server-based tool covering the end-to-end reporting requirements: sampling, analytics, ETL, and report generation. IDQ capability: server based, with intermediate analytics and reporting support.

2. Visualize Process Flow – Ability to visualize process flows and business rules, so the outcome of each business rule can be determined easily. IDQ capability: easy to understand through the existing interface.

3. Reusability & Rerunability – Ability to repeat the same business rules with different data sets and over different time periods, with configurable sampling parameters (allowing the same business rules to be applied across regions with a different sampling volume per location). IDQ capability: demonstrated in the Phoenix migration.

4. Traceability – Traceability of data sources, aggregation of data, and application of business rules, documented visually. IDQ capability: easy to understand through the existing interface.

5. Automation – Ability to automate execution of business rules so that samples / cases are available for investigation by case managers before business hours in each region. IDQ capability: the Informatica scheduler can be used.

6. Agility – Ability to test and evaluate changes quickly and easily, document inline, and deploy the same asset. IDQ capability: can be done with adequate privileges.
IDQ Features & Benefits

Informatica Data Quality Features

Data Profiling
• Access data for anomalies and inconsistencies
• Build metrics and scorecards
• Build rules for profiling
• Trend analysis on DQ metrics

Reference Data Management
• Easy maintenance of enterprise-wide reference data
• Audit trail for capturing changes to the LOV list

Data Quality Enhancement
• Data cleansing & enrichment through mapplets and rules
• Address standardization
• Publish DQ rules as web services

De-Duplication
• Configure probabilistic and deterministic match rules
• Creation of clusters
• Consolidation of match candidates

Exception Handling
• Data stewardship
• Manual data correction
• Manual consolidation of duplicate data
• Audit mechanism for exception handling

Data Quality Monitoring
• Configuration of dashboards and reports for continuous DQ monitoring
• Reactive and proactive monitoring capability

Benefits
• Proactively cleanse and monitor data for all applications and keep it clean
• Huge savings on ongoing data quality maintenance by business users
• Enable the business to share in the responsibility for data quality and data governance
• Enhance IT productivity with powerful business-IT collaboration and a common data quality environment
Data Profiling Features & Benefits

Provide immediate insight into the basic quality of data and quickly expose potential areas of risk.

• End-to-end data profiling to discover the content, quality, and structure of a data source:
  • Column profiling
  • Primary key profiling
  • Functional dependency profiling
  • Data domain discovery
  • Enterprise data discovery
• Statistics identify outliers and anomalies in the data; value and pattern frequency isolate inconsistent / dirty data and unexpected patterns (see the sketch below)
• Customized business rules can be created and used during profiling
• The Rule Builder allows business users to efficiently collaborate with developers on building complex business rules
• The rich GUI enables easy readability of rule specifications
• Scorecards can be created to display the value frequency for columns in a profile
• Trend charts can be configured to view the history of scores over time
• Drill down into actual data values to inspect results across the entire data set, including potential duplicates
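To make the value-frequency idea concrete (outside IDQ; the singleton rule below is an invented heuristic, not how the profiler actually scores anomalies), a column can be summarized and its one-off values flagged:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative value-frequency profiling: count distinct values in a
// column and flag singletons as candidate anomalies. The singleton
// rule is an invented heuristic for this sketch.
public final class ValueFrequencyProfile {

    static Map<String, Long> frequencies(List<String> column) {
        return column.stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
    }

    static List<String> rareValues(List<String> column) {
        return frequencies(column).entrySet().stream()
                .filter(e -> e.getValue() == 1)   // one-off values
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        List<String> country = List.of("CH", "CH", "GB", "CH", "C H");
        // Flags "GB" and the malformed "C H" (order not guaranteed).
        System.out.println(rareValues(country));
    }
}
```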
Scorecards and Trend Charts Features & Benefits

Quantify the quality of data with scorecards and trend charts.

• Enables the business to “measure the data fitness” against defined metrics before using the data in data-driven projects.
• Critical for making good decisions about data-quality improvement initiatives.
• Trend charts allow the business to evaluate the progression and ROI of data quality programs.
• Weighted scores across multiple metrics help find root causes and the most significant contributors to poor data quality scores (see the sketch below).
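A minimal sketch of a weighted scorecard roll-up, assuming each metric reports a 0–100 score and a weight; the metric names and weights below are invented for illustration:

```java
import java.util.List;

// Illustrative weighted scorecard roll-up: overall = sum(w * s) / sum(w).
// Metric names and weights are invented for this sketch.
record Metric(String name, double score, double weight) {}

final class Scorecard {

    static double overallScore(List<Metric> metrics) {
        double weighted = metrics.stream()
                .mapToDouble(m -> m.score() * m.weight()).sum();
        double totalWeight = metrics.stream()
                .mapToDouble(Metric::weight).sum();
        return weighted / totalWeight;
    }

    public static void main(String[] args) {
        List<Metric> metrics = List.of(
                new Metric("address completeness", 92.0, 3.0),
                new Metric("valid country code", 99.5, 1.0),
                new Metric("duplicate rate", 80.0, 2.0));
        // (92*3 + 99.5*1 + 80*2) / 6 = 89.25
        System.out.println(overallScore(metrics));
    }
}
```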
Reference Data Management Features & Benefits

Enriching or standardizing data using reference data.

• Enables the business to create and manage reference data
• Maintains audit trails to monitor changes to reference data objects
• Reference data objects are used to standardize and enrich source data during data quality operations (see the sketch below)
• The same reference data objects can be used across multiple data quality projects
• Reference data objects can be created from column profile values, patterns, flat files, and database tables
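To make the standardization step concrete, here is a hedged sketch using a plain map as a stand-in for an IDQ reference data object; the value pairs are invented:

```java
import java.util.Map;

// Illustrative standardization against a reference table. The map is a
// stand-in for an IDQ reference data object; the values are invented.
final class CountryStandardizer {

    private static final Map<String, String> REFERENCE = Map.of(
            "SWITZERLAND", "CH",
            "SCHWEIZ", "CH",
            "UNITED KINGDOM", "GB",
            "GREAT BRITAIN", "GB");

    // Return the standard code, or the original value when no entry
    // exists, so unmatched rows can be routed to exception handling.
    static String standardize(String raw) {
        return REFERENCE.getOrDefault(raw.trim().toUpperCase(), raw);
    }

    public static void main(String[] args) {
        System.out.println(standardize(" Schweiz "));      // CH
        System.out.println(standardize("Great Britain"));  // GB
    }
}
```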
Rule Builder Features & Benefits

Define and design business rules.

• Enables business users to define the data requirements of a business rule as a reusable software object that can be run against the data to check its validity.
• Allows business users to efficiently collaborate with developers on building complex business rules.
• Rule specifications define condition-action pairs for business rules, which can be evaluated in a particular order for validity (see the sketch below).
• The rich GUI enables easy readability of rule specifications.
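As a minimal sketch of the condition-action idea (the Rule type and example rules are invented for illustration, not the Rule Builder's internal model), each rule pairs a predicate with an action, and the rules are evaluated in their declared order:

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Illustrative condition-action rule evaluation in declared order.
// This type and the example rules are invented for this sketch.
record Rule<T>(String name, Predicate<T> condition, Consumer<T> action) {}

final class RuleEngine {

    // Apply the action of every rule whose condition matches, in order.
    static <T> void evaluate(List<Rule<T>> rules, T row) {
        for (Rule<T> rule : rules) {
            if (rule.condition().test(row)) {
                rule.action().accept(row);
            }
        }
    }

    public static void main(String[] args) {
        List<Rule<String>> rules = List.of(
                new Rule<>("non-empty", s -> s.isBlank(),
                        s -> System.out.println("reject: blank value")),
                new Rule<>("max length", s -> s.length() > 35,
                        s -> System.out.println("flag: value too long")));
        evaluate(rules, "12 High Street, Zurich"); // passes both checks silently
    }
}
```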
Data Quality Enhancement Features & Benefits

Cleanse and standardize data, resolving and addressing data quality issues.

• Build rules and mapplets to address data quality issues
• Address validation corrects errors in addresses and completes partial addresses
• Reference data usage for enhancing the DQ process
• Exception handling for manual review and correction
• Export mappings to PowerCenter for metadata reuse in physical data integration
• Web service consumer / provider for integration with any SOAP-based application
Data Quality Mapping

[Diagram: an IDQ data quality mapping, annotated with the following capabilities]
• Address validation and geocoding enrichment across 260 countries
• Standardization and reference data management
• Parsing of unstructured data / text fields of all data types (customer / product / social / logs)
• DQ logic pushed down / run natively or on Hadoop
De-Duplication Features & Benefits

Identify duplicates and consolidate.

• Customizable match rules
• Supports both fuzzy and exact match rules
• Duplicate analysis and consolidation of source data
• Identity matching capability using population files
• Auto-merging of data based on customizable de-duplication rules
• Manual merging / unmerging for data with a low match score (see the sketch below)
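To illustrate the auto-merge versus manual-review split, a match score can route each candidate pair; the 0.95 and 0.80 thresholds are assumptions for this sketch, not IDQ defaults:

```java
// Illustrative routing of match candidates by score. The 0.95 and 0.80
// thresholds are assumptions, not IDQ defaults.
enum MatchDecision { AUTO_MERGE, MANUAL_REVIEW, NO_MATCH }

final class MatchRouter {

    static MatchDecision route(double matchScore) {
        if (matchScore >= 0.95) return MatchDecision.AUTO_MERGE;    // confident duplicate
        if (matchScore >= 0.80) return MatchDecision.MANUAL_REVIEW; // steward decides
        return MatchDecision.NO_MATCH;                              // distinct records
    }

    public static void main(String[] args) {
        System.out.println(route(0.97)); // AUTO_MERGE
        System.out.println(route(0.85)); // MANUAL_REVIEW
        System.out.println(route(0.40)); // NO_MATCH
    }
}
```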