Unit 3 DA
Data Rationalization / Variable Rationalization:
Data rationalization, through managed metadata and data modeling,
can support semantic resolution, enabling improved analysis and
knowledge transfer.
Introduction
Ontology is “the study of the categories of things that exist or may
exist in some domain”. Another definition of ontology, applicable to
domains, is “a collection of taxonomies and thesauri”, from Seth Earley.
Across the Internet and within internal systems, ontologies are used
to improve search capabilities and make inferences for improved
human or computer reasoning. By relating terms in an ontology, the
user doesn’t need to know the exact term actually stored in the
document; a search on a related term can still find it.
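As a minimal sketch of this idea, the Python fragment below expands a
search term with the terms an ontology relates to it. The ontology
contents, the document titles, and the function names (expand_term,
search) are all hypothetical and chosen only for illustration; a real
ontology would normally be held in a dedicated tool rather than a
dictionary.

```python
# Hypothetical mini-ontology: each preferred term maps to related terms.
ONTOLOGY = {
    "customer": {"client", "account holder"},
    "invoice": {"bill"},
}

# Hypothetical document titles used only for this illustration.
DOCUMENTS = [
    "Client onboarding checklist",
    "Quarterly bill summary",
    "Warehouse layout",
]


def expand_term(term):
    """Return the term together with every term the ontology relates to it."""
    term = term.lower()
    for preferred, related in ONTOLOGY.items():
        if term == preferred or term in related:
            return {preferred} | related
    # Unknown terms are searched for as-is.
    return {term}


def search(documents, query):
    """Return the documents that mention the query or any related term."""
    terms = expand_term(query)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


if __name__ == "__main__":
    # The user asks for "customer" although the document says "Client".
    print(search(DOCUMENTS, "customer"))
```

Here a search for "customer" still finds the document that only uses
the word "Client", because the two terms are related in the ontology.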
Data rationalization is a Managed Metadata Environment
(MME) enabled application which creates or extends an ontology for a
domain into the structured data world, based on model objects stored
in various models (of varying levels of detail, across model files and
modeling tools) and other metadata. Data models, often unknowingly,
express many aspects of ontology.
These concepts form parts of several components of enterprise data
management.
The primary reason for data modeling is to create physical data
structures – though a critical best practice for data modeling is to
follow a phased modeling approach – typically developing conceptual,
logical, and finally physical data models.
Conceptual data models are sometimes considered to be semantic
models since they are expressed in business terminology and
demonstrate how key business objects relate to each other,
independent of technology or application.
There are other types of data models (e.g. enterprise models) and
other metadata that should be linked together to provide a more
holistic view of a domain.
Unfortunately, most modeling tools are incapable of handling all of the
different levels of models effectively; it is common for more than one
modeling tool to be used in an enterprise, and multiple model files are
usually a necessity. For example, a data modeling tool might be used
for logical and physical models, while a UML class diagram might be
used for the conceptual model.
Connecting these model objects, and visualizing the objects and their
relationships, is called Data Rationalization, and it is typically enabled
as part of a Managed Metadata Environment (MME) by leveraging a
Metadata Repository (MDR) tool.
Data Rationalization is a “vertical data lineage” as opposed to
horizontal data lineage employed for data movement (i.e.
Information Supply Chain).
In Data Rationalization, we’re not trying to find where an actual piece
of data came from (i.e. source to target), but what higher order model
objects the data was conceived from or objects that can help to
explain it, or which downstream objects implement the higher order
model objects (see Figure 1 below).
Figure 1: Example – Data Rationalization versus Information Supply Chain
ECDM: Enterprise conceptual data model
ELDM: Enterprise logical data model
CDM: Conceptual data model
LDM: Logical data model
PDM: Physical data model
DW: Data Warehouse
OLTP: Online Transaction Processing
Benefits of Data Rationalization
What is the benefit of Data Rationalization? To be able to
effectively exploit, manage, reuse, and govern enterprise data
assets (including the models which describe them), it is necessary
to be able to find them.
In addition, there is (or should be) a wealth of semantics (e.g.
business names, definitions, relationships) embedded within an
organization’s models that can be exposed for improved analysis and
knowledge transfer. By linking model objects (across or within models)
it is possible to discover the higher order conceptual objects for any
given object. Conversely, it is possible to identify what implementation
artifacts implement a higher order model object.
For example, using data rationalization, one can traverse from a
conceptual model entity to a logical model entity to a physical model
table to a database table, etc. Similarly, Data Rationalization enables
understanding of a database table by traversing up through the
different model levels.
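The traversal described above can be sketched as a walk over linked
model objects. This is only an illustration under stated assumptions:
the object names (CDM.Customer, PDM.CUST, and so on), the
IMPLEMENTED_BY list, and the functions rationalize_down and
rationalize_up are hypothetical, and a real MDR tool would supply its
own storage and query interface.

```python
# Hypothetical meta-relationships, each a pair of
# (higher order object, lower order object that implements it).
IMPLEMENTED_BY = [
    ("CDM.Customer", "LDM.Customer"),
    ("LDM.Customer", "PDM.CUST"),
    ("PDM.CUST", "DB.CRM.CUST"),
]


def _traverse(start, from_index, to_index):
    """Follow the links from `start` and collect every object reached."""
    found = []
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for link in IMPLEMENTED_BY:
            if link[from_index] == current and link[to_index] not in found:
                found.append(link[to_index])
                frontier.append(link[to_index])
    return found


def rationalize_down(obj):
    """Objects that directly or indirectly implement `obj`."""
    return _traverse(obj, 0, 1)


def rationalize_up(obj):
    """Higher order objects that `obj` was directly or indirectly derived from."""
    return _traverse(obj, 1, 0)


if __name__ == "__main__":
    print(rationalize_down("CDM.Customer"))
    print(rationalize_up("DB.CRM.CUST"))
```

Storing each link once and reading it in either direction keeps the
upward and downward views consistent with each other.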
Data Rationalization is an enabler of effective Data Governance. It is
not possible to govern information assets without knowing the
location of the data and/or the variety of meanings given to each
object. Similarly, Data Rationalization can aid in the development
of Master Data Management (MDM) solutions. By identifying common data
entities, and how these relate to other pieces of data (again, across
many systems), MDM solutions can better meet business user needs
for all the systems that require the
master/reference data.
How does Data Rationalization work?
To be able to rationalize data, meta-relationships between model
objects (across model levels) must be established. Of course, doing
so does not replace normal types of relationships between model
objects in the same model. Meta-relationships can be established in
multiple ways:
1. Use automated modeling tool functionality
2. Use manual modeling tool functionality
3. Use modeling tool metadata fields
4. Use a Metadata Repository (MDR) tool to manually establish links
using a GUI or other interface
5. Use a spreadsheet
Once the meta-relationships are established, they must be imported
into the Metadata Repository (if not established using the MDR tool).
From there, analysts can search, retrieve, and visualize the metadata
to perform Data Rationalization analysis. Analysts do not need a
modeling tool license to explore the models (assuming the higher
order models can be found), nor do they need to rely on the data
modeler to obtain access to, or an export of, the model metadata.
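As an illustration of the spreadsheet option, the sketch below reads
meta-relationships from CSV text so that they could then be loaded into
a repository. The column names (higher_object, lower_object), the
sample rows, and the function load_meta_relationships are assumptions
made for this example; the import format of an actual MDR tool will
differ.

```python
import csv
import io

# Hypothetical spreadsheet export; note the accidental duplicate row.
SPREADSHEET = """higher_object,lower_object
LDM.Customer,PDM.CUST
LDM.Order,PDM.ORD
LDM.Customer,PDM.CUST
"""


def load_meta_relationships(text):
    """Parse CSV text into a de-duplicated list of (higher, lower) links."""
    reader = csv.DictReader(io.StringIO(text))
    links = []
    for row in reader:
        link = (row["higher_object"].strip(), row["lower_object"].strip())
        # Skip links that were entered more than once.
        if link not in links:
            links.append(link)
    return links


if __name__ == "__main__":
    for higher, lower in load_meta_relationships(SPREADSHEET):
        print(f"{higher} is implemented by {lower}")
```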
Conclusion
A very simple example of a Data Rationalization analysis might be an
analyst wishing to understand the relationship between two tables.
Assume the analyst does not have a modeling tool license, does not
know what model to look for, or does not have access to the network
directory where the models are stored. Also, assume that foreign keys
have been disabled in the physical model (valid in some cases, e.g.
data warehousing…). Using the MDR, the analyst could search on the
table names and, when these are displayed, rationalize upwards to see
the logical entities (in a separate model file) from which these tables
originated. This exposes the relationship between the entities, so the
analyst can understand its cardinality, optionality, and identification,
and review the relationship verb phrase.
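The analysis in this example can be sketched in a few lines. All of the
data below is hypothetical (the table and entity names, the
TABLE_TO_ENTITY links, and the relationship attributes), as is the
function explain_relationship; the point is only to show the upward
step from tables to logical entities followed by a lookup of the
relationship recorded in the logical model.

```python
# Hypothetical meta-relationships: physical table -> logical entity.
TABLE_TO_ENTITY = {
    "PDM.CUST": "LDM.Customer",
    "PDM.ORD": "LDM.Order",
}

# Hypothetical relationships recorded in the logical model.
LOGICAL_RELATIONSHIPS = [
    {
        "parent": "LDM.Customer",
        "child": "LDM.Order",
        "verb_phrase": "places",
        "cardinality": "one-to-many",
        "optional": True,
        "identifying": False,
    },
]


def explain_relationship(table_a, table_b):
    """Rationalize two tables upward and return the logical relationship, if any."""
    entities = {TABLE_TO_ENTITY.get(table_a), TABLE_TO_ENTITY.get(table_b)}
    for relationship in LOGICAL_RELATIONSHIPS:
        if {relationship["parent"], relationship["child"]} == entities:
            return relationship
    # No link to the logical model, or no relationship between the entities.
    return None


if __name__ == "__main__":
    found = explain_relationship("PDM.ORD", "PDM.CUST")
    if found is not None:
        print(found["parent"], found["verb_phrase"], found["child"])
        print("cardinality:", found["cardinality"])
```

Even though the physical tables carry no foreign key, the relationship
is recovered from the logical entities they were derived from.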
Data Rationalization provides semantic understanding of the object in
question along with its higher and lower order companions, for
increased knowledge of the organization’s data and information.