Chapter 5 Summary

Uploaded by

huseinremix

Here is a summary of the chapter on Data Modeling and Design.

1. Introduction
● Data modeling is the process of discovering, analyzing, and scoping data requirements,
and then representing and communicating these data requirements in a precise form
called the data model. The process requires an organization to discover and document
how its data fits together, which helps it understand its data assets.
● Data models contain essential Metadata for data consumers and are critical to other
data management functions, providing definitions for data governance and lineage for
data warehousing.
● There are six commonly used schemes for representing data: Relational, Dimensional,
Object-Oriented, Fact-Based, Time-Based, and NoSQL. Models in these schemes exist at
three levels of detail: conceptual, logical, and physical.

1.1 Definition, Goal, and Business Drivers


● Definition: Data modeling is the process of discovering, analyzing, and scoping data
requirements, and then representing and communicating these data requirements in a
precise form called the data model. This process is iterative and may include a
conceptual, logical, and physical model.
● Goal: To confirm and document an understanding of different perspectives, which leads
to applications that more closely align with current and future business requirements,
and creates a foundation to successfully complete broad-scoped initiatives such as
master data management and data governance programs.
● Business Drivers: Data models are critical because they:
○ Provide a common vocabulary around data.
○ Capture and document explicit knowledge about an organization's data and
systems.
○ Serve as a primary communications tool during projects.
○ Provide the starting point for customization, integration, or replacement of an
application.

2. Essential Concepts
2.1 Data Model Components
Most data models are built from the same basic components:
● Entity: A thing about which an organization collects information, sometimes referred to
as the "nouns" of an organization. Entities are depicted as rectangles with their names
inside. High-quality entity definitions are core Metadata and essential for the business
value of a data model.
● Relationship: An association between entities that captures interactions and
constraints. Relationships are shown as lines on a data model diagram.
○ Cardinality: Captures how many instances of one entity participate in a relationship
with another entity. The choices are zero, one, or many.
○ Arity: The number of entities in a relationship, most commonly unary (recursive),
binary, or ternary.
● Attribute: A property that identifies, describes, or measures an entity. In a physical
model, this corresponds to a column, field, or tag.
● Identifier (Key): A set of one or more attributes that uniquely defines an instance of an
entity.
○ Construction-type keys include simple, compound, composite, and surrogate
keys. A surrogate key is a system-generated unique identifier with no business
meaning.
○ Function-type keys include candidate, primary, and alternate keys. The primary
key is the candidate key chosen as the unique identifier for an entity.
● Domain: The complete set of possible values that an attribute can be assigned.
Domains standardize attribute characteristics and can be defined by data type, format,
lists, ranges, or rules.
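
The components above can be sketched in a small physical example. The following is a minimal illustration using Python's built-in sqlite3 module; the table names, column names, and value lists (customer, customer_order, the 'ACTIVE'/'CLOSED' status list, the 1 to 100 quantity range) are assumptions made up for the example and do not come from the chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical entities: customer and customer_order.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,     -- surrogate key: system-generated, no business meaning
    email       TEXT NOT NULL UNIQUE,    -- alternate key: a candidate key not chosen as primary
    status      TEXT NOT NULL
                CHECK (status IN ('ACTIVE', 'CLOSED'))   -- domain defined as a list
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customer (customer_id),       -- binary relationship: one customer, zero to many orders
    quantity    INTEGER NOT NULL
                CHECK (quantity BETWEEN 1 AND 100)       -- domain defined as a range
);
""")

conn.execute("INSERT INTO customer (email, status) VALUES ('a@example.com', 'ACTIVE')")
conn.execute("INSERT INTO customer_order (customer_id, quantity) VALUES (1, 3)")


def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Each statement breaks one rule captured by the model.
out_of_domain = rejected("INSERT INTO customer (email, status) VALUES ('b@example.com', 'UNKNOWN')")
duplicate_key = rejected("INSERT INTO customer (email, status) VALUES ('a@example.com', 'ACTIVE')")
orphan_order = rejected("INSERT INTO customer_order (customer_id, quantity) VALUES (99, 1)")
```

All three flags end up True here: the status value falls outside its domain, the email repeats an alternate key value, and the order points at a customer that does not exist.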

2.2 Data Modeling Schemes


● Relational: Based on Dr. Edgar F. Codd's theory, it organizes data into two-dimensional
relations to express business data precisely and have "one fact in one place". It is
ideal for operational systems.
● Dimensional: Structures data to optimize the query and analysis of large amounts of
data. It consists of fact tables (containing numeric measurements) and dimension
tables (containing textual descriptions).
● Object-Oriented (UML): A graphical language for modeling software. The UML class
model specifies classes (entity types) and their relationships. It includes "Operations"
or "Methods," which are not present in traditional Entity-Relationship (ER) diagrams.
● Fact-Based Modeling (FBM): A family of conceptual modeling languages based on
analyzing natural verbalizations of business facts. The most widely used variant is
Object Role Modeling (ORM).
● Time-Based: Patterns used when data values must be associated chronologically.
○ Data Vault: A hybrid approach using normalized tables (hubs, links, and satellites)
designed for enterprise data warehouses.
○ Anchor Modeling: A technique suited for information that changes over time, using
anchors, attributes, ties, and knots.
● NoSQL: A category for databases built on non-relational technology. The four main
types are document, key-value, column-oriented, and graph databases.
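
To make the dimensional scheme concrete, here is a minimal star-schema sketch using Python's built-in sqlite3 module. The tables (dim_product, fact_sales) and the sample rows are hypothetical and chosen only for illustration: the fact table holds the numeric measurements, and the dimension table holds the textual descriptions used to group them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one dimension table and one fact table.
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,      -- textual description
    category     TEXT NOT NULL       -- textual description used for grouping
);
CREATE TABLE fact_sales (
    product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
    sale_date   TEXT NOT NULL,
    units_sold  INTEGER NOT NULL,    -- numeric measurement
    revenue     REAL NOT NULL        -- numeric measurement
);
""")

conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Espresso", "Coffee"), (2, "Latte", "Coffee"), (3, "Green Tea", "Tea")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-01", 10, 25.0),
        (2, "2024-01-01", 4, 18.0),
        (3, "2024-01-02", 6, 12.0),
        (1, "2024-01-02", 2, 5.0),
    ],
)

# A typical analytical query: aggregate the facts, grouped by a dimension attribute.
totals = conn.execute("""
    SELECT d.category, SUM(f.units_sold), SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

print(totals)  # [('Coffee', 16, 48.0), ('Tea', 6, 12.0)]
```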

2.3 Data Model Levels of Detail


Data models exist at three levels of detail, following the ANSI/SPARC three-schema approach:
● Conceptual Data Model (CDM): Captures high-level data requirements as a collection
of related concepts. It contains only basic and critical business entities and their
relationships.
● Logical Data Model (LDM): A detailed representation of data requirements for a
specific usage context, but still independent of technology. It extends the CDM by
adding attributes through normalization.
● Physical Data Model (PDM): Represents a detailed technical solution adapted for a
specific set of hardware, software, and network tools. It often involves
denormalization (intentionally adding redundancy) to improve query
performance.
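
The step from a logical to a physical model can be sketched as follows. This is only an illustration under stated assumptions: the LogicalAttribute and LogicalEntity classes, the type mapping to SQLite, and the Customer entity are all hypothetical. The logical description names attributes in business terms with technology-neutral types, and a small function derives DDL for one specific target, which is where the model becomes physical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalAttribute:
    name: str                 # business-facing name, e.g. "Birth Date"
    data_type: str            # technology-neutral type: text, integer, decimal, date
    required: bool = True
    identifier: bool = False  # True if the attribute is part of the primary key


@dataclass(frozen=True)
class LogicalEntity:
    name: str
    attributes: tuple


# Assumed mapping from neutral types to SQLite column types (hypothetical).
SQLITE_TYPES = {"text": "TEXT", "integer": "INTEGER", "decimal": "NUMERIC", "date": "TEXT"}


def physical_name(logical_name):
    """Turn a business name into a lower-case, underscore-separated identifier."""
    return logical_name.strip().lower().replace(" ", "_")


def to_ddl(entity):
    """Derive a CREATE TABLE statement (physical model) from a logical entity."""
    columns = []
    for attr in entity.attributes:
        parts = [physical_name(attr.name), SQLITE_TYPES[attr.data_type]]
        if attr.required:
            parts.append("NOT NULL")
        columns.append(" ".join(parts))
    key_columns = [physical_name(a.name) for a in entity.attributes if a.identifier]
    if key_columns:
        columns.append("PRIMARY KEY (" + ", ".join(key_columns) + ")")
    body = ",\n    ".join(columns)
    return "CREATE TABLE " + physical_name(entity.name) + " (\n    " + body + "\n)"


customer = LogicalEntity(
    "Customer",
    (
        LogicalAttribute("Customer Number", "integer", identifier=True),
        LogicalAttribute("Customer Name", "text"),
        LogicalAttribute("Birth Date", "date", required=False),
    ),
)

ddl = to_ddl(customer)
print(ddl)
```

Swapping the type mapping for another DBMS would change the physical model without touching the logical description, which is the independence the logical level is meant to provide.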

2.4 Normalization and Abstraction


● Normalization: The process of applying rules to organize business complexity into
stable data structures. The goal is to keep each attribute in only one place to eliminate
redundancy. The term "normalized model" usually means the data is in Third Normal
Form (3NF).
● Abstraction: The removal of details to broaden applicability while preserving important
properties. This includes generalization (grouping common attributes into supertypes)
and specialization (separating distinguishing attributes into subtypes).
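
Both ideas can be illustrated with a short Python sketch. The sample rows and the Party, Person, and Organization classes are hypothetical. The first part moves repeated customer attributes out of flat order rows so that each fact is kept in one place; the second part shows a supertype holding common attributes with subtypes adding the distinguishing ones.

```python
from dataclasses import dataclass

# Unnormalized input: customer name and city repeat on every order row.
flat_orders = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ada", "customer_city": "London", "amount": 30},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ada", "customer_city": "London", "amount": 45},
    {"order_id": 3, "customer_id": 11, "customer_name": "Linus", "customer_city": "Helsinki", "amount": 20},
]


def normalize(rows):
    """Split flat rows into customers (keyed by customer_id) and orders."""
    customers = {}
    orders = []
    for row in rows:
        # Customer attributes depend only on customer_id, so they are stored once per customer.
        customers[row["customer_id"]] = {"name": row["customer_name"], "city": row["customer_city"]}
        orders.append({"order_id": row["order_id"], "customer_id": row["customer_id"], "amount": row["amount"]})
    return customers, orders


customers, orders = normalize(flat_orders)

# One update now changes the city for every order of customer 10.
customers[10]["city"] = "Cambridge"
cities_for_customer_10 = {customers[o["customer_id"]]["city"] for o in orders if o["customer_id"] == 10}


# Abstraction: Party is the supertype (generalization of common attributes);
# Person and Organization are subtypes (specialization).
@dataclass
class Party:
    party_id: int
    name: str


@dataclass
class Person(Party):
    birth_date: str


@dataclass
class Organization(Party):
    tax_number: str


parties = [Person(1, "Ada", "1815-12-10"), Organization(2, "Acme Ltd", "TX-001")]
party_names = [p.name for p in parties]  # code written against the supertype works for both subtypes
```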

3. Activities and Processes


● Forward Engineering: The process of building a new application starting with
requirements, moving from a Conceptual (CDM) to Logical (LDM) to Physical (PDM)
model.
● Reverse Engineering: The process of documenting an existing database, moving from a
Physical (PDM) to a Logical (LDM) to a Conceptual (CDM) model.
● Data Model Review: Data models require quality control and continuous improvement.
Design reviews should be conducted with a group of subject matter experts representing
different backgrounds.
● Data Model Maintenance: Once built, data models must be kept current. Updates
should be made when requirements or business processes change.
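
As a rough illustration of reverse engineering, the sketch below reads the catalog of an existing SQLite database and recovers a simple description of its entities, identifiers, and relationships. It uses Python's built-in sqlite3 module and SQLite's table_info and foreign_key_list pragmas; the two sample tables and the shape of the returned dictionary are assumptions made for the example. Modeling tools do considerably more, so treat this as a sketch of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for an existing, undocumented database (hypothetical tables).
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    amount      NUMERIC
);
""")


def reverse_engineer(connection):
    """Build a dictionary describing each table's attributes, identifier, and relationships."""
    model = {}
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = connection.execute(f"PRAGMA table_info({table})").fetchall()
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = connection.execute(f"PRAGMA foreign_key_list({table})").fetchall()
        model[table] = {
            "attributes": [c[1] for c in columns],
            "identifier": [c[1] for c in columns if c[5] > 0],
            "relationships": [(fk[3], fk[2], fk[4]) for fk in foreign_keys],
        }
    return model


model = reverse_engineer(conn)
print(model["customer_order"]["relationships"])  # [('customer_id', 'customer', 'customer_id')]
```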

4. Tools
● Data Modeling Tools: Software that automates many tasks, from basic drawing to
forward and reverse engineering, naming standards validation, and Metadata
storage.
● Lineage Tools: Software that captures and maintains the source structures for each
attribute on the data model, enabling impact analysis.
● Data Profiling Tools: Help explore data content, validate it against Metadata, and
identify Data Quality gaps.
● Data Model Patterns: Reusable modeling structures (elementary, assembly, and
integration) that can be applied to a wide class of situations.
● Industry Data Models: Pre-built data models for an entire industry (e.g., healthcare,
retail) that can be purchased and customized.
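
What a data profiling tool does can be sketched in a few lines of Python. The profile_column function, the sample values, and the allowed-value list are hypothetical; the point is only to show content being summarized and checked against a domain recorded as Metadata.

```python
def profile_column(values, allowed=None):
    """Summarize one column and, if a domain is supplied, list values that fall outside it."""
    non_null = [v for v in values if v is not None]
    profile = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
    }
    if allowed is not None:
        profile["out_of_domain"] = sorted({v for v in non_null if v not in allowed})
    return profile


# Hypothetical column content; the Metadata says only ACTIVE and CLOSED are valid.
status_values = ["ACTIVE", "CLOSED", None, "active", "ACTIVE", "PENDING"]
report = profile_column(status_values, allowed={"ACTIVE", "CLOSED"})

print(report)
# {'row_count': 6, 'null_count': 1, 'distinct_count': 4, 'out_of_domain': ['PENDING', 'active']}
```

The out-of-domain values are the kind of Data Quality gap such a tool would surface: one value the model does not know about and one that differs only by case.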

5. Governance and Best Practices


● Naming Conventions: Standards are particularly important for entities, tables,
attributes, and keys. Logical names should be meaningful to business users, while
physical names must conform to DBMS length limits.
● Database Design (PRISM): The DBA should keep five design principles in mind:
○ Performance and ease of use.
○ Reusability.
○ Integrity.
○ Security.
○ Maintainability.
● Data Model Quality: Data professionals must balance short-term project needs with
long-term enterprise interests. Data modeling standards should be developed, and
design quality should be reviewed regularly.
● Versioning and Integration: Data models require careful change control, just like other
SDLC deliverables.
● Metrics: A data model's quality can be measured using a standard for comparison. The
Data Model Scorecard® is one method that provides metrics across ten categories,
including how well the model captures requirements, its structural soundness, and the
quality of its definitions.
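
The naming-convention point can be illustrated with a small helper that derives a physical name from a logical one. Everything specific here is an assumption for the sake of the example: the 30-character limit, the abbreviation list, and the function name to_physical_name are hypothetical, and real limits and abbreviation standards depend on the DBMS and the organization.

```python
import re

MAX_IDENTIFIER_LENGTH = 30  # hypothetical DBMS limit; check the target system's documentation

# Hypothetical approved abbreviations from an organization's naming standard.
ABBREVIATIONS = {"customer": "cust", "number": "num", "identifier": "id", "amount": "amt"}


def to_physical_name(logical_name, max_length=MAX_IDENTIFIER_LENGTH):
    """Convert a business-friendly logical name into a physical identifier.

    Words are lower-cased and joined with underscores. Abbreviations are applied
    only if the full name exceeds the length limit.
    """
    words = re.findall(r"[A-Za-z0-9]+", logical_name.lower())
    name = "_".join(words)
    if len(name) > max_length:
        name = "_".join(ABBREVIATIONS.get(word, word) for word in words)
    if len(name) > max_length:
        raise ValueError(f"{logical_name!r} cannot be shortened to {max_length} characters")
    return name


short_name = to_physical_name("Customer Name")
long_name = to_physical_name("Customer Account Opening Amount Number")

print(short_name)  # customer_name
print(long_name)   # cust_account_opening_amt_num
```

Raising an error when a name still does not fit keeps the decision with a person, which suits a standard that is meant to be reviewed rather than silently worked around.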

Acronyms and Definitions

Acronym   Meaning / Definition from Chapter Context

1NF       First Normal Form
2NF       Second Normal Form
3NF       Third Normal Form
4NF       Fourth Normal Form
5NF       Fifth Normal Form
BCNF      Boyce/Codd Normal Form
CDM       Conceptual Data Model
DDL       Data Definition Language
EAI       Enterprise Application Integration
EDM       Enterprise Data Model
ER        Entity-Relationship
ERP       Enterprise Resource Planning
ESB       Enterprise Service Bus
FCO-IM    Fully Communication Oriented Modeling
GPS       Global Positioning System
IE        Information Engineering
IDEF1X    Integration Definition for Information Modeling
LDM       Logical Data Model
MDBMS     Multidimensional Database Management System
NoSQL     "Not Only SQL" (non-relational technology)
ORM       Object Role Modeling
PDM       Physical Data Model
PRISM     Performance, Reusability, Integrity, Security, Maintainability
RDBMS     Relational Database Management System
RFID      Radio Frequency Identification
SCD       Slowly Changing Dimension
SDLC      System Development Lifecycle
SME       Subject Matter Expert
SPARC     Standards Planning and Requirements Committee
SQL       Structured Query Language
UML       Unified Modeling Language
UPC       Universal Product Code
VIN       Vehicle Identification Number
Wi-Fi     Wireless Fidelity
