Relational Data Modeling Table of Contents
3. Relational Data Modeling
● 3.1. Fundamentals of Relational Data Modeling
○ 3.1.1. Relation
■ Structured description of similar information objects (instances)
■ Data pool for data records with the same structure (tuples)
■ Nouns as identifiers
■ Representation in table form
■ Columns: Attributes (structure of data records)
■ Rows: data records
○ 3.1.2. Attribute
■ Property, characteristic (analogous to ER data model)
■ Atomic or composite (structured)
■ Single-valued or multi-valued
■ Normalized relation has only atomic and single-valued attributes
○ 3.1.3. Value Range
■ Set of possible values for attributes (analogous to ER data model)
■ Standard value ranges (based on SQL data types)
■ INTEGER and DECIMAL for "NUMBER" (DEC with specific digits
if required)
■ CHAR or VARCHAR for "CHARACTER" string (with specific digits)
■ DATE for "DATE" (YYYY.MM.DD)
■ TIME for "TIME" (HH:MM:SS)
■ BOOLEAN for "TRUTH" value (true | false)
■ Included in every value range: NULL (null value)
■ No distinction between "value unknown" and "value does not
exist"
■ Exclusion of null value for attribute by "NOT NULL" marking
○ 3.1.4. Primary Key
■ Every relation has a primary key (analogous to ER data model)
■ Minimal attribute combination for unique identification of data records
■ Possibly multiple key candidates available
■ Marking:
■ Specification for primary key attributes
■ Primary key values must:
■ not be changed
■ not have a null value
○ 3.1.5. Foreign Key
■ Relationships between relations are expressed by foreign keys
■ There are no relationship types in the relational model
■ Borrowing the primary key of another (foreign) relation
■ Reference: Foreign key refers to primary key of another relation
■ Referential integrity:
■ FK value corresponds to a PK value of the referenced relation or
is a null value
■ Marking:
■ "FK references <referenced relation> (<PK attributes>)"
■ In the relationship diagram:
○ 3.1.6. Integrity Condition
■ Guarantees proper state/integrity of a database (analogous to ER data
model)
■ Specifications for attributes (based on SQL specifications):
■ Value range, primary key and foreign key
■ "NOT NULL" or "NOT NULL (<attribute list>)"
■ "UNIQUE" or "UNIQUE (<attribute combination>)"
■ "DEFAULT = <value>"
■ "CHECK (<condition>)"
○ 3.1.7. Relational Database
■ Components:
■ Relational database schema
■ Structural descriptions for the (normalized) relations
■ Relational data records
● 3.2. Transformation of ER Data Model into Relational Model
○ 3.2.1. Transforming Entity Type
■ Example 3-4: Transforming the Employee entity type into a (normalized)
relation
ERDM
RM
■ ERDM Construct to RM Construct Mapping Table
■
○ 3.2.2. Transforming Relationship Type
■ General association
■ Example 3-5: Transforming ER diagram with relationship types of
varying complexity into a relational model
■ For 1-to-many or many-to-1 relationships, the foreign key is placed
in the table representing the "many" side.
■ For many-to-many relationships, a junction table is used,
containing the primary keys of the participating tables as foreign
keys, often forming a composite primary key.
■ For 1-to-1 relationships, a foreign key can be assigned to either
table, with a UNIQUE constraint to enforce the one-to-one
relationship.
■ Dependency
■ Example 3-6: Transforming the dependency relationship type
between the weak entity type Child and the entity type Employee
into a relational model
■ Aggregation
■ Example 3-7: Transforming the aggregate entity type Motor
Vehicle, composed of the component entity types Chassis, Body
and Optional Equipment, into a relational model
■ Generalization
■ Example 3-8: Transforming the super-entity type Person, entirely
composed of the sub-entity types Employee and Customer, with
the possibility of an employee also being a customer, into a
relational model
■
1. Generalization and Specialization
● Generalization: This process involves combining multiple lower-level entity types into a
higher-level entity. It abstracts common properties to a generalized parent entity, often
used to model inheritance.
● Specialization: This is the reverse process where a higher-level entity type is divided
into one or more lower-level entities, often used to model subtypes based on certain
attributes or criteria.
2. Totality (Total Specialization)
● Definition: Total specialization means that every instance of the supertype (generalized
entity) must be an instance of at least one of its subtypes.
● Example: Consider a Person supertype that is specialized into Student and Teacher
subtypes. If the specialization is total, every Person must be either a Student or a
Teacher.
● ERDM Implication: In the Entity-Relationship Data Model, this is often represented
using a double line connecting the supertype to the subtypes.
● RM Implication: In the Relational Model, total specialization implies that foreign key
relationships between the parent table (supertype) and its subtypes enforce mandatory
participation.
3. Partiality (Partial Specialization)
● Definition: Partial specialization means that instances of the supertype may or may not
belong to any of the subtypes. In other words, it’s not mandatory for an instance of the
supertype to appear in any of its subtypes.
● Example: Using the same Person example, a partial specialization would allow some
instances of Person to not be classified as either Student or Teacher.
● ERDM Implication: This is represented using a single line connecting the supertype to
the subtypes.
● RM Implication: In the Relational Model, this allows for supertype instances that are not
linked to any subtype records.
4. Disjointness (Disjoint Specialization)
● Definition: Disjoint specialization means that an instance of the supertype can be a
member of only one subtype. This ensures that subtypes do not overlap.
● Example: If Person is disjointly specialized into Student and Teacher, then an
instance can only be a Student or a Teacher, but not both.
● ERDM Implication: This is represented using a "D" symbol next to the arc or connector
line that joins the supertype to its subtypes.
● RM Implication: Enforced through constraints that prevent overlap between subtype
entities.
5. Non-Disjointness (Overlapping Specialization)
● Definition: Non-disjoint specialization means that an instance of the supertype can
simultaneously be a member of more than one subtype.
● Example: In the same Person scenario, if the specialization is non-disjoint, an
individual could be both a Student and a Teacher at the same time.
● ERDM Implication: This is represented using an "O" symbol or an explicit notation
indicating that the subtypes can overlap.
● RM Implication: This allows for instances of the supertype to have multiple
corresponding entries in different subtype tables.
Putting It All Together:
When modeling generalization/specialization relationships, combining totality/partiality and
disjoint/non-disjoint properties helps specify precise data constraints and rules:
● Total & Disjoint: Every supertype instance must belong to exactly one subtype.
● Total & Non-Disjoint: Every supertype instance must belong to at least one subtype
and can belong to multiple subtypes.
● Partial & Disjoint: Supertype instances may belong to exactly one subtype, but this is
not mandatory.
● Partial & Non-Disjoint: Supertype instances may belong to one or more subtypes or
none at all.
This combination of constraints ensures consistency, reduces ambiguity, and defines inheritance
rules more clearly in data modeling.
■ Relationship type with degree > 2
■ Example 3-9: Transforming the relationship type "buys...from" of
degree 3, used to manage the purchase of goods, into a relational
model
■ Other special features
■ Example 3-10: Transforming an ER diagram with role names and
a relationship type with a fixed upper multiplicity limit into a
relational model
■ ERDM Construct to RM Construct Mapping Table
○ Exercise 4: Transforming the ER data model from Exercise 2 into a relational
model
● 3.3. Normalization in the Relational Model
○ 3.3.1. Basic Idea of Normalization
■ Example 3-11: Managing student exam results in a single relation "Exam"
and demonstrating anomalies
■ Insertion Anomaly: This occurs when it is impossible to insert new data
into a relation without first having other, seemingly unrelated, data. For
instance, in the "Exam" relation, you cannot add data about a student
unless they are associated with an exam. This constraint prevents storing
information about prospective students or creating a record for a new
student who has yet to take any exams.
Modification (Update) Anomaly: This anomaly arises when updating a
single data element necessitates modifying multiple records. For example, in the
"Exam" relation, changing a student's name requires updating all exam records
for that student. Failure to modify even one record introduces inconsistencies in
the database.
Deletion Anomaly: This occurs when deleting a record inadvertently
removes other information not intended for deletion. In the "Exam" relation,
deleting the last exam record a student participated in also deletes all data about
that student.
○ 3.3.2. Attribute Dependencies
■ Functional dependency
■ Example 3-12: Functional dependencies in the "Personnel"
relation
■ Types of undesirable dependencies
■ Partial functional dependency
■ This occurs when a non-key attribute is functionally
dependent on only part of a composite key. For
instance, if a relation has a composite key (A, B)
and a non-key attribute C, where B → C, then C is
partially dependent on the key (A, B)
■ Transitive dependency
■ This arises when a non-key attribute is functionally
dependent on another non-key attribute. If a key
attribute A determines non-key attribute D (A → D),
and D determines another non-key attribute F (D →
F), then a transitive dependency exists (A → D →
F)
■ Multi-valued dependency
■ Example 3-13: Transforming the "Exam" relation from Example
3-11 to use only atomic attributes and demonstrating problems
■ While functional dependencies apply to single-valued attributes,
multi-valued dependencies arise when one attribute (or a set of
attributes) can determine multiple values for another attribute.
Relational database systems are not designed to handle
multi-valued attributes directly.
■ Notation: Multi-valued dependencies are represented as A →→ B
or A1, ..., An →→ B.
○ 3.3.3. First Normal Form (1NF)
■ Example 3-14: Normalizing the "Exam" relation from Example 3-11
Steps to Achieve 1NF
To bring the "Exam" relation into compliance with 1NF, the sources
outline the following steps:
1. Isolate Multivalued Attributes: Create separate relations for each
multivalued attribute. In this case, we create new relations for
"Lecture" and "Student."
2. Decompose Structured Attributes (if any): If the multivalued attributes
have a structure (are composite), break them down into individual atomic
attributes.
3. Establish Relationships: Add the primary key of the original relation as
a foreign key to the new relations. This establishes the relationship between the
original and the newly created relations.
4. Define Primary Keys: Choose a suitable primary key for each new
relation.
Violations of first Normal form:
1. Using row order to convey information.
2. Mixing data types within columns(prohibited by database systems
already)
3. No primary key
4. Repeating groups are not permitted.
○ 3.3.4. Second Normal Form (2NF)
■ Example 3-14a: Demonstrating the persistence of anomalies in the 1NF
relations from Example 3-14
■ Example 3-15: Further normalizing the relations from Figure 3-16 into
2NF
■ A relation is considered to be in 2NF if it meets the following conditions:
■ 1. It is in 1NF: The relation already satisfies the criteria for 1NF.
■ 2. Full Functional Dependency: Every non-key attribute is fully
functionally dependent on the entire primary key.
■ Each Non-Key Attribute must depend on the entire primary key.
○ 3.3.5. Third Normal Form (3NF)
■ Example 3-16: Demonstrating anomalies in the 2NF relation "Student"
from Figure 3-17
■ Example 3-17: Further normalizing the relations from Figure 3-17 into
3NF
■ While Second Normal Form (2NF) addresses partial dependencies, Third
Normal Form (3NF) tackles transitive dependencies, further refining the
relational database structure to reduce redundancy and minimize
anomalies.
■ A relation is considered to be in 3NF if it fulfills the following conditions:
■ 1. It is in 2NF: The relation satisfies the requirements of both 1NF
and 2NF.
■ 2. No Transitive Dependencies: No non-key attribute is transitively
dependent on the primary key.
■ Each Non-Key Attribute must depend on the key, the whole key, nothing
but the key!
○ 3.3.6. Boyce-Codd Normal Form (BCNF)
■ Example 3-18: Demonstrating anomalies in the 3NF relation
"Exam-Lecture" from Figure 3-18
■ Example 3-19: Further normalizing the relations from Figure 3-18 into
BCNF
■ A relation is considered to be in BCNF if it meets the following criteria:
■ 1. It is in 3NF: It satisfies the requirements for 1NF, 2NF, and 3NF.
■ 2. Every Determinant is a Candidate Key: Each attribute or
attribute combination that determines other attributes must be a
candidate key for the relation.
■ Each Attribute must depend on the key, the whole key, nothing but the
key!
○ 3.3.7. Fourth Normal Form (4NF)
■ Example 3-20: Transforming the 1NF relation "Exam" from Figure 3-15
into 2NF, 3NF, and BCNF, highlighting multi-valued dependencies and
anomalies in the resulting relation "Exam-Lecture-Student"
■ While BCNF seeks to eliminate redundancies and anomalies arising from
determinants that are not candidate keys, 4NF goes a step further by
tackling the challenges posed by multivalued dependencies.
■ Multivalued Dependencies and Anomalies
■ A multivalued dependency occurs when there is a
one-to-many relationship between two or more attributes
within a relation. This means that for a given value of one
attribute (A), there can be multiple corresponding values of
another attribute (B). It is denoted as:
■ The only kind of multivalues allowed in the table are the multi values
dependent on the key.
○ Exercise 5: Normalizing a non-normalized "Employee" relation step-by-step into
1NF, 2NF, 3NF, BCNF, and 4NF