Data Management
and
Database Design
INFO 6210
Week #4
Northeastern University
This Week
• Different types Dependencies
• Normalization
• Relational Algebra
• Intro to SQL
2
Exercise
• Requirement
• Basketball league has many divisions
• Division has many players
• Division has many teams
Design the model. Simple task, envision 3 entities
with sample attributes
3
Will below solution works?
4
Solution
5
Normalization
6
Normalization
• Normalization is a –
• Process of refining data model build by ER diagram
• Design technique that begins by examining Relationships between attributes
7
Normalization
• Normalization is a –
• Process of refining data model build by ER diagram
• Design technique that begins by examining Relationships between attributes
• Ensures minimum redundancy of data
• Easier for users to access data
• Easy to maintain
• Takes minimum space to store data
• Reduces anomalies
• Insert, Update and Delete should affect only desired ROW
8
ERD – ER Diagram
• What is ER Diagram (ERD)?
• Model and design relational databases, in terms of logic and business rules
• This is the first step that happens for database dependent applications
• Helps to troubleshoot existing problems
• Components of ERD –
• Entity – A definable object such as customer, student, product and shown as Rectangle
• Relationship – How entities connect each other and show as diamond
• Attribute – Characteristics of an Entity and show as oval or sometimes circle.
• Limitations –
• Only for Relational DBs (Structured)
• Cannot be applied on unstructured
9
ERD – ER Diagram
• Tools –
• Erwin
• Toad Data modeler
• SQL Data modeler
• Navicat data modeler
• E/R Studio
• Generating
• Design based on code is Reverse Engineering
• Code based on Design is Forward Engineering
10
Dependencies
• Functional
• Transitive
• Partial
11
Functional Dependencies
Important concept associated with Normalization is Functional Dependencies
• The attribute B is fully functionally dependent on the attribute A, if each value of A determines
one and only one value of B
• Example: Product_ID à Product_Name
• Product ID determines Product Name
• Product Name is Dependent Attribute
• Product ID is Determinant
• We can also say Product_ID functionally determines Product_Name
12
Functional Dependencies
• Another Example
13
Partial Dependency
• A functional dependency A à B is called partial dependency if –
• There is a attribute that can be removed from A and yet the dependency still holds
Partial Dependency is similar to functional dependency when non prime-key attribute(s) are functionally dependent
on a part of the Candidate key.
Student_ID Student_fName Student_lname Course_ID Course_name PK-Part1
S001 John Smith C8899 Artificial Intelligence PK-Part2
S002 Brad Smith C8899 Artificial Intelligence Primary key
Student ID + Course ID is unique – Primary key
Course Name is dependent on Course ID and doesn’t required Student ID
To resolve this, we create a parent table called course and reference to student.
14
Transitive Dependency
• An indirect relationship between data elements
Model Manufacturer Contact#
• A, B, and C are attributes of a relation such that if
• A à B and B à C, then C is transitively dependent on A via B Camry Toyota 9819876677
• If we know Model, we can get Manufacturer, Using Manufacturer Corella Toyota 9819876677
we can get phone number Accord Honda 8889997766
• Model –> contact is transitive dependent
Civic Honda 8889997766
Acura Honda 8889997766
• Problems if we have Transitive dependency –
Kia Hyundai 8001112222
• If we delete Kia and Sonata models, then Manufacturer and
his contact is removed Sonata Hyundai 8001112222
• Cannot add Manufacturer without a Model
• Adding additional attributes of Manufacture adds repetition
• Update leads more than one row modification
15
Normalization
1st Normal Form
2nd Normal Form
3rd Normal Form
16
Normalization
1st Normal Form
2nd Normal Form
3rd Normal Form
17
Boyce-Codd normal form (BCNF)
Fourth normal form (4NF)
Database Normalization by E.F CODD
Every Non-Prime attribute of relation R is –
non-Transitively dependent which means directly dependent on
every candidate key of R.
18
First Normal form – 1NF
• Steps –
• Every column should have only one value
• Combination of row and column should not have more than one value
• Identify repeating groups of fields
• Remove repeating groups to separate table
• Identify keys for the tables
• Goal is to have a Unique key
• Key of parent table is brought as part of concatenated key to the second table
19
First Normal form – 1NF
Company
ID Company Name City Company Contact
1 ABC Corp Denver Mike
2 True Tree Org Burbank Scott, Martin
3 ACME Inc. Frisco John, Drew, Sam, Helen
20
First Normal form – 1NF
Company
ID CompanyName City Company Contact
1 ABC Corp Denver Mike
2 True Tree Org Burbank Scott, Martin
3 ACME Inc. Frisco John, Drew, Sam, Helen
Company
ID Name City CContact1 CContact2 CContact3 CContact4
1 ABC Corp Denver Mike
2 True Tree Org Burbank Scott Martin
3 ACME Inc Frisco John Drew Sam Hellen
21
First Normal form – 1NF
Company
ID CompanyName City Company_Contact
1 ABC Corp Denver ID Company_ID Contact_Name
2 True Tree Org Burbank 1 1 Mike
3 ACME Inc. Frisco 2 2 Scott
3 2 Martin
4 3 Drew
5 3 Sam
6 3 Hellen
22
Raw Data
Assumption –
Client rents a given property
only once and cannot rent
more than one property at a
time
23
Second Normal form – 2NF
• Steps –
• Check if all fields are dependent on the whole key (Entire primary key)
• Remove fields that depend on part of the key
• Group partially dependent fields as a sequence table
• Name the tables
• Identify keys to the tables
24
Second Normal form – 2NF
Course_Enrollments
ID Date Course Name # of Students Hall No
DW987 1/1/1989 DW Systems 5 9
DW987 5/1/1989 DW Systems 5 7
DBMS433 8/1/1992 Data science 9 11
25
Second Normal form – 2NF
Course
Enrollments
ID Course Name
DW987 DW Systems ID CID Date # of Hall No
Students
DBMS433 Data science
1 DW987 1/1/1989 5 9
2 DW987 5/1/1989 5 7
3 DBMS433 8/1/1992 9 11
26
Third Normal form – 3NF
• Remove Fields that –
• Depend on other non-key fields (Transitive dependency)
• Can be calculated or derived
• Group interdependent fields as separate tables.
• Identify such tables and name them with appropriate key.
27
Third Normal form – 3NF
Course_Enrollments
ID CID Date Available Hall No Capacity
1 DW987 1/1/1989 5 9 12
2 DW987 5/1/1989 8 7 10
3 DBMS433 8/1/1992 3 11 15
28
Third Normal form – 3NF
Hall_Details
Hall No Capacity
9 12
7 10
11 15
Course_Enrollments
ID CID Date Available Hall No
1 DW987 1/1/1989 5 9
2 DW987 5/1/1989 8 7
3 DBMS433 8/1/1992 3 11
29
Checkpoint 1
Which of the following is Wrong?
A. An attribute of an entity can have more than one value
B. In a row of a relational table, an attribute can have more than one value
C. In a row of a relational table, an attribute can have exactly one value or a NULL value
30
Recap
• Properties of a relation?
• Has a name which is unique in its schema
• Combination of row and column should have only one
value
• Every attribute in a relation should have unique names
• Every row is unique
31
Recap
• Natural Key – Identifier used to uniquely identify data
• Super Key – 1 or more Column groups which can uniquely identify row. Eg: A tuple is always a super key
• Candidate Key – Least number of columns that can be used to uniquely identify a row (Subset of super key)
• Primary Key – Candidate key is eligible to become a primary key
• Functional Dependency – Typically between PK and non Key columns – Help to reduce redundancy
• Partial Dependency – There is an attribute that depends on part of primary key
• Transitive Dependency – A à B, B à C, then C is transitively dependent on A.
• First Normal form – No multivalued attribute, Entries in a column are of same type.
• Second Normal form – Should be in 1NF and No partial dependency
• Third Normal form – should be in 2NF and No Transitive dependency
32
Recap
• When you normalize a relation by breaking it into two smaller relations, what must you
do to maintain data integrity? (Select all that apply)
• A. Link the relations by a common field
• B. Remove any functional dependencies from both relations
• C. Assign both relations the same primary key field(s)
• D. Create a primary key(s) for the new relation
33
Introduction to SQL
• Language used to access Data
• Key features
ü Non Procedural language
ü Unified language
ü Common language for all Relational Databases
• DDL – Create objects like Tables, Views, Indexes
• DML – Modify data
• DCL – Controlling data and access
34
Datatypes
• CHAR
• VARCHAR (VARCHAR2)
• NVARCHAR2
• NUMBER
• DATE
• BLOB
• CLOB
• RAW
35
Syntax - SQL
SELECT [COLUMN NAMES]
FROM [TABLE NAMES]
WHERE [JOIN CONDITIONS]
GROUP BY [COLUMN NAMES]
ORDER BY [COLUMN NAMES]
36
What’s in next week?
• Relational Algebra
• SQL continues…
37
Relational Algebra
38
Relational Algebra
• It is a procedural query language –
• which takes instance of relation as input to yields instance of relation as output.
• It uses operators to perform queries
• So, Any operation that we perform on Relations produces another Relation
• Relation with Operator gives Relations
39
Relational Algebra
• It is a procedural query language –
• which takes instance of relation as input to yields instance of relation as output.
• It uses operators to perform queries
• So, Any operation that we perform on Relations produces another Relation
• Relation with Operator gives Relations
• What is Unary and binary operator(s)?
40
Unary and Binary Operators
• An operator that expects more than one input is Binary operator
• Example: Addition, AND, Subtraction, Division, Multiplication
• Unary
• Factorial (!)
• ++
• -- (2 minus symbols)
• Binary
• ==
• <
• >
• !=
41
Operand and Operators
• Operators – - Operator act on Operand
•+
•- Unary
•/ Operator Operand
• Operand can be –
• Numbers
• Words Binary
Operand
Operator
• Mathematical operations on numbers
Operand
• Substring, concatenation on strings
42
Relational Algebra
Relational Algebra is operational , Simple to use
and can be used to show execution plans
Basic Operations are –
- Unary operators - Binary operators
- Selection (σ) - Cross product (X)
- Projection (π) - Difference (–)
- Rename (ρ) - Union (U)
- Intersection (∩)
43
Questions?
44