Database Systems Notes
Database Systems Notes
SCS 1202
The levels of Data
Database One or more tables
Table A collection of Records
(relation)
Record A group of related fields
Field One or more character
Character At least 8 bits
Bit 0 or 1
(continued)
Data vs. Information
• Data:
– Raw facts; building blocks of information
• Information:
– Data processed to reveal meaning
• Accurate, relevant, and timely information is key
to good decision making
• Good decision making is the key to survival in a
global environment
Introducing the Database
and the DBMS
• Database—shared, integrated computer
structure that stores:
– End user data (raw facts)
– Metadata (data about data)
Introducing the Database and
the DBMS (continued)
• DBMS (database management system):
– Collection of programs that manages
database structure and controls access to
data
– Possible to share data among multiple
applications or users
– Makes data management more efficient and
effective
5 Characteristics of Good
Database
Data Integrity Ensuring data is valid
Data Independence Data is separated from software
Avoiding data Repetition of input data is avoided
Redundancy
Data Security Data is not accessible to
unauthorized users
Data Maintenance Set procedures for adding ,deleting
… records for the purpose of
optimization
Database: Historical Roots
• Manual File System
– to keep track of data
– used tagged file folders in a filing cabinet
– organized according to expected use
• e.g. file per customer
– easy to create, but hard to
• locate data
• aggregate/summarize data
DBMS Components
• Data independence
• Efficient data access
• Data integrity & security
• Data administration
• Concurrent access, crash recovery
• Reduced application development time
• So why not use them always?
– Expensive/complicated to set up & maintain
– This cost & complexity must be offset by need
– General-purpose, not suited for special-purpose tasks (e.g. text
search!)
(continued)
• End users have better access to more and
better-managed data
– Promotes integrated view of organization’s
operations
– Probability of data inconsistency is greatly
reduced
– Possible to produce quick answers to ad hoc
queries
Databases make these folks
happy ...
• DBMS vendors, programmers
– Oracle, IBM, MS, Sybase, …
• End users in many fields
– Business, education, science, …
• DB application programmers
– Build enterprise applications on top of DBMSs
– Build web services that run off DBMSs
• Database administrators (DBAs)
– Design logical/physical schemas
– Handle security and authorization
– Data availability, crash recovery
– Database tuning as needs evolve
Cont…
• DBMS used to maintain, query large datasets.
– can manipulate data and exploit semantics
• Other benefits include:
– recovery from system crashes,
– concurrent access,
– quick application development,
– data integrity and security.
• Levels of abstraction provide data
independence
Concurrency Control
• Concurrent execution of user programs: key to good DBMS
performance.
– Disk accesses frequent, pretty slow
– Keep the CPU working on several programs concurrently.
• Interleaving actions of different programs: trouble!
– e.g., account-transfer & print statement at same time
• DBMS ensures such problems don’t arise.
– Users/programmers can pretend they are using a single-user
system. (called “Isolation”)
– Thank goodness! Don’t have to program “very, very carefully”.
Types of Databases
• Single-user:
– Supports only one user at a time
• Desktop:
– Single-user database running on a personal
computer
• Multi-user:
– Supports multiple users at the same time
Types of Databases (continued)
• Workgroup:
– Multi-user database that supports a small
group of users or a single department
• Enterprise:
– Multi-user database that supports a large
group of users or an entire organization
Types of Databases (continued)
Can be classified by location:
• Centralized:
– Supports data located at a single site
• Distributed:
– Supports data distributed across several sites
Types of Databases (continued)
Can be classified by use:
• Transactional (or production):
– Supports a company’s day-to-day operations
• Data warehouse:
– Stores data used to generate information
required to make tactical or strategic decisions
– Often used to store historical data
– Structure is quite different
Web Database
• Internet is emerging as a prime business tool
– Shift away from models (e.g. relational vs. O-O)
– Emphasis on interfacing with the Internet
• Note: can specify simple integrity constraints on the data. The DBMS
enforces these.
– Beyond this, the DBMS does not understand the semantics of the
data.
– Ensuring that a single transaction (run alone) preserves
consistency is largely the user’s responsibility!
Goal: The ACID properties
• Atomicity: Either all actions are carried out, or
none are
• Consistency: If each transaction is consistent,
and the database is initially consistent, then it is
left consistent
• Isolation: Transactions are isolated, or
protected, from the effects of other scheduled
transactions
• Durability: If a transactions completes
successfully, then its effects persist
50
AAtomicity
• A transaction can
– Commit after completing its actions, or
– Abort because of
• Internal DBMS decision: restart
• System crash: power, disk failure, …
• Unexpected situation: unable to access disk, data value, …
• A transaction interrupted in the middle could leave the
database inconsistent
• DBMS needs to remove the effects of partial
transactions to ensure atomicity: either all a
transaction’s actions are performed or none
51
AAtomicity cont.
• A DBMS ensures atomicity by undoing
the actions of partial transactions
• To enable this, the DBMS maintains a
record, called a log, of all writes to the
database
• The component of a DBMS responsible for
this is called the recovery manager
52
Consistency
• Users are responsible for ensuring transaction
consistency
– when run to completion against a consistent database instance,
the transaction leaves the database consistent
• For example, consistency criterion that my inter-account-
transfer transaction does not change the total amount of
money in the accounts!
Integrity
• Database consistency is the property that everyConstraints!
transaction sees a consistent database instance. It
follows from transaction atomicity, isolation and
transaction consistency
53
Isolation
• Guarantee that even though transactions may
be interleaved, the net effect is identical to
executing the transactions serially
• For example, if transactions T1 and T2 are
executed concurrently, the net effect is
equivalent to executing
– T1 followed by T2, or
– T2 followed by T1
• NOTE: The DBMS provides no guarantee of
effective order of execution
54
Durability
• DBMS uses the log to ensure durability
• If the system crashed before the changes
made by a completed transaction are
written to disk, the log is used to
remember and restore these changes
when the system is restarted
• Again, this is handled by the recovery
manager
55
Example of transaction
59
Three-Schema Architecture and
Data Independence
• External Level
– Users’ view of the database
– Describes that part of database that is
relevant to a particular user
• Conceptual Level
– Community view of the database
– Describes what data is stored in database
and relationships among the data
60
Three-Schema Architecture and
Data Independence
• Internal Level
– Physical representation of the database on
the computer.
– Describes how the data is stored in the
database
61
Three-Schema Architecture and
Data Independence
• Data Independence is the capacity to change
the schema at one level of a database system
without having to change the schema at the
next higher level
• Logical Data Independence
– Refers to immunity of external schemas to changes
in conceptual schema
– Conceptual schema changes (e.g. addition/removal
of entities) should not require changes to external
schema or rewrites of application programs
62
Three-Schema Architecture and
Data Independence
• Physical Data Independence
– Refers to immunity of conceptual schema to
changes in the internal schema
– Internal schema changes (e.g. using different
file organizations, storage structures/devices)
should not require changes to conceptual or
external schemas
63
Three-Schema Architecture and
Data Independence
64
Database Model
File-based
Hierarchical
Object-
Network
oriented
Relational Web-based
Entity-Relationship
• Disadvantages
– Limited representation of data relationships
• did not allow Many-to-Many (M:N) relations
– Complex implementation
• required in-depth knowledge of physical data storage
– Structural Dependence
• data access requires physical storage path
– Lack of Standards
• limited portability
• Disadvantages
– System complexity
• require familiarity with the internal structure for data access
– Lack of structural independence
• small structural changes require significant program changes
Name
• Disadvantages
– Substantial hardware and system software overhead
• more complex system
– Poor design and implementation is made easy
• ease-of-use allows careless use of RDBMS
–Entities
–Relationships
–Attributes
97
How to find entities?
• Entity:
– "...anything (people, places, objects, events, etc.)
about which we store information (e.g. supplier,
machine tool, employee, utility pole, airline seat,
etc.).”
– Tangible: customer, product
– Intangible: order, accounting receivable
– Look for singular nouns (beginner)
– BUT a proper noun is not a good candidate….
98
Entities
• Entity - distinguishable “thing” in the real world
– Strong (or regular) entity - entities have an
independent existence (e.g. staff)
– Weak entity - existence dependent on some other
entity (e.g. next of kin)
• Attribute:
– Attributes are data objects that either identify or
describe entities (property of an entity).
– In other words, it is a descriptor whose values are
associated with individual entities of a specific entity
type
• The process for identifying attributes is similar except now
you want to look for and extract those names that appear
to be descriptive noun phrases.
101
Attributes
“Describe detail information about an entity ”
• Entity: Employee
• Attributes:
– Employee-Name
– Address (composite)
– Phone Extension
– Date-Of-Hire
– Job-Skill-Code
– Salary
102
Classes of attributes
• Simple attribute
• Composite attribute
• Derived attributes
• Single-valued attribute
• Multi-valued attribute
103
Simple/Composite attribute
• A simple attribute cannot be subdivided.
– Examples: Age, Gender, and Marital status
• A composite attribute can be further
subdivided to yield additional attributes.
– Examples:
• ADDRESS -- Street, City, State, Zip
• PHONE NUMBER -- Area code, Exchange number
104
Derived attribute
• is not physically stored within the database
• instead, it is derived by using an algorithm.
– Example 1: Late Charge of 2%
• MS Access: InvoiceAmt * 0.02
– Example 2: AGE can be derived from the date of
birth and the current date.
• MS Access: int(Date() – Emp_Dob)/365)
105
Single-valued attribute
• can have only a single (atomic) value.
– Examples:
• A person can have only one social security number.
• A manufactured part can have only one serial number.
– A single-valued attribute is not necessarily a
simple attribute.
• Part No: CA-08-02-189935
• Location: CA, Factory#:08, shift#: 02, part#: 189935
106
Multi-valued attributes
• can have many values.
– Examples:
• A person may have several college degrees.
• A household may have several phones with
different numbers
• A car color
107
Example - “Movie Database”
• Entity:
– Movie Star
• Attributes:
– SS#: “123-45-6789” (single-valued)
– Cell Phone: “(661)123-4567, (661)234-5678”
(multi-valued)
– Name: “Harrison Ford” (composite)
– Address: “123 Main Str., LA, CA” (composite)
– Gender: “Female” (simple)
– Age: 24 (derived)
108
How to find relationships?
• Relationship:
– Relationships are associations between entities.
– Typically, a relationship is indicated by a verb
connecting two or more entities.
– Employees are assigned to projects
– Relationships should be classified in terms of
cardinality.
• One-to-one, one-to-many, etc.
109
Relationships: constraints
• The degree of a relationship type
– binary (connects 2 entity types)
– unary/ recursive (connects 1 entity type with itself) Degree
– complex (connects 3 or more entity types)
• Ternary (connects 3)
110
Relationships: Degree
Entity1 Entity2
HasLinkWith
Binary relationship
S u p e rv is o r Supe rv ise s
Entity1
Staff Recursive (Unary) relationship
- example
S u p e rv is e e
Entity1 Entity3
Te rnaryRe lationship
Complex relationship
–
here ternary Entity2
111
How to find cardinalities?
• Cardinality:
– The cardinality is the number of occurrences in one
entity which are associated to the number of
occurrences in another.
– There are three basic cardinalities (degrees of
relationship).
– one-to-one (1:1), one-to-many (1:M), and many-to-
many (M:N)
112
Basic Cardinality Type
• 1-to-1 relationship
• 1-to-M relationship
• M-to-N relationship
Cardinality con’t
Identifier
“attributes that uniquely identify entity instances”
• Becomes a PK in RDS
• Composite identifiers are identifiers that consist
of two or more attributes
• Identifiers are represented by underlying the
name of the attribute(s)
– Employee (Employee_ID), student (Student_ID)
115
Crow’s Foot Notation
116
Attributes
117
Entity Relationship diagram
(ERD)
Shows in a diagram how entities
(tables) are related to one another.
One to One
One to many
Many to many
One to one
Extension of number of attributes in a
single table
Rarely required
Requires
two Family Genus Species
tables.
Linked by
Foreign
Key
Many to many
Need to break
down to one to Measurement Code
many
Requires three Tree Code
tables
Measurement
Associative table
provides common
key
E-R Diagram: Crow’s Foot Model
• Entity
– represented by a rectangle with
its name in capital letters.
• Relationships
– represented by an active or
passive verb that connects the
related entities.
• Connectivities
– indicated by symbols next to
entities.
• 2 vertical lines for 1
• “crow’s foot” for M
123
E-R Diagram: Chen’s Model
• Peter Chen’s Landmark Paper in 1976
– “The Relationship Model: Toward a Unified View of Data”
– Graphical representation of entities and their relationships
• Relationships
– represented by an active or passive
verb inside the diamond that connects
the related entities.
• Connectivities
– i.e., types of relationship
– written next to each entity box.
• Disadvantages
• Disadvantages
– Lack of standards
• no standard data access method
– Complex navigational data access
• class hierarchy traversal
– Steep learning curve
• difficult to design and implement properly
– More system-oriented than user-centered
– High system overhead
• slow transactions
S511 Session 2, IU-SLIS 131
NORMALIZATION
Goals of Normalisation
• Eliminate certain kinds of redundancy
• avoid certain update anomalies
• good reresentation of real world
• simplify enforcement of DB integrity
Update anomalies
• Undesirable side-effects that occur
when performaing insertion,
modification or deletion operations on
badly designed relational DBs.
SSN Name Dept DeptMgr Dept Name
987 J Smith 1 321 Representing
…
654 M Burke 2 467 Department info
...
333 A Dolan 1 321 in the Employee
321 K Doyle 1 321 table causes
678 O O’Neill 3 678 problems.
467 R McKay 2 467
Sample anomalies
• Modification -
– when the manager of a dept changes we
have to change many values.
– If we are not careful the DB will contain
inconsistencies.
– There is no easy way to get the DB to ensure
that a department has only one manager and
only one name.
Anomalies continued
• Deletion -
– if O O’Neill leaves we delete his tuple and
lose
• the fact that there is a department 3
• the name of dept 3
• who is the manager of dept. 3
• Insertion
– how would we create a new department
before any employees are assigned to it ?
Better design
• Separate entities are represented in
separate tables.
SSN Name Dept Dept DeptMgr Dept Name
987 J Smith 1 1 321 …
654 M Burke 2 2 467
...
333 A Dolan 1 3 678
321 K Doyle 1
678 O O’Neill 3
467 R McKay 2
Number of Tables
Redundancy
– Third Normal Form (3NF)
Complexity
– Boyce-Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
– Fifth Normal Form (5NF)
– Domain Key Normal Form (DKNF)
Most
Mostdatabases
databasesshould
shouldbe
be3NF
3NFor
orBCNF
BCNFin inorder
orderto
toavoid
avoid
the
thedatabase
databaseanomalies.
anomalies.
Levels of Normalization
1NF
2NF
3NF
4NF
5NF
DKNF
Each
Eachhigher
higherlevel
levelisisaasubset
subsetof
ofthe
thelower
lowerlevel
level
First Normal Form
First Normal Form
We say a relation is in 1NF if all values stored in the
relation are single-valued and atomic.
141
91.2914
First Normal Form
The following is not in 1NF
142
91.2914
First Normal Form
EmpNum EmpPhone EmpDegrees
123 233-9876
333 233-1231 BA, BSc, PhD
679 233-1231 BSc, MSc
143
91.2914
First Normal Form
EmployeeDegree
Employee
EmpNum EmpDegree
EmpNum EmpPhone
333 BA
123 233-9876
333 BSc
333 233-1231
333 PhD
679 233-1231
679 BSc
679 MSc
144
91.2914
First Normal Form (1NF)
This table is Not in 1NF, Explain why and then
0-55-123456-9 Main Street Small House 714-000-0000 $22.95 0-55-123456-9 Jones 123-333-3333
147
91.2914
Functional Dependencies
Functional Dependencies
We say an attribute, B, has a functional dependency on
another attribute, A, if for any two records, which have
the same value for A, then the values for B in these two
records must be the same. We illustrate this as:
AB
Example: Suppose we keep track of employee email
addresses, and we only track one email address for each
employee. Suppose each employee is identified by their
unique employee number. We say there is a functional
dependency of email address on employee number:
149
91.2914
Functional Dependencies
EmpNum EmpEmail
EmpNum EmpFname 3 different ways
EmpNum EmpLname you might see FDs
depicted
EmpEmail
EmpNum EmpFname
EmpLname
150
91.2914
Partial dependency
A partial dependency exists when an attribute B is
functionally dependent on an attribute A, and A is a
component of a multipart candidate key.
156
91.2914
Third Normal Form (3NF)
This form dictates that all non-key attributes
of a table must be functionally dependent
on a candidate key i.e. there can be no
interdependencies among non-key
attributes.
158
91.2914
Third Normal Form (3NF)
Example (Not in 3NF)
Scheme {Title, PubID, PageCount, Price }
1. Key {Title, PubId}
2. {Title, PubId} {PageCount}
3. {PageCount} {Price}
4. Both Price and PageCount depend on a key hence 2NF
5. Transitively {Title, PubID} {Price} hence not in 3NF
3NF - Decomposition
1. Move all items involved in transitive dependencies to a
new entity.
2. Identify a primary key for the new entity.
3. Place the primary key for the new entity as a foreign
key on the original entity.
Contractor Contractor
Example 3 (Convert to 3NF) BuildingID Fee
Slide 6- 163
Select operation
• Unary operation
• Select Operation: Uses some criteria to
select some tuples from the original
relation eg UNIT = 5 (COURSES)
Project operation
• Unary operation
• Project Operation: Creates relation
in which each tuple has fewer
attributes eg No,(COURSES)
Join operation
• Binary operation
• Join Operation: Takes two relation and combine
them based on common attribute eg COURSES
• TAUGHT-BY
Union operation
• Binary operation
• Union Operation: Creates new relation in which
each tuple is either in the first relation, the second
relation or in both eg CIS15-Roster CIS52-Roster
Intersection operation
• Binary operation
• Intersection Operation: Creates new relation
in which each tuple is either in both
relations.
• Eg CIS15-Roster CIS52-Roster
Difference operation
• Binary Operation
• Difference Operation: Creates new relation where
the new tuples are in the first relation but not in
the second. Eg CI15-Roster CIS52-Roster
STRUCTURED
QUERY
LANGUAGE
SQL
update RELATION-NAME
set attribute1 = value1 attribute 2 = value2 …
where criteria
Update (Example)
update COURSES
set unit = 6
where No = “CIS51”
4. Select
• SQL Select Operation format
select *
from RELATION-NAME
where criteria
Select (Example)
select *
from COURSES
where Unit = 5
5. Project
• SQL Project Operation format
select attribute-list
from RELATION-NAME
Project (Example)
select attribute-list
from RELATION NO1, RELATION NO2
where criteria
Join (Example)
select *
from RELATION NO1
union
select *
from RELATION NO2
Union (Example)
select *
from CIS15-Roster
union
select *
from CIS52-Roster;
8. Intersection
• SQL Intersection Operation format
select *
from RELATION NO1
intersection
select *
from RELATION NO2
Intersection (Example)
select *
from CIS15-Roster
intersection
select *
from CIS52-Roster;
9. Difference
• SQL Difference Operation format
select *
from RELATION NO1
minus
select *
from RELATION NO2
Intersection (Example)
select *
from CIS15-Roster
minus
select *
from CIS52-Roster;