0% found this document useful (0 votes)
4 views188 pages

Database Systems Notes

The document shows notes on database systems.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views188 pages

Database Systems Notes

The document shows notes on database systems.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 188

Database Systems

SCS 1202
The levels of Data
Database One or more tables
Table A collection of Records
(relation)
Record A group of related fields
Field One or more character
Character At least 8 bits
Bit 0 or 1
(continued)
Data vs. Information
• Data:
– Raw facts; building blocks of information
• Information:
– Data processed to reveal meaning
• Accurate, relevant, and timely information is key
to good decision making
• Good decision making is the key to survival in a
global environment
Introducing the Database
and the DBMS
• Database—shared, integrated computer
structure that stores:
– End user data (raw facts)
– Metadata (data about data)
Introducing the Database and
the DBMS (continued)
• DBMS (database management system):
– Collection of programs that manages
database structure and controls access to
data
– Possible to share data among multiple
applications or users
– Makes data management more efficient and
effective
5 Characteristics of Good
Database
Data Integrity Ensuring data is valid
Data Independence Data is separated from software
Avoiding data Repetition of input data is avoided
Redundancy
Data Security Data is not accessible to
unauthorized users
Data Maintenance Set procedures for adding ,deleting
… records for the purpose of
optimization
Database: Historical Roots
• Manual File System
– to keep track of data
– used tagged file folders in a filing cabinet
– organized according to expected use
• e.g. file per customer
– easy to create, but hard to
• locate data
• aggregate/summarize data

• Computerized File System


– to accommodate the data growth and information need
– manual file system structures were duplicated in the computer
– Data Processing (DP) specialists wrote customized programs to
• write, delete, update data (i.e. management)
• extract and present data in various formats (i.e. report)

S511 Session 2, IU-SLIS 8


Historical Roots: Files and File
Systems
• Managing data with file systems is
obsolete
– Understanding file system characteristics
makes database design easier to understand
– Awareness of problems with file systems
helps prevent similar problems in DBMS
– Knowledge of file systems is helpful if you
plan to convert an obsolete file system to a
DBMS
Historical Roots: Files and File
Systems (continued)
Manual File systems:
• Collection of file folders kept in file cabinet
• Organization within folders based on data’s
expected use (ideally logically related)
• System adequate for small amounts of data with
few reporting requirements
• Finding and using data in growing collections of
file folders became time-consuming and
cumbersome
Historical Roots: Files and File
Systems (continued)
Conversion from manual to computer
system:
• Could be technically complex, requiring
hiring of data processing (DP) specialists
• Resulted in numerous “home-grown”
systems being created
• Initially, computer files were similar in
design to manual files
Historical Roots: Files and File
Systems (continued)
Historical Roots: Files and File
Systems (continued)
• DP specialist wrote programs for reports:
– Monthly summaries of types and amounts of
insurance sold by agents
– Monthly reports about which customers
should be contacted for renewal
– Reports that analyzed ratios of insurance
types sold by agent
– Customer contact letters summarizing
coverage
Historical Roots: Files and File
Systems (continued)
• Other departments requested databases
be written for them
– SALES database created for sales
department
– AGENT database created for personnel
department
Historical Roots: Files and File
Systems (continued)
• As number of databases increased, small
file system evolved
• Each file used its own application
programs
• Each file was owned by individual or
department who commissioned its creation
Historical Roots: Files and File
Systems (continued)
Example of Early Database
Design (continued)
• As system grew, demand for DP’s
programming skills grew
• Additional programmers hired
• DP specialist evolved into DP manager,
supervising a DP department
• Primary activity of department (and DP
manager) remained programming
Problems with File System
Data Management
• Every task requires extensive programming in a
third-generation language (3GL)
– Programmer must specify task and how it must be
done
• Modern databases use fourth-generation
languages (4GL)
– Allow users to specify what must be done without
specifying how it is to be done
• Example: DO Loop VS. Select Statement
Problems with File System
Data Management
• Time-consuming, high-level activity
• As number of files expands, system
administration becomes difficult
• Making changes in existing file structure is
difficult
• File structure changes require
modifications in all programs that use data
in that file
Problems with File System
Data Management
• Modifications are likely to produce errors,
requiring additional time to “debug” the
program
• Security features hard to program and
therefore often omitted
Structural and Data
Dependence
• Structural dependence
– Access to a file depends on its structure
• Data dependence
– Changes in the data storage characteristics
without affecting the application program’s
ability to access the data
– Logical data format
• How the human being views the data
– Physical data format
• How the computer “sees” the data
Field Definitions and Naming
Conventions
• Flexible record definition anticipates
reporting requirements by breaking up
fields into their component parts
• Example:
Cutomer Last Name …. Cus-LName
Data Redundancy
• Data redundancy results in data inconsistency
– Different and conflicting versions of the same
data appear in different places
• Errors more likely to occur when complex entries
are made in several different files and/or recur
frequently in one or more files
• Data anomalies develop when required changes
in redundant data are not made successfully
Data Redundancy

Types of data anomalies:


• Update anomalies
– Occur when changes must be made to
existing records
• Insertion anomalies
– Occur when entering new records
• Deletion anomalies
– Occur when deleting records
File System: Problem Case
CUSTOMER file AGENT file SALES file

A_Name (15 char) A_Name (20 char) AGENT (20 char)

Carol Johnson Carol T. Johnson Carol J. Smith

- inconsistent field name, field


size
- inconsistent data values
- data duplication
S511 Session 2, IU-SLIS 25
Database Systems
• Problems inherent in file systems make
using a database system desirable
• File system
– Many separate and unrelated files
• Database
– Logically related data stored in a single logical
data repository
Database Systems
The Database System
Environment
• Database system is composed of five
main parts:
– Hardware
– Software
• Operating system software
• DBMS software
• Application programs and utility software
– People
– Procedures
– Data
Figure 14-1

DBMS Components

• Hardware: the physical computer system


• Software: the program that allows users to access,
maintain update the data
• Data: Data are stored on the computer. In database data are
separate entity from the software that access them
• Users: could be either (1) end users: people who has
access to the database or (2) application program:
applications that access and process data
• Procedures: rules that are defined and followed by the
users
The Database System Environment
(continued)
DBMS Functions
• DBMS performs functions that guarantee
integrity and consistency of data
– Data dictionary management
• defines data elements and their relationships
– Data storage management
• stores data and related data entry forms, report
definitions, etc.
Database Management System
- manages interaction between end users and
database

Database Systems: Design, Implementation, & Management: Rob & Coronel

S511 Session 2, IU-SLIS 32


DBMS Functions (continued)
– Data transformation and presentation
• translates logical requests into commands to
physically locate and retrieve the requested data
– Security management
• enforces user security and data privacy within
database
DBMS Functions (continued)
– Multiuser access control
• uses sophisticated algorithms to ensure multiple
users can access the database concurrently
without compromising the integrity of the database
– Backup and recovery management
• provides backup and data recovery procedures
– Data integrity management
• promotes and enforces integrity rules
DBMS Functions (continued)
– Database access languages and application
programming interfaces
• provide data access through a query language

– Database communication interfaces


• allow database to accept end-user requests via
multiple, different network environments
DBMS Functions (continued)
Advantages of a DBMS

• Data independence
• Efficient data access
• Data integrity & security
• Data administration
• Concurrent access, crash recovery
• Reduced application development time
• So why not use them always?
– Expensive/complicated to set up & maintain
– This cost & complexity must be offset by need
– General-purpose, not suited for special-purpose tasks (e.g. text
search!)
(continued)
• End users have better access to more and
better-managed data
– Promotes integrated view of organization’s
operations
– Probability of data inconsistency is greatly
reduced
– Possible to produce quick answers to ad hoc
queries
Databases make these folks
happy ...
• DBMS vendors, programmers
– Oracle, IBM, MS, Sybase, …
• End users in many fields
– Business, education, science, …
• DB application programmers
– Build enterprise applications on top of DBMSs
– Build web services that run off DBMSs
• Database administrators (DBAs)
– Design logical/physical schemas
– Handle security and authorization
– Data availability, crash recovery
– Database tuning as needs evolve
Cont…
• DBMS used to maintain, query large datasets.
– can manipulate data and exploit semantics
• Other benefits include:
– recovery from system crashes,
– concurrent access,
– quick application development,
– data integrity and security.
• Levels of abstraction provide data
independence
Concurrency Control
• Concurrent execution of user programs: key to good DBMS
performance.
– Disk accesses frequent, pretty slow
– Keep the CPU working on several programs concurrently.
• Interleaving actions of different programs: trouble!
– e.g., account-transfer & print statement at same time
• DBMS ensures such problems don’t arise.
– Users/programmers can pretend they are using a single-user
system. (called “Isolation”)
– Thank goodness! Don’t have to program “very, very carefully”.
Types of Databases
• Single-user:
– Supports only one user at a time
• Desktop:
– Single-user database running on a personal
computer
• Multi-user:
– Supports multiple users at the same time
Types of Databases (continued)
• Workgroup:
– Multi-user database that supports a small
group of users or a single department
• Enterprise:
– Multi-user database that supports a large
group of users or an entire organization
Types of Databases (continued)
Can be classified by location:
• Centralized:
– Supports data located at a single site
• Distributed:
– Supports data distributed across several sites
Types of Databases (continued)
Can be classified by use:
• Transactional (or production):
– Supports a company’s day-to-day operations
• Data warehouse:
– Stores data used to generate information
required to make tactical or strategic decisions
– Often used to store historical data
– Structure is quite different
Web Database
• Internet is emerging as a prime business tool
– Shift away from models (e.g. relational vs. O-O)
– Emphasis on interfacing with the Internet

• Characteristics of “Internet age” databases


– Flexible, efficient, and secure Internet access
– Support for complex data types & relationships
– Seamless interfaces with multiple data sources and structures
– Ease of use for end-user, database architect, and database
administrator
• Simplicity of conceptual database model
• Many database design, implementation, and application development tools
• Powerful DBMS GUI

S511 Session 2, IU-SLIS 46


Other databases you may use
What Is a Transaction
• A transaction is a series of operations that are
performed as one logical unit of work.
• Transactions allow SQL Server to ensure a
certain level of data integrity and data
recoverability.
• The transaction log, which every database must
have, keeps a record of all transactions that
make any type of modification (insert, update, or
delete) to the database. SQL Server uses this
transaction log to recover data in case of errors
or system failures.
Transactions: ACID Properties
• Key concept is a transaction: a sequence of database actions
(reads/writes).

• DBMS ensures atomicity (all-or-nothing property) even if system


crashes in the middle of a Xact.
• Each transaction, executed completely, must take the DB between
consistent states or must not run at all.
• DBMS ensures that concurrent transactions appear to run in isolation.
• DBMS ensures durability of committed Xacts even if system crashes.

• Note: can specify simple integrity constraints on the data. The DBMS
enforces these.
– Beyond this, the DBMS does not understand the semantics of the
data.
– Ensuring that a single transaction (run alone) preserves
consistency is largely the user’s responsibility!
Goal: The ACID properties
• Atomicity: Either all actions are carried out, or
none are
• Consistency: If each transaction is consistent,
and the database is initially consistent, then it is
left consistent
• Isolation: Transactions are isolated, or
protected, from the effects of other scheduled
transactions
• Durability: If a transactions completes
successfully, then its effects persist
50
AAtomicity
• A transaction can
– Commit after completing its actions, or
– Abort because of
• Internal DBMS decision: restart
• System crash: power, disk failure, …
• Unexpected situation: unable to access disk, data value, …
• A transaction interrupted in the middle could leave the
database inconsistent
• DBMS needs to remove the effects of partial
transactions to ensure atomicity: either all a
transaction’s actions are performed or none

51
AAtomicity cont.
• A DBMS ensures atomicity by undoing
the actions of partial transactions
• To enable this, the DBMS maintains a
record, called a log, of all writes to the
database
• The component of a DBMS responsible for
this is called the recovery manager

52
Consistency
• Users are responsible for ensuring transaction
consistency
– when run to completion against a consistent database instance,
the transaction leaves the database consistent
• For example, consistency criterion that my inter-account-
transfer transaction does not change the total amount of
money in the accounts!
Integrity
• Database consistency is the property that everyConstraints!
transaction sees a consistent database instance. It
follows from transaction atomicity, isolation and
transaction consistency

53
Isolation
• Guarantee that even though transactions may
be interleaved, the net effect is identical to
executing the transactions serially
• For example, if transactions T1 and T2 are
executed concurrently, the net effect is
equivalent to executing
– T1 followed by T2, or
– T2 followed by T1
• NOTE: The DBMS provides no guarantee of
effective order of execution
54
Durability
• DBMS uses the log to ensure durability
• If the system crashed before the changes
made by a completed transaction are
written to disk, the log is used to
remember and restore these changes
when the system is restarted
• Again, this is handled by the recovery
manager
55
Example of transaction

• Transfer $50 from account Atomicity - shouldn’t take


A to account B money from A without giving
it to B
Read(A)
Consistency - money isn’t lost
A = A - 50 or gained
Write(A) Isolation - other queries
transaction
Read(B) shouldn’t see A or B change
B = B+50 until completion
Write(B) Durability - the money does
not go back to A
Three-Schema Architecture and
Data Independence
• Objectives of Three-Schema Architecture
– All users should be able to access same data
– Users should not need to know physical
database storage details
– DBA should be able to change database storage
structures without affecting the users’ views
– Internal structure of database should be
unaffected by changes to physical aspects of
storage
– DBA should be able to change conceptual
structure of database without affecting all users
58
Three-Schema Architecture and
Data
 Three-level Independence
architecture and data independence

59
Three-Schema Architecture and
Data Independence
• External Level
– Users’ view of the database
– Describes that part of database that is
relevant to a particular user
• Conceptual Level
– Community view of the database
– Describes what data is stored in database
and relationships among the data

60
Three-Schema Architecture and
Data Independence
• Internal Level
– Physical representation of the database on
the computer.
– Describes how the data is stored in the
database

61
Three-Schema Architecture and
Data Independence
• Data Independence is the capacity to change
the schema at one level of a database system
without having to change the schema at the
next higher level
• Logical Data Independence
– Refers to immunity of external schemas to changes
in conceptual schema
– Conceptual schema changes (e.g. addition/removal
of entities) should not require changes to external
schema or rewrites of application programs

62
Three-Schema Architecture and
Data Independence
• Physical Data Independence
– Refers to immunity of conceptual schema to
changes in the internal schema
– Internal schema changes (e.g. using different
file organizations, storage structures/devices)
should not require changes to conceptual or
external schemas

63
Three-Schema Architecture and
Data Independence

64
Database Model

• Database model defines the logical design


of data.
• Database model describes the relation
between different parts of data.
• There are three database models:
1. Hierarchical Model
2. Network Model
3. Relational Model
Evolution of Data Models
• Timeline

1960s 1970s 1980s 1990s 2000+

File-based

Hierarchical
Object-
Network
oriented
Relational Web-based
Entity-Relationship

S511 Session 2, IU-SLIS 66


Hierarchical model

• Data are organized in an upside down tree


• Each entity has one parent and many
children
• Old and not used now
Hierarchical Database
• Background
– Developed to manage large amount of data for complex manufacturing
projects
– e.g., Information Management System (IMS)
• IBM-Rockwell joint venture
• clustered related data together
• hierarchically associated data clusters using pointers

• Hierarchical Database Model


– Assumes data relationships are hierarchical
• One-to-Many (1:M) relationships
– Each parent can have many children
– Each child has only one parent
– Logically represented by an upside down tree

S511 Session 2, IU-SLIS 68


Hierarchical Database: Example

Database Systems: Design, Implementation, & Management: Rob & Coronel

S511 Session 2, IU-SLIS 69


Hierarchical Database: Pros &
Cons
• Advantages
– Conceptual simplicity
• groups of data could be related to each other
• related data could be viewed together
– Centralization of data
• reduced redundancy and promoted consistency

• Disadvantages
– Limited representation of data relationships
• did not allow Many-to-Many (M:N) relations
– Complex implementation
• required in-depth knowledge of physical data storage
– Structural Dependence
• data access requires physical storage path
– Lack of Standards
• limited portability

S511 Session 2, IU-SLIS 70


Network model

• Entities are organized in a graph


• Entities can be accessed through several
paths
• Old and not used
Network Database
• Objectives
– Represent more complex data relationships
– Improve database performance
– Impose a database standard

• Network Database Model


– Similar to Hierarchical Model
• Records linked by pointers
– Composed of sets
• Each set consists of owner (parent) and member (child)
– Many-to-Many (M:N) relationships representation
• Each owner can have multiple members (1:M)
• A member may have several owners

S511 Session 2, IU-SLIS 72


Network Database: Example

Database Systems: Design, Implementation, & Management: Rob & Coronel

S511 Session 2, IU-SLIS 73


Network Database: Pros & Cons
• Advantages
– More data relationship types
– More efficient and flexible data access
• “network” vs. “tree” path traversal
– Conformance to standards
• enhanced database administration and portability

• Disadvantages
– System complexity
• require familiarity with the internal structure for data access
– Lack of structural independence
• small structural changes require significant program changes

S511 Session 2, IU-SLIS 74


Relational Model
• In the relational data model the database is
represented as a group of related tables.
• The relational data model was introduced in 1970 by
E. F. Codd of IBM published a paper in CACM
entitled "A Relational Model of Data for Large Shared
Data Banks".
• It is currently the most popular model. The
mathematical simplicity and ease of visualization of
the relational data model have contributed to its
success.
Relational model

• Data are organized in two dimensional tables


(relations)
• Tables re related to each other
• Relational Database Management System (RDBMS)
are more common model used today
Relation (Name, Attributes, Tuples)
• Attributes are the
column heading
• Each column must have • Tuple is a
a unique heading collection of
• Number of columns is attribute value
called the degree of the • Total number of
relation rows is called
Cardinality of the
relation
• Each relation must
have a unique name

Name

• Relation appears in 2 dimensional table


• That doesn’t mean data stored as table; the
physical storage of data is independent of the
logical organization of data
What is a Relation?

•A table. Columns are fields (attributes)


of data related to other fields on the
•same row (tuple).
Definitions of Terminology
Formal relational term Informal equivalents
relation table
tuple row or record
cardinality number of rows
attribute column or field
degree number of columns
(unique) identifier Primary key
domain pool of legal values
Characteristics of a Relation
(table)
Two-dimensional structure with rows and columns
A relation represent a single entity
Each table must have an attribute to uniquely
identify each row
Column values all have same data type
Order of the rows and columns is immaterial to the
DBMS
Properties of a Relation

• Based on the set theory


1. There are no duplicate tuples (rows).
– The body of the relation is a mathematical set
(i.e., a set of tuples), and sets in mathematics by
definition do not include duplicate elements.
– If a "relation" contains duplicate tuples, then it is
not a relation.
Properties of a Relation

2. Tuples (rows) are unordered (top to bottom).


– Sets in mathematics are not ordered. So, even if
a relation A's tuples are reversely ordered, it is
still the same relation.
– Thus, there is no such thing as "the 5th tuple" or
the last tuple. In other words, there is no
concept of positional addressing.
Properties of a Relation

3. Attributes (columns) are unordered (left to


right).
– The heading of a relation is also defined as a set.
– There is no such thing as "5th attribute (column)"
or the last attribute.
Properties of a Relation
4. All attribute values are atomic.
– At every row-and-column position within the table,
there always exists precisely one value, never a
list of values. Or equivalently, relations do not
contain repeating groups.
– A relation satisfying this condition is said to be in
First Normal Form.
Primary Key

Identifies the row of a table without
duplicates.

Tells you what the row contains

Eg. If treeid is the primary key then the row
has information about that tree
Primary Key

• A PK is an attribute, or collection of attributes, whose


values uniquely identify each tuple in a relation.
• To being unique, a PK must be minimal (contain no
unnecessary attributes)
• and must not change in value.
Primary Key (con’t)

• One attribute, or collection of attributes, that can


serve as a PK is called candidate key,
• And the remaining keys that cannot be used as a
PK are called alternate key.
Cost of PK
SS# vs. finger print
Candidate Primary Key

Any attribute(s) which together would
serve as the primary key.

Must uniquely identify a row of data.

Each part of the key must be essential
to unique identification. No
redundancy.
Candidate Key and
Alternate key
State
StateAbbrev StateName StateNumber StateBird StatePopulation
CT Connecticut 5 American Robin 3,287,116.00
MI Michigan 26 Robin 9,295,297.00
SD South Dakota 40 Phesant 696,004.00
TN Tennessee 16 Mockingbird 4,877,185.00
TX Texas 28 Mockingbird 16,986,510.00
• Could any attribute (column) serve as the PK?
– candidate key
• Is there any attribute that should not be served as the PK?
– alternate key
Foreign Key

A foreign key is a column in a table
that matches the primary key column
of another table. Its function is to link
the basic data of two entities on
demand, i.e. when two tables are
joined using the common key.
Entity Integrity Rule

• Guarantees that each entity will have a unique


identity and ensures that foreign key values can
properly reference primary key values.
• Requirement
– No component of the primary key is allowed to accept
nulls.
– By "null" here, we mean that information is missing for
some reason.
Referential Integrity Rule

• The database must not contain any unmatched


foreign key values.
• Just as primary key values represent entity
identifiers, so foreign key values represent entity
references.
• The referential integrity rule simply says that if B
references A, then A must exist.
Relational Database
• Problems with legacy database systems
– Required excessive effort to maintain
• Data manipulation (programs) too dependent on physical file structure
– Hard to manipulate by end-users
• No capacity for ad-hoc query (must rely on DB programmers).

• Evolution in Data Organization

– E. F. Codd’s Relational Model proposal


• Separated the notion of physical representation (machine-view)
from logical representation (human-view)
• Considered ingenious but computationally impractical in 1970

– Relational Database Model


• Dominant database model of today
• Eliminated pointers and used tables to represent data
• Tables
– flexible logical structure for data representation
– a series of row/column intersections
– related by sharing common entity characteristic(s)

S511 Session 2, IU-SLIS 93


Relational Database: Example
 Provides a logical “human-level” view of the data and
associations among groups of data (i.e., tables)

Customer_ID Customer_Account Agent_ID


1224 4556 23
1225 4558 25

Agent_ID Last_Name First_Name Phone


23 Sturm David 334-5678
25 Long Kyle 556-3421

Customer_ID Last_Name First_Name Phone Account_Balance


1224 Vira Dyne 678-9987 1223.95
1225 Davies Tricia 556-3342 234.25

S511 Session 2, IU-SLIS 94


Relational Database: Pros & Cons
• Advantages
– Structural independence
• Separation of database design and physical data storage/access
• Easier database design, implementation, management, and use
– Ad hoc query capability with Structured Query Language (SQL)
• SQL translates user queries to codes

• Disadvantages
– Substantial hardware and system software overhead
• more complex system
– Poor design and implementation is made easy
• ease-of-use allows careless use of RDBMS

S511 Session 2, IU-SLIS 95


Entity Relationship Model

• The major activity of this phase is identifying


entities, attributes, and their relationships to
construct model using the Entity Relationship
Diagram.
– Entity  table
– Attribute  column
– Relationship  line
• Basics of Data Modeling Video
– Until business rules # 3 (9:20)
96
Skills and concepts
• So the concepts we want you to learn
today are:
– The basics of Entity-Relationship modelling

–Entities
–Relationships
–Attributes

97
How to find entities?

• Entity:
– "...anything (people, places, objects, events, etc.)
about which we store information (e.g. supplier,
machine tool, employee, utility pole, airline seat,
etc.).”
– Tangible: customer, product
– Intangible: order, accounting receivable
– Look for singular nouns (beginner)
– BUT a proper noun is not a good candidate….

98
Entities
• Entity - distinguishable “thing” in the real world
– Strong (or regular) entity - entities have an
independent existence (e.g. staff)
– Weak entity - existence dependent on some other
entity (e.g. next of kin)

E n tity Na m e Entity type name


(singular, no spaces,
capital letter at start of each word)

space for attributes


99
Entity
Instance
Entity instance: a single occurrence of an entity.
– 6 instances

Entity: student Student Last First


ID Name Name
2144 Arnold Betty
3122 Taylor John
3843 Simmons Lisa
instance
9844 Macy Bill
2837 Leath Heather
2293 Wrench Tim
100
How to find attributes?

• Attribute:
– Attributes are data objects that either identify or
describe entities (property of an entity).
– In other words, it is a descriptor whose values are
associated with individual entities of a specific entity
type
• The process for identifying attributes is similar except now
you want to look for and extract those names that appear
to be descriptive noun phrases.

101
Attributes
“Describe detail information about an entity ”
• Entity: Employee
• Attributes:
– Employee-Name
– Address (composite)
– Phone Extension
– Date-Of-Hire
– Job-Skill-Code
– Salary

102
Classes of attributes
• Simple attribute
• Composite attribute
• Derived attributes
• Single-valued attribute
• Multi-valued attribute

103
Simple/Composite attribute
• A simple attribute cannot be subdivided.
– Examples: Age, Gender, and Marital status
• A composite attribute can be further
subdivided to yield additional attributes.
– Examples:
• ADDRESS -- Street, City, State, Zip
• PHONE NUMBER -- Area code, Exchange number

104
Derived attribute
• is not physically stored within the database
• instead, it is derived by using an algorithm.
– Example 1: Late Charge of 2%
• MS Access: InvoiceAmt * 0.02
– Example 2: AGE can be derived from the date of
birth and the current date.
• MS Access: int(Date() – Emp_Dob)/365)

105
Single-valued attribute
• can have only a single (atomic) value.
– Examples:
• A person can have only one social security number.
• A manufactured part can have only one serial number.
– A single-valued attribute is not necessarily a
simple attribute.
• Part No: CA-08-02-189935
• Location: CA, Factory#:08, shift#: 02, part#: 189935

106
Multi-valued attributes
• can have many values.
– Examples:
• A person may have several college degrees.
• A household may have several phones with
different numbers
• A car color

107
Example - “Movie Database”

• Entity:
– Movie Star
• Attributes:
– SS#: “123-45-6789” (single-valued)
– Cell Phone: “(661)123-4567, (661)234-5678”
(multi-valued)
– Name: “Harrison Ford” (composite)
– Address: “123 Main Str., LA, CA” (composite)
– Gender: “Female” (simple)
– Age: 24 (derived)

108
How to find relationships?

• Relationship:
– Relationships are associations between entities.
– Typically, a relationship is indicated by a verb
connecting two or more entities.
– Employees are assigned to projects
– Relationships should be classified in terms of
cardinality.
• One-to-one, one-to-many, etc.

109
Relationships: constraints
• The degree of a relationship type
– binary (connects 2 entity types)
– unary/ recursive (connects 1 entity type with itself) Degree
– complex (connects 3 or more entity types)
• Ternary (connects 3)

• Relationship constraints - cardinality


– one to one (1:1)
– one to many (1:m)
– many to many (m:n)
• Relationship constraints – participation Multiplicity
– full/mandatory
– or partial/optional

110
Relationships: Degree
Entity1 Entity2
HasLinkWith
Binary relationship

S u p e rv is o r Supe rv ise s

Entity1
Staff Recursive (Unary) relationship
- example
S u p e rv is e e

Entity1 Entity3
Te rnaryRe lationship

Complex relationship

here ternary Entity2

111
How to find cardinalities?

• Cardinality:
– The cardinality is the number of occurrences in one
entity which are associated to the number of
occurrences in another.
– There are three basic cardinalities (degrees of
relationship).
– one-to-one (1:1), one-to-many (1:M), and many-to-
many (M:N)

112
Basic Cardinality Type
• 1-to-1 relationship

• 1-to-M relationship

• M-to-N relationship
Cardinality con’t
Identifier
“attributes that uniquely identify entity instances”
• Becomes a PK in RDS
• Composite identifiers are identifiers that consist
of two or more attributes
• Identifiers are represented by underlying the
name of the attribute(s)
– Employee (Employee_ID), student (Student_ID)

115
Crow’s Foot Notation

• Known as IE notation (most popular)


• Entity:
– Represented by a rectangle, with its name on the
top. The name is singular (entity) rather than plural
(entities).

116
Attributes

• Identifiers are represented by underlying the


name of the attribute(s)

117
Entity Relationship diagram
(ERD)

Shows in a diagram how entities
(tables) are related to one another.

 One to One
 One to many
 Many to many
One to one

Extension of number of attributes in a
single table

Rarely required

Tree More tree


attributes
One to Many

Most
common Parent Child


Requires
two Family Genus Species

tables.

Linked by
Foreign
Key
Many to many

Need to break
down to one to Measurement Code

many


Requires three Tree Code

tables

Measurement

Associative table
provides common
key
E-R Diagram: Crow’s Foot Model
• Entity
– represented by a rectangle with
its name in capital letters.

• Relationships
– represented by an active or
passive verb that connects the
related entities.

• Connectivities
– indicated by symbols next to
entities.
• 2 vertical lines for 1
• “crow’s foot” for M

Database Systems: Design, Implementation, & Management: Rob & Coronel

S511 Session 2, IU-SLIS 12


2
Example Model

123
E-R Diagram: Chen’s Model
• Peter Chen’s Landmark Paper in 1976
– “The Relationship Model: Toward a Unified View of Data”
– Graphical representation of entities and their relationships

• Entity Relationship (ER) Model

– Based on Entity, Attributes & Relationships


• Entity is a thing about which data are to be collected and stored
– e.g. EMPLOYEE
• Attributes are characteristics of the entity
– e.g. SSN, last name, first name
• Relationships describe an associations between entities
– i.e. 1:M, M:N, 1:1

– Complements the relational data model concepts


• Helps to visualize structure and content of data groups
– entity is mapped to a relational table
• Tool for conceptual data modeling (higher level representation)

– Represented in an Entity Relationship Diagram (ERD)


• Formalizes a way to describe relationships between groups of data
S511 Session 2, IU-SLIS 124
Notation
E-R Diagram: Chen Model
• Entity
– represented by a rectangle with its
name in capital letters.

• Relationships
– represented by an active or passive
verb inside the diamond that connects
the related entities.

• Connectivities
– i.e., types of relationship
– written next to each entity box.

Database Systems: Design, Implementation, & Management: Rob & Coronel

S511 Session 2, IU-SLIS 12


6
Data Model by Peter Chen’
Notation (first - original)
E-R Model: Pros & Cons
• Advantages

– Exceptional conceptual simplicity


• easily viewed and understood representation of database
• facilitates database design and management

– Integration with the relational database model


• enables better database design via conceptual modeling

• Disadvantages

– Incomplete model on its own


• Limited representational power
– cannot model data constraints not tied to entity relationships
» e.g. attribute constraints
– cannot represent relationships between attributes within entities
• No data manipulation language (e.g. SQL)

– Loss of information content


• Hard to include attributes in ERD
S511 Session 2, IU-SLIS 128
Object-Oriented Database
• Semantic Data Model (SDM)
– Modeled both data and their relationships in a single structure (object)
• Developed by Hammer & McLeod in 1981

• Object-oriented concepts became popular in 1990s


– Modularity facilitated program reuse and construction of complex structures
– Ability to handle complex data types (e.g. multimedia data)

• Object-Oriented Database Model (OODBM)


– Maintains the advantages of the ER model but adds more features
– Object = entity + relationships (between & within entity)
• consists of attributes & methods
– attributes describe properties of an object
– methods are all relevant operations that can be performed on an object
• self-contained abstraction of real-world entity
– Class = collection of similar objects with shared attributes and methods
• e.g. EMPLOYEE class = (employ1 object, employ2 object, …)
• organized in a class hierarchy
– e.g. PERSON > EMPLOYEE, CUSTOMER
– Incorporates the notion of inheritance
• attributes and methods of a class are inherited by its descendent classes

S511 Session 2, IU-SLIS 129


OO Database Model vs. E-R
Model
OODBM:
- can accommodate relationships within a object
- objects to be used as building blocks for autonomous
structures

Database Systems: Design, Implementation, & Management: Rob & Coronel


S511 Session 2, IU-SLIS 130
Object-Oriented Database: Pros & Cons
• Advantages
– Semantic representation of data
• fuller and more meaningful description of data via object
– Modularity, reusability, inheritance
– Ability to handle
• complex data
• sophisticated information requirements

• Disadvantages
– Lack of standards
• no standard data access method
– Complex navigational data access
• class hierarchy traversal
– Steep learning curve
• difficult to design and implement properly
– More system-oriented than user-centered
– High system overhead
• slow transactions
S511 Session 2, IU-SLIS 131
NORMALIZATION
Goals of Normalisation
• Eliminate certain kinds of redundancy
• avoid certain update anomalies
• good reresentation of real world
• simplify enforcement of DB integrity
Update anomalies
• Undesirable side-effects that occur
when performaing insertion,
modification or deletion operations on
badly designed relational DBs.
SSN Name Dept DeptMgr Dept Name
987 J Smith 1 321 Representing

654 M Burke 2 467 Department info
...
333 A Dolan 1 321 in the Employee
321 K Doyle 1 321 table causes
678 O O’Neill 3 678 problems.
467 R McKay 2 467
Sample anomalies
• Modification -
– when the manager of a dept changes we
have to change many values.
– If we are not careful the DB will contain
inconsistencies.
– There is no easy way to get the DB to ensure
that a department has only one manager and
only one name.
Anomalies continued
• Deletion -
– if O O’Neill leaves we delete his tuple and
lose
• the fact that there is a department 3
• the name of dept 3
• who is the manager of dept. 3
• Insertion
– how would we create a new department
before any employees are assigned to it ?
Better design
• Separate entities are represented in
separate tables.
SSN Name Dept Dept DeptMgr Dept Name
987 J Smith 1 1 321 …
654 M Burke 2 2 467
...
333 A Dolan 1 3 678
321 K Doyle 1
678 O O’Neill 3
467 R McKay 2

Note that mapping from an ER model following the steps given


will give a well-normalised DB.
Definition
• This is the process which allows you to winnow out
redundant data within your database.
• This involves restructuring the tables to
successively meeting higher forms of
Normalization.
• A properly normalized database should have the
following characteristics
– Scalar values in each fields
– Absence of redundancy.
– Minimal use of null values.
– Minimal loss of information.
Levels of Normalization
• Levels of normalization based on the amount
of redundancy in the database.
• Various levels of normalization are:
– First Normal Form (1NF)
– Second Normal Form (2NF)

Number of Tables
Redundancy
– Third Normal Form (3NF)

Complexity
– Boyce-Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
– Fifth Normal Form (5NF)
– Domain Key Normal Form (DKNF)

Most
Mostdatabases
databasesshould
shouldbe
be3NF
3NFor
orBCNF
BCNFin inorder
orderto
toavoid
avoid
the
thedatabase
databaseanomalies.
anomalies.
Levels of Normalization
1NF
2NF
3NF
4NF
5NF
DKNF

Each
Eachhigher
higherlevel
levelisisaasubset
subsetof
ofthe
thelower
lowerlevel
level
First Normal Form
First Normal Form
We say a relation is in 1NF if all values stored in the
relation are single-valued and atomic.

1NF places restrictions on the structure of relations.


Values must be simple.

141
91.2914
First Normal Form
The following is not in 1NF

EmpNum EmpPhone EmpDegrees


123 233-9876
333 233-1231 BA, BSc, PhD
679 233-1231 BSc, MSc

EmpDegrees is a multi-valued field:


employee 679 has two degrees: BSc and MSc
employee 333 has three degrees: BA, BSc, PhD

142
91.2914
First Normal Form
EmpNum EmpPhone EmpDegrees
123 233-9876
333 233-1231 BA, BSc, PhD
679 233-1231 BSc, MSc

To obtain 1NF relations we must, without loss of


information, replace the above with two relations -
see next slide

143
91.2914
First Normal Form
EmployeeDegree
Employee
EmpNum EmpDegree
EmpNum EmpPhone
333 BA
123 233-9876
333 BSc
333 233-1231
333 PhD
679 233-1231
679 BSc
679 MSc

144
91.2914
First Normal Form (1NF)
This table is Not in 1NF, Explain why and then

ISBN Title AuName AuPhone PubName PubPhone Price

0-321-32132-1 Balloon Sleepy, 321-321-1111, Small House 714-000-0000 $34.00


Snoopy, 232-234-1234,
Grumpy 665-235-6532

0-55-123456-9 Main Street Jones, 123-333-3333, Small House 714-000-0000 $22.95


Smith 654-223-3455
0-123-45678-0 Ulysses Joyce 666-666-6666 Alpha Press 999-999-9999 $34.00

1-22-233700-0 Visual Roman 444-444-4444 Big House 123-456-7890 $25.00


Basic
Author
Authorand
andAuPhone
AuPhonecolumns
columnsare
arenot
notscalar
scalar

1. Place all items that appear in the repeating group


in a new table
2. Designate a primary key for each new table
produced.
3. Duplicate in the new table the primary key of the
table from which the repeating group was
extracted or vice versa.
ISBN AuName AuPhone
Example (1NF) 0-321-32132-1 Sleepy 321-321-1111

ISBN Title PubName PubPhone Price 0-321-32132-1 Snoopy 232-234-1234

0-321-32132-1 Balloon Small House 714-000-0000 $34.00 0-321-32132-1 Grumpy 665-235-6532

0-55-123456-9 Main Street Small House 714-000-0000 $22.95 0-55-123456-9 Jones 123-333-3333

0-123-45678-0 Ulysses Alpha Press 999-999-9999 $34.00 0-55-123456-9 Smith 654-223-3455

1-22-233700-0 Visual Big House 123-456-7890 $25.00 0-123-45678-0 Joyce 666-666-6666


Basic
1-22-233700-0 Roman 444-444-4444
Second Normal Form
Second Normal Form
A relation is in 2NF if it is in 1NF, and every non-key
attribute is fully dependent on each candidate key. (That is, we
don’t have any partial functional dependency.)

• 2NF (and 3NF) both involve the concepts of key and


non-key attributes.
• A key attribute is any attribute that is part of a key;
any attribute that is not a key attribute, is a non-key attribute.
• Relations that are not in BCNF have data redundancies
• A relation in 2NF will not have any partial dependencies

147
91.2914
Functional Dependencies
Functional Dependencies
We say an attribute, B, has a functional dependency on
another attribute, A, if for any two records, which have
the same value for A, then the values for B in these two
records must be the same. We illustrate this as:
AB
Example: Suppose we keep track of employee email
addresses, and we only track one email address for each
employee. Suppose each employee is identified by their
unique employee number. We say there is a functional
dependency of email address on employee number:

employee number  email address


148
91.2914
Functional Dependencies
EmpNum EmpEmail EmpFname EmpLname
123 [email protected] John Doe
456 [email protected] Peter Smith
555 [email protected] Alan Lee
633 [email protected] Peter Doe
787 [email protected] Alan Lee

If EmpNum is the PK then the FDs:


EmpNum  EmpEmail
EmpNum  EmpFname
EmpNum  EmpLname
must exist.

149
91.2914
Functional Dependencies
EmpNum  EmpEmail
EmpNum  EmpFname 3 different ways
EmpNum  EmpLname you might see FDs
depicted
EmpEmail
EmpNum EmpFname

EmpLname

EmpNum EmpEmail EmpFname EmpLname

150
91.2914
Partial dependency
A partial dependency exists when an attribute B is
functionally dependent on an attribute A, and A is a
component of a multipart candidate key.

InvNum LineNum Qty InvDate

Candidate keys: {InvNum, LineNum} InvDate is


partially dependent on {InvNum, LineNum} as
InvNum is a determinant of InvDate and InvNum is
part of a candidate key
151
91.2914
Second Normal Form (2NF)
For a table to be in 2NF, there are two requirements
– The database is in first normal form
– All nonkey attributes in the table must be functionally
dependent on the entire primary key
Note: Remember that we are dealing with non-key attributes

Example 1 (Not 2NF)


Scheme  {Title, PubId, AuId, Price, AuAddress}
1. Key  {Title, PubId, AuId}
2. {Title, PubId, AuID}  {Price}
3. {AuID}  {AuAddress}
4. AuAddress does not belong to a key
5. AuAddress functionally depends on AuId which is a
subset of a key
2NF - Decomposition
1. If a data item is fully functionally dependent on only a
part of the primary key, move that data item and that
part of the primary key to a new table.
2. If other data items are functionally dependent on the
same part of the key, place them in the new table also
3. Make the partial primary key copied from the original
table the primary key for the new table. Place all items
that appear in the repeating group in a new table
Example 1 (Convert to 2NF)
Old Scheme  {Title, PubId, AuId, Price, AuAddress}
New Scheme  {Title, PubId, AuId, Price}
New Scheme  {AuId, AuAddress}
Second Normal Form (2NF)
Example 2 (Not 2NF)
Scheme  {City, Street, HouseNumber, HouseColor,
CityPopulation}
1. key  {City, Street, HouseNumber}
2. {City, Street, HouseNumber}  {HouseColor}
3. {City}  {CityPopulation}
4. CityPopulation does not belong to any key.
5. CityPopulation is functionally dependent on the City which is a
proper subset of the key

Example 3 (Not 2NF)


Scheme  {studio, movie, budget, studio_city}
1. Key  {studio, movie}
2. {studio, movie}  {budget}
3. {studio}  {studio_city}
4. studio_city is not a part of a key
5. studio_city functionally depends on studio which is a proper
subset of the key
2NF - Decomposition
Example 2 (Convert to 2NF)
Old Scheme  {Studio, Movie, Budget, StudioCity}
New Scheme  {Movie, Studio, Budget}
New Scheme  {Studio, City}

Example 3 (Convert to 2NF)


Old Scheme  {City, Street, HouseNumber, HouseColor,
CityPopulation}
New Scheme  {City, Street, HouseNumber, HouseColor}
New Scheme  {City, CityPopulation}
Third Normal Form
Third Normal Form
• A relation is in 3NF if the relation is in 1NF and all
determinants of non-key attributes are candidate keys
That is, for any functional dependency: X  Y, where Y is
a non-key attribute (or a set of non-key attributes), X is a
candidate key.
• A relation in 3NF will not have any transitive dependencies
of non-key attribute on a candidate key through another
non-key attribute.

156
91.2914
Third Normal Form (3NF)
This form dictates that all non-key attributes
of a table must be functionally dependent
on a candidate key i.e. there can be no
interdependencies among non-key
attributes.

For a table to be in 3NF, there are two


requirements
– The table should be second normal form
– No attribute is transitively dependent on
the primary key
Transitive dependency
Transitive dependency

Consider attributes A, B, and C, and where


A  B and B  C.
Functional dependencies are transitive, which
means that we also have the functional dependency
AC
We say that C is transitively dependent on A
through B.

158
91.2914
Third Normal Form (3NF)
Example (Not in 3NF)
Scheme  {Title, PubID, PageCount, Price }
1. Key  {Title, PubId}
2. {Title, PubId}  {PageCount}
3. {PageCount}  {Price}
4. Both Price and PageCount depend on a key hence 2NF
5. Transitively {Title, PubID}  {Price} hence not in 3NF
3NF - Decomposition
1. Move all items involved in transitive dependencies to a
new entity.
2. Identify a primary key for the new entity.
3. Place the primary key for the new entity as a foreign
key on the original entity.

Example 1 (Convert to 3NF)


Old Scheme  {Title, PubID, PageCount, Price }
New Scheme  {PubID, PageCount, Price}
New Scheme  {Title, PubID, PageCount}
3NF - Decomposition
Example 2 (Convert to 3NF)
Old Scheme  {Studio, StudioCity, CityTemp}
New Scheme  {Studio, StudioCity}
New Scheme  {StudioCity, CityTemp}

Contractor Contractor
Example 3 (Convert to 3NF) BuildingID Fee

100 Randolph Randolph 1200


Old Scheme  {BuildingID, Contractor, Fee}
150 Ingersoll Ingersoll 1100
New Scheme  {BuildingID, Contractor}200 Randolph Pitkin 1100

New Scheme  {Contractor, Fee} 250 Pitkin


300 Randolph
RELATIONAL ALGEBRA
Relational Algebra Overview

• Relational Algebra consists of several groups of operations


– Unary Relational Operations
• SELECT (symbol:  (sigma))
• PROJECT (symbol: (pi))
• RENAME (symbol:  (rho))
– Relational Algebra Operations From Set Theory
• UNION (  ), INTERSECTION ( ), DIFFERENCE (or MINUS, – )
• CARTESIAN PRODUCT ( x )
– Binary Relational Operations
• JOIN (several variations of JOIN exist)
• DIVISION
– Additional Relational Operations
• OUTER JOINS, OUTER UNION
• AGGREGATE FUNCTIONS (These compute summary of
information: for example, SUM, COUNT, AVG, MIN, MAX)

Slide 6- 163
Select operation

• Unary operation
• Select Operation: Uses some criteria to
select some tuples from the original
relation eg  UNIT = 5 (COURSES)
Project operation

• Unary operation
• Project Operation: Creates relation
in which each tuple has fewer
attributes eg No,(COURSES)
Join operation

• Binary operation
• Join Operation: Takes two relation and combine
them based on common attribute eg COURSES
• TAUGHT-BY
Union operation

• Binary operation
• Union Operation: Creates new relation in which
each tuple is either in the first relation, the second
relation or in both eg CIS15-Roster CIS52-Roster
Intersection operation

• Binary operation
• Intersection Operation: Creates new relation
in which each tuple is either in both
relations.
• Eg CIS15-Roster  CIS52-Roster
Difference operation

• Binary Operation
• Difference Operation: Creates new relation where
the new tuples are in the first relation but not in
the second. Eg CI15-Roster  CIS52-Roster
STRUCTURED
QUERY
LANGUAGE
SQL

• SQL is the standard language used for


relational databases.
• It is declarative language where users
declare what they want without having to
write a step by stem procedure.
• It was first implemented by Oracle
Corporation
1. Insert
• SQL Insert Operation format

insert into RELATION-NAME


values (…, …, …)
Insert (Example)

insert into COURSES


values (“CIS52”, “TCP/IP Protocols”, 6)
2. Delete
• SQL Delete Operation format

delete from RELATION-NAME


where criteria
Delete (Example)

Delete from COURSES


where No = “CIS19”
3. Update
• SQL Update Operation format

update RELATION-NAME
set attribute1 = value1 attribute 2 = value2 …
where criteria
Update (Example)

update COURSES
set unit = 6
where No = “CIS51”
4. Select
• SQL Select Operation format

select *
from RELATION-NAME
where criteria
Select (Example)

select *
from COURSES
where Unit = 5
5. Project
• SQL Project Operation format

select attribute-list
from RELATION-NAME
Project (Example)

select No, Unit


from COURSES
6. Join
• SQL Join Operation format

select attribute-list
from RELATION NO1, RELATION NO2
where criteria
Join (Example)

select No, Course-Name, Unit, Professor


from COURSES, TAUGHT-BY
where COURSES.No = TAUGHT-BY.No;
7. Union
• SQL Union Operation format

select *
from RELATION NO1
union
select *
from RELATION NO2
Union (Example)

select *
from CIS15-Roster
union
select *
from CIS52-Roster;
8. Intersection
• SQL Intersection Operation format

select *
from RELATION NO1
intersection
select *
from RELATION NO2
Intersection (Example)

select *
from CIS15-Roster
intersection
select *
from CIS52-Roster;
9. Difference
• SQL Difference Operation format

select *
from RELATION NO1
minus
select *
from RELATION NO2
Intersection (Example)

select *
from CIS15-Roster
minus
select *
from CIS52-Roster;

You might also like