Database Management Systems
Chapter 1
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
1
Basic Concepts
Data
Known facts that could be recorded and stored on
computer media
Example: For a student, we have name, student
number, program, address, and so on.
Data vs. Information
These two are closely related and are often used
interchangeably. However, it is often useful to
distinguish between data and information.
We define information as data that has been
processed in such a way that it can increase the
knowledge of the person who uses it.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
2
Basic Concepts
Metadata
Data that describe the properties or characteristics
of other data
Data definition, data structures, rules and
constraints are some of the properties included in
metadata.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3
3
Why Databases?
information superhighway
data can become a liability
cost of managing and acquiring >> value derived
from it
what do we need?
tools that simplify the task of managing data and
extracting useful information in a timely fashion
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4
4
Database
What is a Database?
A collection of data that is used by the application
systems of some given enterprise.
A database is a collection of files, which in turn are
collections of records.
Records consist of information that is related in
one way or another.
The pieces of information that comprise a record
are called fields.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5
5
Database
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6
6
DBMS
Database Management System
Database system is basically a computerized
record-keeping system
The information concerned can be anything that is
deemed to be of significance to the individual or
organization the system is intended to serve
A database system involves four major
components, namely data, hardware, software,
and users.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7
7
What Is a DBMS?
Models real-world enterprise.
Entities (e.g., students, courses)
Relationships (e.g., Madonna is taking CS564)
A Database Management System (DBMS) is a
software package designed to store and
manage databases.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8
8
DBMS
a microcosm of computer science?
programming languages
object-orientation
operating systems
concurrent programming
data structures
algorithms
parallel and distributed systems
AI and expert systems
statistical techniques, etc.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9
9
History
Integrated Data Store (early 1960s)
Charles Bachman (network data model)
Bachman became the first recipient of ACM’s
Turing Award (1973) in his work in the database
area
Information Management System (IMS, late
1960s)
developed by IBM (hierarchical data model)
SABRE system
• airline reservation system (American Airlines)
• [Link]
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10
10
History
Relational Model (1970)
Edgar Codd (IBM)
Codd won the 1981 Turing Award for his work
Database systems matured as an academic
discipline, and changed the commercial landscape.
SQL (1980)
the relational model is the dominant DBMS
paradigm
the Structured Query Language for relational
databases became the standard query language
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11
11
History
Concurrent Transactions (1980s)
concurrent execution of database programs
users write them to be run by themselves, the
DBMS is responsible for running them
concurrently
James Gray won the 1999 Turing Award for his
work on transaction management
Data Warehouses (1990s)
more powerful query languages, richer data
models, etc.
Internet
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12
12
Paper vs DBMS
Paper-Based System versus Database System
Advantages of database system:
compactness
speed
less drudgery
currency
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13
13
Files vs. DBMS
Scenario:
A company has a large collection (500 GB) of data
on employees, departments, products, sales and so
on.
data is accessed concurrently
questions about the data must be answered
quickly
changes made to the data by different users must
be applied consistently
access to certain parts of the data (e.g., salaries)
must be restricted.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14
14
Files vs. DBMS
Application must store large datasets
between main memory and secondary
storage (e.g., buffering, page-oriented access,
32-bit addressing, etc.)
Special code for different queries
Must protect data from inconsistency due to
multiple concurrent users
Crash recovery (restore data)
Security and access control
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15
15
Why Use a DBMS?
Data independence
Efficient access.
Data integrity and security.
Uniform data administration.
Concurrent access, recovery from crashes.
Reduced application development time.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16
16
Database Applications
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17
17
?
Why Study Databases??
Shift from computation to information
Internet technologies
scientific applications
Datasets increasing in diversity and volume.
Digital libraries, interactive video, Human
Genome project, EOS project
... need for DBMS exploding
DBMS encompasses most of CS
OS, languages, theory, AI, multimedia, logic
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18
18
Data Models
A data model is a collection of concepts for
describing data.
A schema is a description of a particular
collection of data, using a given data model.
The relational model of data is the most widely
used model today.
Main concept: relation, basically a table with rows
and columns.
Every relation has a schema, which describes the
columns, or fields.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 19
19
Levels of Abstraction
Many views, single View 1 View 2 View 3
conceptual (logical) schema
and physical schema.
Conceptual Schema
Views describe how users
see the data.
Physical Schema
Conceptual schema defines
logical structure
Physical schema describes
the files and indexes used.
Schemas are defined using DDL; data is modified/queried using DML.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 20
20
Example: University Database
Conceptual schema:
Students(sid: string, name: string, login: string,
age: integer, gpa:real)
Courses(cid: string, cname: string, credits: integer)
Faculty(fid: string, fname: string, sal: real)
Physical schema:
Relations stored as unordered files.
Index on first column of Students.
External Schema (View):
Course_info(cid: string, fname: string, enrollment:
integer)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 21
21
Data Independence *
Applications insulated from how data is
structured and stored.
Logical data independence: Protection from
changes in logical structure of data
(conceptual schema).
Physical data independence: Protection from
changes in physical structure of data (physical
schema).
One of the most important benefits of using a DBMS!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 22
22
Concurrency
Concurrent execution of user programs
is essential for good DBMS performance.
Because disk accesses are frequent, and relatively
slow, it is important to keep the cpu humming by
working on several user programs concurrently.
Interleaving actions of different user programs
can lead to inconsistency: e.g., check is cleared
while account balance is being computed.
DBMS ensures such problems don’t arise: users
can pretend they are using a single-user system.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 23
23
Transaction: An Execution of a DB Program
Key concept is transaction, which is an atomic
sequence of database actions (reads/writes).
Each transaction, executed completely, must
leave the DB in a consistent state if DB is
consistent when the transaction began.
Users can specify some simple integrity constraints on
the data, and the DBMS will enforce these constraints.
Thus, ensuring that a transaction (run alone) preserves
consistency is ultimately the user’s responsibility!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 24
24
Scheduling Concurrent Transactions
DBMS ensures that execution of {T1, ... , Tn} is
equivalent to some serial execution T1’ ... Tn’.
Before reading/writing an object, a transaction requests
a lock on the object, and waits till the DBMS gives it the
lock. All locks are released at the end of the transaction.
(Strict 2PL locking protocol.)
Idea: If an action of Ti (say, writing X) affects Tj (which
perhaps reads X), one of them, say Ti, will obtain the
lock on X first and Tj is forced to wait until Ti completes;
this effectively orders the transactions
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 25
25
Scheduling Concurrent Transactions
Locks
Write-lock (exclusive lock)
only one transaction at a
time
Read-lock (shared lock) can
be held by more than 1
transaction at a time
Deadlocks
one of T3 or T4 must be
rolled back and its lock
released
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 26
26
Ensuring Atomicity
DBMS ensures atomicity (all-or-nothing property)
even if system crashes in the middle of a write
action.
Idea: Keep a log (history) of all actions carried out
by the DBMS before executing a write action:
Before a change is made to the database, the
corresponding log entry is forced to a safe location
(disk).
(WAL protocol; OS support for this is often inadequate.)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 27
27
Ensuring Atomicity
After a crash, the effects of partially executed
transactions are undone using the log.
If log entry wasn’t saved before the crash, DBMS would
be unable to detect and undo the change.
The time required to recover from a crash can be
reduced by periodically forcing some information to
disk; this periodic operation is called a checkpoint.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 28
28
The Log
The following actions are recorded in the log:
Ti writes an object: the old value and the new value.
• Log record must go to disk before the changed page!
Ti commits/aborts: a log record indicating this action.
Log records chained together by write action id, so it’s
easy to undo a specific write action (e.g., to resolve a
deadlock).
Log is often duplexed and archived on “stable” storage.
All log related activities (and in fact, all CC related
activities such as lock/unlock, dealing with deadlocks
etc.) are handled transparently by the DBMS.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 29
29
These layers
Structure of a DBMS must consider
concurrency
control and
recovery
A typical DBMS has a Query Optimization
layered architecture. and Execution
The figure does not Relational Operators
show the concurrency
Files and Access Methods
control and recovery
components. Buffer Management
This is one of several
Disk Space Management
possible architectures;
each system has its own
variations. DB
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 30
30
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 31
31
Database Roles
Data Owner
the person who makes the strategic and policy
decisions regarding the data of the
enterprise.
Database Administrators
Database Application Programmers
End Users and DBMS Vendors
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 32
32
Database Roles
Database Administrator (DBA) is the
person who provides the necessary technical
support for implementing those decisions.
Thus, the DBA is responsible for the overall
control of the system at a technical level.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 33
33
Database Roles
Here are functions of the DBA:
Defining the physical/conceptual schema
Handling of security, authentication and integrity
rules
Liaising with users
Defining backup and recovery procedures for data
availability
Monitoring performance and responding to
changing requirements
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 34
34
Databases make these folks happy ...
End users and DBMS vendors
DB application programmers
E.g. smart webmasters
Database administrator (DBA)
Designs logical /physical schemas
Handles security and authorization
Data availability, crash recovery
Database tuning as needs evolve
Must understand how a DBMS works!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 35
35
Summary
DBMS used to maintain, query large datasets.
Benefits include recovery from system crashes,
concurrent access, quick application
development, data integrity and security.
Levels of abstraction give data independence.
A DBMS typically has a layered architecture.
DBAs hold responsible jobs
and are well-paid!
DBMS R&D is one of the broadest,
most exciting areas in CS.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 36
36