Part 1: Introduction
Database System Concepts, 7th Ed.
©Silberschatz, Korth and Sudarshan
www.db-book.com
See www.db-book.com for conditions on re-use
Turing Award Winners in Database
Based on Database System Concepts - 7th Edition 1.2
Charles Bachman 1973
"For his outstanding contributions to database
technology"
Integrated Data Store (IDS)
▪ The first database management system, operational
by 1963
▪ Maintains a single set of shared files on disk, together
with the tools to structure and maintain them
▪ Provides application programmers with a set of
powerful commands to manipulate data (Data
Manipulation Language)
▪ The files could be restructured, moved or expanded
without rewriting all the programs that accessed them
(Data Independence)
▪ Maintains a separate data dictionary, tracking
information on the different kinds of records in the
system and their relationships
Based on Database System Concepts - 7th Edition 1.3
Ted Codd 1981
"For his fundamental and continuing contributions
to the theory and practice of database management
systems"
Relational Model
▪ The relational model is widely recognized as one of
the great technical achievements of the 20th century
▪ Essentially all databases in use or under
development today are based on Codd’s ideas
• Whenever anyone uses an ATM machine, or
purchases an airline ticket, or uses a credit card,
he or she is effectively relying on Codd’s
invention
▪ The relational model was the very first abstract
database model to be defined
• Codd not only invented the relational model in
particular, he actually invented the data model
concept in general
Based on Database System Concepts - 7th Edition 1.4
Jim Gray 1998
"For seminal contributions to database and
transaction processing research and technical
leadership in system implementation"
Transaction Processing
▪ Plays a major role in IBM’s relational System R,
creating a unified approach to the problems of
concurrency control and crash recovery
▪ Defines the transaction as a unit of work that must
leave the database in a consistent state whether or
not the transaction succeeds
▪ Develops techniques that allowed concurrent
execution of many transactions, as well as restart
after crashes, while maintaining the consistency of
the database
Based on Database System Concepts - 7th Edition 1.5
Michael Stonebraker 2014
"For fundamental contributions to the concepts
and practices underlying modern database
systems"
Relational Technology
▪ Contributes to the refinement and spread of
database management technology
▪ After reading Codd’s seminal papers on the
relational model, Stonebraker started work with a
colleague, Eugene Wong, to develop an efficient
and practical implementation
▪ The result was INGRES (Interactive Graphic and
Retrieval System), and a prototype of INGRES was
working by 1974
▪ INGRES and System R together helped to turn
relational systems from a laboratory curiosity into
the default choice for even the most demanding
data processing applications
Based on Database System Concepts - 7th Edition 1.6
Structured and Unstructured Textual
Information
▪ A Database Management System (DBMS) is a complex software system
whose task is to manage a large, complex collection of primarily structured
information
▪ Structured Information are organized in discrete units (entities)
▪ Entities of the same type are organized in some predefined ways:
• They have the same number of attributes
• Each attribute has
▪ The same predefined format, e.g. the number of bytes required to
store an attribute
▪ Unstructured information refers to data that
• Does not have a rigid format
• Does not distinguish information into specific items
• Usually free text
7
Based on Database System Concepts - 7th Edition 1.7
Database Systems
▪ The collection of data containing information relevant to an enterprise is
referred to as a database
▪ A database system contains information about a particular enterprise
• Collection of interrelated data
• Set of programs to access the data
• An environment that is both convenient and efficient to use
▪ Database systems are used to manage collections of data that are:
• Highly valuable
• Relatively large
• Accessed by multiple users and applications, often at the same
time
Based on Database System Concepts - 7th Edition 1.8
Database Applications Examples
▪ Enterprise Information
• Sales: customers, products, purchases
• Accounting: payments, receipts, assets
• Human Resources: Information about employees, salaries, payroll
taxes.
▪ Manufacturing: management of production, inventory, orders, supply
chain.
▪ Banking and finance
• customer information, accounts, loans, and banking transactions.
• Credit card transactions
• Finance: sales and purchases of financial instruments (e.g., stocks
and bonds; storing real-time market data)
▪ Universities: registration, grades
Based on Database System Concepts - 7th Edition 1.9
Purpose of Database Systems
In the early days, database applications were built directly on top of file
systems, which leads to:
▪ Data redundancy and inconsistency: data is stored in multiple file
formats resulting in duplication of information in different files
▪ Difficulty in accessing data
• Need to write a new program to carry out each new task
▪ Data isolation
• Multiple files and formats
▪ Integrity problems
• Integrity constraints (e.g., account balance > 0) become “buried”
in program code rather than being stated explicitly
• Hard to add new constraints or change existing ones
Based on Database System Concepts - 7th Edition 1.10
Purpose of Database Systems
▪ Atomicity of updates
• Failures may leave database in an inconsistent state with partial
updates carried out
• Example: Transfer of funds from one account to another should either
complete or not happen at all
▪ Concurrent access by multiple users
• Concurrent access needed for performance
• Uncontrolled concurrent accesses can lead to inconsistencies
▪ Example: Two people reading a balance (say $100) and updating
it by withdrawing money (say $50 each) at the same time
▪ Security problems
• Hard to provide user access to some, but not all, data
Database systems offer solutions to the above problems
Based on Database System Concepts - 7th Edition 1.11
University Database Example
▪ Data consists of information about:
• Students
• Instructors
• Classes
▪ Application program examples:
• Add new students, instructors, and courses
• Register students for courses, and generate class rosters
• Assign grades to students, compute grade point averages (GPA) and
generate transcripts
Based on Database System Concepts - 7th Edition 1.12
View of Data
▪ A database system is a collection of interrelated data and a set of
programs that allow users to access and modify these data
▪ A major purpose of a database system is to provide users with an abstract
view of the data
• Data models
▪ A collection of conceptual tools for describing data, data
relationships, data semantics, and consistency constraints
• Data abstraction
▪ Hide the complexity of data structures to represent data in the
database from users through several levels of data abstraction
Based on Database System Concepts - 7th Edition 1.13
Data Models
▪ A collection of tools for describing
• Data
• Data relationships
• Data semantics
• Data constraints
▪ Relational model
▪ Entity-Relationship data model (mainly for database design)
▪ Object-based data models (Object-oriented and Object-relational)
▪ Semi-structured data model (JSON, XML)
▪ Other older models:
• Network model
• Hierarchical model
Based on Database System Concepts - 7th Edition 1.14
Relational Model
▪ All the data is stored in various tables
▪ Example of tabular data in the relational model
Ted Codd
Columns
Turing Award 1981
Rows
Based on Database System Concepts - 7th Edition 1.15
A Sample Relational Database
Based on Database System Concepts - 7th Edition 1.16
Levels of Abstraction
▪ Physical level: describes how a record (e.g., instructor) is
stored
▪ Logical level: describes data stored in database, and the
relationships among the data
▪ View level: application programs hide details of data types.
Views can also hide information (such as an employee’s
salary) for security purposes
Based on Database System Concepts - 7th Edition 1.17
View of Data
An architecture for a database system
Based on Database System Concepts - 7th Edition 1.18
Instances and Schemas
▪ Logical Schema – the overall logical structure of the database
• Example: The database consists of information about a set of
customers and accounts in a bank and the relationship between them
▪ Analogous to type information of a variable in a program
▪ Physical schema – the overall physical structure of the database
▪ Instance – the actual content of the database at a particular point in time
• Analogous to the value of a variable
Based on Database System Concepts - 7th Edition 1.19
Physical Data Independence
▪ Physical Data Independence – the ability to modify the physical
schema without changing the logical schema
• Applications depend on the logical schema
• In general, the interfaces between the various levels and
components should be well defined so that changes in some parts
do not seriously influence others
Based on Database System Concepts - 7th Edition 1.20
Data Definition Language (DDL)
▪ Specification notation for defining the database schema
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))
▪ DDL compiler generates a set of table templates stored in a data
dictionary
▪ Data dictionary contains metadata (i.e., data about data)
• Database schema
• Integrity constraints
▪ Primary key (ID uniquely identifies instructors)
• Authorization
▪ Who can access what
Based on Database System Concepts - 7th Edition 1.21
Data Manipulation Language (DML)
▪ Language for accessing and updating the data organized by the
appropriate data model
• DML also known as query language
▪ There are basically two types of data-manipulation language
• Procedural DML -- require a user to specify what data are needed
and how to get those data.
• Declarative DML -- require a user to specify what data are needed
without specifying how to get those data.
▪ Declarative DMLs are usually easier to learn and use than are procedural
DMLs
▪ Declarative DMLs are also referred to as non-procedural DMLs
▪ The portion of a DML that involves information extraction is called a
query language
Based on Database System Concepts - 7th Edition 1.22
SQL (Structured Query Language)
▪ SQL is nonprocedural. A query takes as input several tables (possibly
only one) and always returns a single table
▪ Example to find all instructors in Comp. Sci. dept
select name
from instructor
where dept_name = 'Comp. Sci.'
▪ SQL is NOT a Turing machine equivalent language
• some computations that are possible using a general-purpose
programming language but are not possible using SQL
▪ To be able to compute complex functions SQL is usually embedded in
some higher-level language
Based on Database System Concepts - 7th Edition 1.23
Database Access from Application Program
▪ Non-procedural query languages such as SQL are not as powerful as a
universal Turing machine
• Church–Turing thesis or Church–Turing conjecture is
a hypothesis about the nature of computable functions, which states
that a function on the natural numbers can be calculated by an effective
method if and only if it is computable by a Turing machine
▪ effective computability means computable by a Turing machine
• The halting problem is an undecidable problem: there is no
algorithm that correctly determines whether arbitrary programs
eventually halt when run
▪ SQL does not support actions such as input from users, output to displays,
or communication over the network
▪ Such computations and actions must be written in a host language, such
as C/C++, Java or Python, with embedded SQL queries that access the
data in the database
Based on Database System Concepts - 7th Edition 1.24
Database Design
The process of designing the general structure of the database:
▪ Logical Design – Deciding on the database schema. Database design
requires that we find a “good” collection of relation schemas.
• Business decision – What attributes should we record in the
database?
• Computer decision – What relation schemas should we have and
how should the attributes be distributed among the various relation
schemas?
▪ Physical Design – Deciding on the physical layout of the database
Based on Database System Concepts - 7th Edition 1.25
Database Components
▪ A database system is partitioned into modules that deal with each of the
responsibilities of the overall system
▪ The functional components of a database system can be divided into
• The storage manager component
• The query processor component
• The transaction management component
Based on Database System Concepts - 7th Edition 1.26
Storage Manager
▪ A program module that provides the interface between the low-level data
stored in the database and the application programs and queries
submitted to the system
▪ The storage manager is responsible to the following tasks:
• Interaction with the OS file manager
• Efficient storing, retrieving and updating of data
▪ The storage manager implements several data structures as part of the
physical system implementation:
• Data files - store the database itself
• Data dictionary - stores metadata about the structure of the database,
in particular the schema of the database
• Indices - can provide fast access to data items. A database index
provides pointers to those data items that hold a particular value
Based on Database System Concepts - 7th Edition 1.27
Query Processor
▪ The query processor components include:
• DDL interpreter -- interprets DDL statements and records the
definitions in the data dictionary
• DML compiler -- translates DML statements in a query language into
an evaluation plan consisting of low-level instructions that the query
evaluation engine understands
▪ The DML compiler performs query optimization; that is, it picks
the lowest cost evaluation plan from among the various
alternatives
• Query evaluation engine -- executes low-level instructions generated
by the DML compiler
Based on Database System Concepts - 7th Edition 1.28
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Based on Database System Concepts - 7th Edition 1.29
Transaction Management
▪ A transaction is a collection of operations that performs a single logical
function in a database application
• The SABRE System processes nearly 20,000 transactions/second
▪ Transaction-management component ensures that the database
remains in a consistent (correct) state despite system failures (e.g.,
power failures and operating system crashes) and transaction failures
• ACID (Atomicity, Consistency, Isolation, Durability) properties
▪ Concurrency-control manager controls the interaction among the
concurrent transactions, to ensure the consistency of the database
Based on Database System Concepts - 7th Edition 1.30
Database Architecture
▪ Centralized databases
• A few cores, shared memory
▪ Client-server
• One server machine executes work on behalf of multiple client
machines
▪ Parallel databases
• Many cores, shared memory
• Shared disk
• Shared nothing
▪ Distributed databases
• Geographical distribution
• Schema/data heterogeneity
Based on Database System Concepts - 7th Edition 1.31
Database Applications
Database applications are usually partitioned into two or three parts
▪ Two-tier architecture -- the application resides at the client machine,
where it invokes database system functionality at the server machine
▪ Three-tier architecture -- the client machine acts as a front end and
does not contain any direct database calls
• The client end communicates with an application server
• The application server in turn communicates with a database
system to access data
Based on Database System Concepts - 7th Edition 1.32
Two-tier and three-tier architectures
Based on Database System Concepts - 7th Edition 1.33
Database Users
There are different types of database-system users
▪ Naive users -- unsophisticated users who interact with the system by
invoking one of the application programs that have been written
previously
▪ Application programmers -- who write programs that interacts with the
database
▪ Sophisticated users -- interact with the system without writing programs
• using a database query language or by
• using tools such as data analysis software
Based on Database System Concepts - 7th Edition 1.34
Database Administrator
A person who has central control over the system is called a database
administrator (DBA), whose functions are:
▪ Schema definition
▪ Storage structure and access-method definition
▪ Schema and physical-organization modification
▪ Granting of authorization for data access
▪ Routine maintenance
▪ Periodically backing up the database
▪ Ensuring that enough free disk space is available for normal
operations, and upgrading disk space as required
▪ Monitoring jobs running on the database and ensuring that
performance is not degraded by very expensive tasks submitted by
some users
Based on Database System Concepts - 7th Edition 1.35
Database Systems vs Information Retrieval
Systems
▪ Information Retrieval (IR) or Document Retrieval is finding documents
or objects that satisfy particular information requirements from large
repositories which can contain textual or multimedia documents
• Library Search
• Image Search
• Video Search
• Music Search
Based on Database System Concepts - 7th Edition 1.36
Sec. 6.2
Textual Retrieval is Well-Developed:
term-document count matrices
▪ Count the number of occurrences of a term in a document
(7 terms, 6 documents)
Antony and Cleopatra Julius Caesar The Tempest Hamlet Othello Macbeth
Antony 157 73 0 0 0 0
Brutus 4 157 0 1 0 0
Caesar 232 227 0 2 1 1
Calpurnia 0 10 0 0 0 0
Cleopatra 57 0 0 0 0 0
mercy 2 0 3 5 5 1
worser 2 0 1 1 1 0
37
Based on Database System Concepts - 7th Edition 1.37
Sec. 6.2.2
tf-idf Weighting
▪ The relevance of a document increases with the number of occurrences of the
search term within the document
▪ Relevance increases with the rarity of the term in the collection
• Frequent terms are less informative than rare terms (e.g. stop words such as
“the”, “be”, “to”)
• Rare terms are terms that do not occur in many documents
▪ The tf-idf weight of a term is the product of its tf weight (term frequency) and its idf
weight (inverse document frequency)
tf - idf t ,d = (1 + log10 tf t ,d ) log10 ( N / df t )
• N is the number of documents in the collection
• tft,d is the number of times term t occurs in document d
• dft is the document frequency of t
▪ the number of documents in the collection that contains term t
38
Based on Database System Concepts - 7th Edition 1.38
Sec. 6.3
tf-idf Weighting
Each document is now represented by a real-valued vector of tf-idf weights ∈ R|V|. Here,
V=7, which is the number of terms.
Note: this example is not based on the data in Slide 8.
Antony and Cleopatra Julius Caesar The Tempest Hamlet Othello Macbeth
Antony 5.25 3.18 0 0 0 0.35
Brutus 1.21 6.1 0 1 0 0
Caesar 8.59 2.54 0 1.51 0.25 0
Calpurnia 0 1.54 0 0 0 0
Cleopatra 2.85 0 0 0 0 0
mercy 1.51 0 1.9 0.12 5.25 0.88
worser 1.37 0 0.11 4.15 0.25 1.95
39
Based on Database System Concepts - 7th Edition 1.39
End of Part 1
Based on Database System Concepts - 7th Edition 1.40