Understanding Database Systems Basics
Understanding Database Systems Basics
Instructor: Isayas E.
1
Chapter 1
Introduction
2
Outline
6
7
8
9
10
11
12
13
14
Drawbacks of using file systems to store data
Normal File Systems have the following problems, but it is an advantage for the
DBMS.
•Data redundancy and inconsistency:
Same information may be duplicated in several places.
All copies may not be updated properly.
•Data isolation
Data in different files
Data in different formats
•Difficult to write new application programs
15
Multiple users
Want concurrency for faster response time.
Need protection for concurrent updates.
E.g. two customers withdrawing funds from the same account at the same
time| account has $500 in it, and they withdraw $100 and $50. The result
could be $350, $400 or $450 if no protection.
Security problems
Every user of the system should be able to access only the data they are
permitted to see.
E.g. payroll people only handle employee records, and cannot see
customer accounts; tellers only access account data and cannot see payroll
data.
Difficult to enforce this with application programs
Integrity problems
Data may be required to satisfy constraints.
E.g. no account balance below $25.00
Again, difficult to enforce or to change constraints with the file-
processing approach
** These problems and others led to the development of database management
systems. In addition to its above advantages the DBMS application provides Data16
Abstraction.
Data Abstraction: The major purpose of a database system is to provide users with an
abstract view of the system.
-The system hides certain details of how data is stored and maintained.
-The purpose of the three level of architecture is to separate the way the database is
physically represented from the way user think about it.
E.g.: Students need not see information about instructors.
There are three levels of data abstraction levels or database architecture.
A)Physical/ Internal Level
How the data are stored
Eg. Index, B-tree, hashing
Lowest level of abstraction
Complex low-level structures described in detail
B) Conceptual Level
Next highest level of abstraction
Describes what data are stored
Describes the relationship among the data
Database administrator level
B)View/External Level
Highest level
Describes part of the data base for a particular group of users
Can be many different views of a database 17
Eg. Tellers in a bank get a view of customer accounts, but not of payroll data
.
pl e : In ca se of st or in g cu st omer da ta ,
Exam
si ca l le ve l – it will co nt ai ns bl oc k of st orages
Phy
(bytes,GB,TB,etc)
18
View of Data
An architecture for a database system
23
Domain: Each attribute has values taken from a domain. For example,
the domain of Name is String and that of salary is real
24
Instances and Schemas
• The collection of information stored in the database at a particular moment is called an
instance of the database.
• The overall design of the database is called the database schema.
• Similar to types and variables in programming languages
• Database schema defines the variable declarations in tables that belong to a
particular database; the value of these variables at a moment of time is called the
instance of that database.
• Database systems have several schemas, partitioned according to the levels
of abstraction.
• Physical schema – The design of a database at physical level
• Logical Schema – The design of database at logical level
• Example: The database consists of information about a set of customers and
accounts in a bank and the relationship between them
Analogous to type information of a variable in a program
• Instance – the actual content of the database at a particular point in time
• Analogous to the value of a variable 25
Data Models
It supports:
o Definition of data structures for storage.
o Definition of rules that the stored data have to obey.
o Definition of operations on the data, such that changes to represented
things in the real world can be reported for.
A DBMS allows a user to define the data to be stored in terms of a data model.
Most database management systems today are based on the relational data model,
which we will focus on this course.
26
Data Models
• A data model provides a way to describe the design of a database at the
physical(how data stored), logical(what data stored), and view/external levels.
• Classified into four different categories:
• Relational model
• Entity-Relationship data model (mainly for database design)
• Object-based data models (Object-oriented and Object-relational)
• Semi structured data model (XML)
• Other older models:
• Network model
• Hierarchical model
27
Relational Model
• Uses a collection of tables to represent both data and the relationships among those data.
• Each table has multiple columns, and each column has a unique name
• All the data is stored in various tables.
• Tables are also known as relations
• Example of tabular data in the relational model attributes
(or columns)
tuples
(or rows)
28
Lecturer ID
• Entities have
• A general type or class, such as Lecturer or Module
Name Course • Instances of that particular type, such as
Steve Mills, Natasha Alechina are
instances of Lecturer
Tutors Student • Attributes (such as name, email address)
Module Studies
29
In an E/R Diagram, an entity is usually drawn as a box with rounded corners
In this case, Lecturer, Student, and Module are all entities
Database Design
The process of designing the general structure of the database:
• Why do we need it?
• Agree on structure of the database before deciding on a particular implementation.
• Consider issues such as:
• What entities to model
• How entities are related
• What constraints exist in the domain
• How to achieve good designs
•Logical Design – Deciding on the database schema. Database design requires that we find
a “good” collection of relation schemas.
32
1. Steps of Database Design
Why is database design important?
A well-designed database facilitates data management and becomes a
valuable information generator.
A poorly designed database is a breeding ground for uncontrolled data
redundancies.
A poorly designed database generates errors that lead to bad decisions.
Database design consists of several tasks:
requirements analysis
conceptual design
logical design
schema refinement
physical design and tuning (phase in which all six kinds of design steps
are interleaved and repeated until the design is satisfactory).
In developing a good design, one should ask – what are the important queries
33
and updates? What attributes/ relations are involved?
1) Requirement Analysis
The very first step in designing a database application is to
understand
what data is to be stored in the database
what applications must be built on top of it, and
what operations are most frequent and subject to
performance requirements.
In other words, we must find out what the users
want from the database.
The study of
the current operating environment and
how it is expected to change
Analysis of any available documentation and applications that
are expected to be replaced by the database application.
34
2) Conceptual database design
35
3) Logical Database Design
The process of construction a model of the information used in an enterprise based
on a specific data model (e.g. relational), but independent of a particular DBMS and
other physical consideration.
We will only consider relational DBMSs, and therefore, the task in the logical design
step is to convert an ER schema into a relational database schema.
4) Schema Refinement:
The fourth step in database design is to analyze the collection of relations in our
relational database schema to identify potential problems, and to refine it. schema
refinement can be guided by some well-designed and powerful theory. We discuss
the theory of normalizing relations restructuring them to ensure some desirable
properties.
Normalization process
Discover New entities
Revise attributes
36
5) Physical Database Design
The process of producing a description of the implementation of the database on
secondary storage.
Secondary storage: defines specific storage or access methods used by database:
Describes the storage structure and access methods used to achieve efficient access
to the data
Tailored to specific DBMS systems – Characteristics are function of DBMS and
operating systems.
Includes estimate of storage space
6) Tuning :
In w/ch all six kinds of design steps are interleaved & repeated until the design is satisfactory.
7) Implementation in DBMS
Two things should be given an attention during the implementation of the specific DBMS
implementation.
37
Database Languages
A database system provides a data-definition language to specify the database
schema and a data-manipulation language to express database queries and
updates.
In practice, the data-definition and data-manipulation languages are not two separate
languages; instead they simply form parts of a single database language, such as the
widely used SQL language.
A)Data Definition Language (DDL): DDL statements are used to define and modify
the database structure of your tables or schema. When you execute a DDL
statement, it takes effect immediately.
Some commands of DDL are:
CREATE - to create table (objects) in the database
ALTER - alters the structure of the database
DROP - delete table from the database
TRUNCATE - remove all records from a table, including all spaces allocated for the
records are removed
COMMENT - add comments to the data dictionary
RENAME - rename a table
38
Data Definition Language (DDL)
• Specification notation for defining the database schema
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))
primary key (ID)
• DDL compiler generates a set of table templates
stored in a data dictionary Table: Data dictionary example
40
Data Manipulation Language (DML)
• Language for accessing and manipulating the data organized by the appropriate data
model
• A query is a statement requesting the retrieval of information.
• The portion of a DML that involves information retrieval is called a query
language.
• DML also known as query language
• The most widely used query language is SQL(Structured Query Language )
42
ADVANTAGES OF A DBMS
• Data Independence: Application programs should not, ideally, be exposed to
details of data representation and storage, The DBMS provides an abstract view of
the data that hides such details.
• Efficient Data Access: A DBMS utilizes a variety of sophisticated techniques to
store and retrieve data efficiently. This feature is especially important if the data is
stored on external storage devices.
• Data Integrity and Security: If data is always accessed through the DBMS, the
DBMS can enforce integrity constraints. For example, before inserting salary
information for an employee, the DBMS can check that the department budget is
not exceeded. Also, it can enforce access controls that govern what data is visible
to different classes of users.
• Data Administration: When several users share the data, centralizing the
administration of data can offer significant improvements. Experienced
professionals who understand the nature of the data being managed, and how
different groups of users use it, can be responsible for organizing the data
representation to minimize redundancy and for fine-tuning the storage of the43 data
to make retrieval efficient.
ADVANTAGES OF A DBMS
• Concurrent Access and Crash Recovery: A DBMS schedules concurrent
accesses to the data in such a manner that users can think of the data as being
accessed by only one user at a time. Further, the DBMS protects users from the
effects of system failures.
• Reduced Application Development Time: Clearly, the DBMS supports important
functions that are common to many applications accessing data in the DBMS. This,
in conjunction with the high-level interface to the data, facilitates quick application
development. DBMS applications are also likely to be more robust than similar
stand-alone applications because many important tasks are handled by the DBMS
(and do not have to be debugged and tested in the application).
• Reading Assignment
Disadvantages of DBMS
44
History of Database Systems
• 1950s and early 1960s:
• Data processing using magnetic tapes for storage
• Tapes provided only sequential access
• Punched cards for input
• Late 1960s and 1970s:
• Hard disks allowed direct access to data
• Network and hierarchical data models in widespread use
• Ted Codd defines the relational data model
• Would win the ACM Turing Award for this work
• IBM Research begins System R prototype
• UC Berkeley begins Ingres prototype
• High-performance (for the era) transaction processing
45
History (cont.)
• 1980s:
• Research relational prototypes evolve into commercial systems
• SQL becomes industrial standard
• Parallel and distributed database systems
• Object-oriented database systems
• 1990s:
• Large decision support and data-mining applications
• Large multi-terabyte data warehouses
• Emergence of Web commerce
• Early 2000s:
• XML and XQuery standards
• Automated database administration
• Later 2000s:
• Giant data storage systems 46