CONCEPTS IN DATABASE SYSTEMS - I
Monday, October 28, 2024 Felix Mutua, Ph.D.
Course Outline
1. Review of Basic GIS concepts
2. Data models and structure: Spatial data models, topology, format conversions,
vectorization, and rasterization.
3. Database models: Relational, Hierarchical, Network, Object oriented. Data
interchange.
4. Database management: Spatial data; Editing, manipulation, analysis, updating.
5. Data analysis: Spatial analysis; Query, reclassification, overlay, coverage rebuilding,
connectivity analysis.
6. Development and maintenance of a Geographical Information System (GIS)
database.
7. GIS database security.
8. Legal issues in GIS.
9. Applications of GIS.
10/28/2024
Outline for Today’s Lecture
3
Overview of database systems
The Relational Data Model
Database design process
ER-Modelling
In part II we will look at:
SQL
Spatialdatabases
Implementation
What Is a Relational Database Management System ?
4
Database Management System = DBMS
Relational DBMS = RDBMS
A collection of files that store the data
A big C program written by someone else that accesses and updates
those files for you
Where are RDBMS used ?
5
Backend for traditional “database” applications
Backend for large Websites
Backend for Web services
Example of a Traditional Database Application
6
Suppose we are building a system
to store the information about:
students
courses
professors
who takes what, who teaches what
Can we do it without a DBMS ?
7
Sure we can! Start by storing the data in files:
students.txt courses.txt professors.txt
Now write C or Java programs to implement specific tasks
Doing it without a DBMS...
8
Enroll “Mary Johnson” in “CSE444”:
Write a C program to do the following:
Read ‘students.txt’
Read ‘courses.txt’
Find&update the record “Mary Johnson”
Find&update the record “CSE444”
Write “students.txt”
Write “courses.txt”
Problems without an DBMS...
9
System crashes: Read ‘students.txt’
Read ‘courses.txt’
Find&update the record “Mary Johnson” CRASH !
Find&update the record “CSE444”
Write “students.txt”
Write “courses.txt”
What is the problem ?
Large data sets (say 50GB)
What is the problem ?
Simultaneous access by many users
Need locks: we know them from OS, but now data on disk; and
is there any fun to re-implement them ?
Enters a DMBS
10
“Two tier database system”
connection
(ODBC, JDBC)
Database server
(someone else’s
C program) Applications
Data files
Functionality of a DBMS
11
The programmer sees SQL, which has two components:
Data Definition Language - DDL
Data Manipulation Language - DML
query language
Behind the scenes the DBMS has:
Query optimizer
Query engine
Storage management
Transaction Management (concurrency, recovery)
Functionality of a DBMS
12
Two things to remember:
Client-server architecture
Slow, cumbersome connection
But good for the data
It is just someone else’s C program
In the beginning we may be impressed by its speed
But later we discover that it can be frustratingly slow
We can do any particular task faster outside the DBMS
But the DBMS is general and convenient
How the Programmer Sees the DBMS
13
Start with DDL to create tables:
CREATE TABLE Students (
Name CHAR(30)
SSN CHAR(9) PRIMARY KEY NOT NULL,
Category CHAR(20)
) ...
Continue with DML to populate tables:
INSERT INTO Students
VALUES(‘Charles’, ‘123456789’, ‘undergraduate’)
. . . .
How the Programmer Sees the DBMS
14
Tables:
Students: Takes:
SSN Name Category SSN CID
123-45-6789 Charles undergrad 123-45-6789 CSE444
234-56-7890 Dan grad 123-45-6789 CSE444
… … 234-56-7890 CSE142
Courses: …
CID Name Quarter
CSE444 Databases fall
CSE541 Operating systems winter
Still implemented as files, but behind the scenes can
be quite complex
“data independence” = separate logical view
from physical implementation
Transactions
15
Enroll “Mary Johnson” in “CSE444”:
BEGIN TRANSACTION;
INSERT INTO Takes
SELECT Students.SSN, Courses.CID
FROM Students, Courses
WHERE Students.name = ‘Mary Johnson’ and
Courses.name = ‘CSE444’
-- More updates here....
IF everything-went-OK
THEN COMMIT;
ELSE ROLLBACK
If system crashes, the transaction is still either committed or aborted
Transactions
16
A transaction = sequence of statements that either all succeed, or all
fail
Transactions have the ACID properties:
A = atomicity
C = consistency
I = independence
D = durability
Queries
17
Find all courses that “Mary” takes
SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name=“Mary” and
S.ssn = T.ssn and T.cid = C.cid
What happens behind the scene ?
Query processor figures out how to answer the query efficiently.
Queries, behind the scene
18
Declarative SQL query Imperative query execution plan:
sname
SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name=“Mary” and
S.ssn = T.ssn and T.cid = C.cid cid=cid
sid=sid
name=“Mary”
Students Takes Courses
The optimizer chooses the best execution plan for a query
Database Systems
19
The big commercial database vendors:
Oracle
IBM (with DB2) bought Informix recently
Microsoft (SQL Server)
Sybase
Some free database systems (Unix) :
Postgres
Mysql
Predator
In CSE444 we use SQL Server. You may use something else, but you are
on your own.
New Trends in Databases
20
Object-relational databases
Main memory database systems
XML XML XML !
Relational databases with XML support
Middleware between XML and relational databases
Native XML database systems
Lots of research here at UW on XML and databases
Data integration
Peer to peer, stream data management – still research
NoSQL
Big Data
Levels of Abstraction
Levels of Abstraction: Internal Level/Schema
Internal Level/Schema
The internal schema defines the physical storage structure of the database.
Facts about Internal schema:
The internal schema is the lowest level of data abstraction
It helps you to keeps information about the actual representation of the
entire database. Like the actual storage of the data on the disk in the form
of records
The internal view tells us what data is stored in the database and how
It never deals with the physical devices. Instead, internal schema views a
physical device as a collection of physical pages
Levels of Abstraction: Conceptual Schema/Level
Conceptual Schema/Level
The conceptual schema describes the Database structure of the whole database
for the community of users.
This schema hides information about the physical storage structures and focuses on
describing data types, entities, relationships, etc.
This logical level comes between the user level and physical storage view.
However, there is only single conceptual view of a single database.
Facts about Conceptual schema:
Defines all database entities, their attributes, and their relationships
Security and integrity information
In the conceptual level, the data available to a user must be contained in or
derivable from the physical level
Levels of Abstraction: External Schema/Level
24
External Schema/Level
An external schema describes the part of the database which specific user is interested in. It hides
the unrelated details of the database from the user. There may be "n" number of external views for
each database.
Each external view is defined using an external schema, which consists of definitions of various
types of external record of that specific view.
An external view is just the content of the database as it is seen by some specific particular user.
For example, a user from the sales department will see only sales related data.
Facts about external schema:
An external level is only related to the data which is viewed by specific end users.
This level includes some external schemas.
External schema level is nearest to the user
The external schema describes the segment of the database which is needed for a certain user
group and hides the remaining details from the database from the specific user group
Felix Mutua, PhD
Goal of 3 level/schema of Database
25
Every user should be able to access the same data but able to see a
customized view of the data.
The user need not to deal directly with physical database storage
detail.
The DBA should be able to change the database storage structure
without disturbing the user's views
The internal structure of the database should remain unaffected when
changes made to the physical aspects of storage.
Felix Mutua, PhD
Database Models
26
A database model shows the logical structure of a database, including the
relationships and constraints that determine how data can be stored and
accessed.
Some of the most common ones include:
Hierarchical database model
Relational model
Network model
Object-oriented database model
Entity-relationship model
Document model
Entity-attribute-value model
Star schema
The object-relational model, which combines the two that make up its name
Felix Mutua, PhD
Database Models
27
Relational model : The most common model, the relational model
sorts data into tables, also known as relations, each of which consists
of columns and rows. Each row, also called a tuple, includes data
about a specific instance of the entity in question, such as a particular
employee.
Hierarchical model: The hierarchical model organizes data into a
tree-like structure, where each record has a single parent or root.
Sibling records are sorted in a particular order. That order is used as
the physical order for storing the database.
Felix Mutua, PhD
Database Models
28
Network model: The network model builds on the hierarchical model
by allowing many-to-many relationships between linked records,
implying multiple parent records. Based on mathematical set theory,
the model is constructed with sets of related records. Each set consists
of one owner or parent record and one or more member or child
records. A record can be a member or child in multiple sets, allowing
this model to convey complex relationships.
Object-oriented database model: This model defines a database as a
collection of objects, or reusable software elements, with associated
features and methods.
Felix Mutua, PhD
Database Models
29
Object-relational model: This hybrid database model combines the
simplicity of the relational model with some of the advanced
functionality of the object-oriented database model. In essence, it
allows designers to incorporate objects into the familiar table
structure.
Entity-relationship model: This model captures the relationships
between real-world entities much like the network model, but it isn’t as
directly tied to the physical structure of the database. Instead, it’s
often used for designing a database conceptually.
Felix Mutua, PhD
Database Models
30
Object-relational model: This hybrid database model combines the
simplicity of the relational model with some of the advanced
functionality of the object-oriented database model. In essence, it
allows designers to incorporate objects into the familiar table
structure.
Entity-relationship model: This model captures the relationships
between real-world entities much like the network model, but it isn’t as
directly tied to the physical structure of the database. Instead, it’s
often used for designing a database conceptually.
Felix Mutua, PhD
Database Models
31
Inverted file model: A database built with the inverted file structure is
designed to facilitate fast full text searches. In this model, data content
is indexed as a series of keys in a lookup table, with the values
pointing to the location of the associated files.
Flat model: The flat model is the earliest, simplest data model. It
simply lists all the data in a single table, consisting of columns and
rows. In order to access or manipulate the data, the computer has to
read the entire flat file into memory, which makes this model inefficient
for all but the smallest data sets.
Felix Mutua, PhD
Database Models
32
Inverted file model: A database built with the inverted file structure is
designed to facilitate fast full text searches. In this model, data content
is indexed as a series of keys in a lookup table, with the values
pointing to the location of the associated files.
Flat model: The flat model is the earliest, simplest data model. It
simply lists all the data in a single table, consisting of columns and
rows. In order to access or manipulate the data, the computer has to
read the entire flat file into memory, which makes this model inefficient
for all but the smallest data sets.
Felix Mutua, PhD
NoSQL database models
33
The graph database model, which is even more flexible than a
network model, allowing any node to connect with any other.
The multivalue model, which breaks from the relational model by
allowing attributes to contain a list of data rather than a single data
point.
The document model, which is designed for storing and managing
documents or semi-structured data, rather than atomic data.
Felix Mutua, PhD
34 The Relational Data Model
Felix Mutua, PhD
What is Relational Model?
35
Relational Model (RM) represents the database as a collection of relations.
A relation is nothing but a table of values. Every row in the table represents
a collection of related data values.
These rows in the table denote a real-world entity or relationship.
Some popular Relational Database management systems are:
DB2 and Informix Dynamic Server - IBM
Oracle and RDB – Oracle
SQL Server and Access – Microsoft
Postgres
MySQL
ArcGIS Geodatabase
Felix Mutua, PhD
Relational Model Concepts
36
Attribute: Each column in a Table. Attributes are the properties which
define a relation. e.g., Student_Rollno, NAME,etc.
Tables – In the Relational model the, relations are saved in the table
format. It is stored along with its entities. A table has two properties
rows and columns. Rows represent records and columns represent
attributes.
Tuple – It is nothing but a single row of a table, which contains a
single record.
Relation Schema: A relation schema represents the name of the
relation with its attributes.
Felix Mutua, PhD
Relational Model Concepts
37
Degree: The total number of attributes which in the relation is called the
degree of the relation.
Cardinality: Total number of rows present in the Table.
Column: The column represents the set of values for a specific attribute.
Relation instance – Relation instance is a finite set of tuples in the RDBMS
system. Relation instances never have duplicate tuples.
Relation key - Every row has one, two or multiple attributes, which is called
relation key.
Attribute domain – Every attribute has some pre-defined value and scope
which is known as attribute domain
Felix Mutua, PhD
Relational Model Concepts
38
Felix Mutua, PhD
Relational Integrity Constraints
39
Relational Integrity constraints in DBMS are referred to conditions
which must be present for a valid relation. These Relational constraints
in DBMS are derived from the rules in the mini-world that the
database represents.
There are many types of Integrity Constraints in DBMS. Constraints on
the Relational database management system is mostly divided into
three main categories are:
Domain Constraints
Key Constraints
Referential Integrity Constraints
Felix Mutua, PhD
Relational Integrity Constraints
40
Domain constraints specify that within each tuple, and the value of
each attribute must be unique. This is specified as data types which
include standard data types integers, real numbers, characters,
Booleans, variable length strings, etc.
Key Constraints: An attribute that can uniquely identify a tuple in a
relation is called the key of the table. The value of the attribute for
different tuples in the relation has to be unique.
Felix Mutua, PhD
Referential Integrity Constraints
41
Referential Integrity constraints in
DBMS are based on the concept of
Foreign Keys. A foreign key is an
important attribute of a relation
which should be referred to in
other relationships. Referential
integrity constraint state happens
where relation refers to a key
attribute of a different or same
relation. However, that key element
must exist in the table.
Felix Mutua, PhD
Operations in Relational Model
42
Four basic update operations performed on relational database
model are
Insert, update, delete and select.
Insertis used to insert data into the relation
Delete is used to delete tuples from the table.
Modify allows you to change the values of some attributes in existing tuples.
Select allows you to choose a specific range of data.
Whenever one of these operations are applied, integrity constraints
specified on the relational database schema must never be violated.
Felix Mutua, PhD
43 Felix Mutua, PhD
44 Database Design Process
Felix Mutua, PhD
What is Database Design?
45
Database Design is a collection of processes that facilitate the
designing, development, implementation and maintenance of
enterprise data management systems.
Properly designed database are easy to maintain, improves data
consistency and are cost effective in terms of disk storage space. The
database designer decides how the data elements correlate and what
data must be stored.
The main objectives of database design in DBMS are to produce
logical and physical designs models of the proposed database system.
Felix Mutua, PhD
Why Database Design is Important ?
46
It helps produce database systems
That meet the requirements of the users
Have high performance.
Database design process in DBMS is crucial for high performance database
system.
Felix Mutua, PhD
Requirements analysis
47
Planning - This stages of database design concepts are concerned
with planning of entire Database Development Life Cycle. It takes
into consideration the Information Systems strategy of the
organization.
System definition - This stage defines the scope and boundaries of
the proposed database system.
Felix Mutua, PhD
Database Design
48
Logical model - This stage is concerned with developing a database
model based on requirements. The entire design is on paper without
any physical implementations or specific DBMS considerations.
Physical model - This stage implements the logical model of the
database taking into account the DBMS and physical implementation
factors.
Felix Mutua, PhD
Implementation
49
Data conversion and loading - this stage of relational databases
design is concerned with importing and converting data from the old
system into the new database.
Testing - this stage is concerned with the identification of errors in the
newly implemented system. It checks the database against
requirement specifications.
Felix Mutua, PhD
Types of Database Techniques
50
Normalization is a database design technique that reduces data
redundancy and eliminates undesirable characteristics like Insertion,
Update and Deletion Anomalies. Normalization rules divides larger
tables into smaller tables and links them using relationships. The
purpose of Normalisation in SQL is to eliminate redundant (repetitive)
data and ensure data is stored logically.
Entity Relationship Model (ER Modeling) is a graphical approach to
database design. It is a high-level data model that defines data
elements and their relationship for a specified software system. An ER
model is used to represent real-world objects.
Felix Mutua, PhD
51 Entity Relationship Model (ER Modelling)
Felix Mutua, PhD
What is it?
52
Entity Relationship Model (ER Modeling) is a graphical approach to
database design. It is a high-level data model that defines data elements
and their relationship for a specified software system. An ER model is used
to represent real-world objects.
An Entity is a thing or object in real world that is distinguishable from
surrounding environment. For example, each employee of an organization is
a separate entity. Following are some of major characteristics of entities.
An entity has a set of properties.
Entity properties can have values.
Felix Mutua, PhD
ER Diagram Symbols and Notations
53
Felix Mutua, PhD
Entity
54
An entity can be a person, place, event, or object that is relevant to a given
system. For example, a school system may include students, teachers, major
courses, subjects, fees, and other items. Entities are represented in ER
diagrams by a rectangle and named using singular nouns.
Weak Entity: A weak entity is an entity that depends on the existence of
another entity. In more technical terms it can be defined as an entity that
cannot be identified by its own attributes. It uses a foreign key combined
with its attributed to form the primary key. An entity like order item is a
good example for this. The order item will be meaningless without an order
so it depends on the existence of the order.
Felix Mutua, PhD
Attribute
55
An attribute is a property, trait,
or characteristic of an entity,
relationship, or another
attribute. For example, the
attribute Inventory Item Name
is an attribute of the entity
Inventory Item.
Felix Mutua, PhD
Multivalued Attribute
56
If an attribute can have more than one value it is called a multi-valued
attribute. It is important to note that this is different from an attribute
having its own attributes. For example, a teacher entity can have
multiple subject values.
Felix Mutua, PhD
Derived Attribute
57
An attribute based on another attribute. This is found rarely in ER
diagrams. For example, for a circle, the area can be derived from the
radius.
Felix Mutua, PhD
Relationship
58
A relationship describes how entities interact. For example, the entity
“Carpenter” may be related to the entity “table” by the relationship
“builds” or “makes”.
Relationships are represented by diamond shapes and are labeled
using verbs.
Felix Mutua, PhD
Recursive Relationship
59
If the same entity participates more than once in a relationship it is
known as a recursive relationship. In the below example an employee
can be a supervisor and be supervised, so there is a recursive
relationship.
Felix Mutua, PhD
Cardinality and Ordinality
60
These two further defines relationships between entities by placing the
relationship in the context of numbers. In an email system, for example,
one account can have multiple contacts. The relationship, in this case,
follows a “one to many” model. There are a number of notations used
to present cardinality in ER diagrams. Chen, UML, Crow’s foot,
Bachman are some of the popular notations
Felix Mutua, PhD
How to Draw ER Diagrams
61
Identify all the entities in the system. An entity should appear only
once in a particular diagram. Create rectangles for all entities and
name them properly.
Identify relationships between entities. Connect them using a line and
add a diamond in the middle describing the relationship.
Add attributes for entities. Give meaningful attribute names so they
can be understood easily.
Felix Mutua, PhD
Example
62
What are the entities?
What are the
relationships?
What are the
properties?
Felix Mutua, PhD
Example
63
What are the entities?
What are the
relationships?
What are the
properties?
Felix Mutua, PhD
Example 3
64
What are the entities?
What are the
relationships?
What are the
properties?
Felix Mutua, PhD
Example
65
What are the entities?
What are the
relationships?
What are the
properties?
Felix Mutua, PhD
Example
66
What are the entities?
What are the
relationships?
What are the
properties?
Felix Mutua, PhD
Example
67
What are the entities?
What are the
relationships?
What are the
properties?
Felix Mutua, PhD
Why use ER Model?
68
One of the challenges faced when designing a database is the fact
that designers, developers, and end-users tend to view data and its
usage differently.
Communication tools understood by all stakeholders(technical as well
as non-technical users) are critical in producing database systems that
meet the requirements of the users. ER models are examples of such
tools.
ER diagrams also increase user productivity as they can be easily
translated into relational tables.
Felix Mutua, PhD
Assignment 03
69
Each student to research on a computer-based system of choice for a database
design (get ideas from here). You are required to:
1. Research on the allocated system and identify one potential client (hypothetical)
2. Develop the user requirements for this client
3. List the database entities, their attributes and relationships
4. Develop the database design using ER (use https://www.diagrams.net/ )
The user requirements will be presented in a small report– max 4 pages with no
cover page. Page 1-3 : requirements and page 4: the ER diagram. Total submission
4 pages in PDF
Deadline Next Week. https://forms.gle/vzYg4iEBbjMKXWUu7
Felix Mutua, PhD