03
Databases
IT3010 Data management FOR business analytics
REMINDER:
NO CLASS NEXT WEEK
3
QUICK RECAP
Organisations store data, but where?
Storing data for different purposes in different files is highly problematic. It
makes managing the consistency, constraints and access to the data very difficult.
It also creates redundant copies of the data (which can also lead to
inconsistency). These affect the quality of the data we use for decision making.
Sales Department Accounting Department
Accounts Accounts
Quotation Order Generation
Receivables Payable
Program Program
Program Program
Order Fulfillment Invoice Management
System System
Customer Inventory Orders Customer Inventory
Master File File Master Pricing
File File File
4
QUICK RECAP
So we use databases.
Databases avoid all the data problems associated with file storage. They provide
a (usually) single source/repository for organisational data. Modern enterprise
systems (ES) are built on a central database, and that is where data resides.
Sales Department Accounting Department
Accounts Accounts
Quotation Order Generation
… Receivables Payable …
Applications Applications
Applications Applications
Enterprise System
(Central database)
https://hbr.org/1998/07/putting-the-enterprise-into-the-enterprise-system
5
QUICK RECAP
About databases
A database is a collection of logically related information. ‘Logically related’ means
the data represents the relationships between real world entities (eg departments in
a business process). This data usually stored in tables (in a relational database).
Today, we talk
more about
Enterprise System
tables. (Central database)
- - - - -
- - - - -
- - - - - - - -
- - - - -
- - - - - - - - - -
- - -
- - - - - - - -
- - - - - - - - - - - -
- - - - - - - - - - - -
- - - -
6
QUICK RECAP
DBMS - the software that manages databases
A database management system is a software system that manages databases. It
decides how best data should be stored physically (as bits on a disk), and allows us to
look at the data, or subsets of the data, easily. It controls access to the data, making sure
people and applications cannot corrupt, or make inconsistent, any data. At the same
time, it gives access to the data to multiple concurrent users and applications.
USERS AND
APPLICATIONS
VIEW
LOGICAL
DBMS
PHYSICAL
7
READINGS
Davenport, T. (1998) “Putting the Enterprise into
the Enterprise System”, hbr.org, July-August.
How organisations moved from isolated data to
centralised databases, and how those
databases capture and change processes,
structure and strategy.
Database System Concepts
Chapter 1: Introduction
Chapter 2: Introduction to the Relational Model
8
WHAT ARE WE DOING TODAY?
COMPLEXITY
DESIGNING DATABASES
THE RELATIONAL data MODEL Creating a good structure for our data
to be stored.
How is data structured? How do we decide how the
data should be stored? What conditions do we need to
follow, if we wanted to create our own database?
TIME
THE RELATIONAL MODEL
10
THE RELATIONAL (DATA) MODEL
Most databases today rely on it
A data model is a high level conceptual paradigm that describes how data
should be structured, so that it reflects how the data elements relate to
one another.
Organisational data represents real life, and so the data model often captures the
flow of data in business workflows and processes.
There are several common data models, but the relational model is the
predominant one as far as databases are concerned. According to the
relational model, data should be captured as records, in two-dimensional
tables that represent both the data and the data’s relationships. These
tables are known as relations. The rows correspond to individual records,
the columns correspond to their attributes.
11
THE RELATIONAL (DATA) MODEL
Terminology
COLUMNS or
RELATION or TABLE
ATTRIBUTES or
NAME FIELDS
STUDENT
Stu No First Name Last Name Age CAP Programme Faculty
12 John Blue 23 4 BZA SOC
13 Lee White 44 4.95 ISC SOC ROWS or
RECORDS or
14 Judith Green 31 4.1 BIZACC BIZ TUPLES
15 Robert Red 24 3.6 DEN DEN
Each table will contain only a particular type of record, dictated
by the table’s fields, which are fixed in number.
12
RULES OF THE RELATIONAL MODEL
These seven rules must be preserved
Relations must have the following properties:
The relation has a name that is distinct from all other relation names
in the relational schema
Each cell of the relation contains exactly one atomic (single) value
Each attribute has a distinct name
The values of an attribute are all from the same domain
Each tuple is distinct; there are no duplicate tuples
The order of attributes has no significance
The order of tuples has no significance
LET’S TRY TO DESIGN A TABLE
14
DESIGNING A TABLE
We have a group of suppliers under contract. Each supplier has a
name, a rating on their reliability and a headquarter city location.
These suppliers ship us parts. Each type of part has a name, a colour,
a weight, and a city location where it is currently stored.
In a shipment, one supplier ships one part in a quantity that is a
multiple of 100.
What could we store this data?
15
EACH TUPLE MUST BE UNIQUE
How do we ensure this?
Each tuple should be uniquely identifiable by a set of one or more attributes. This set is
called a key.
Super key: A set of attributes that uniquely identifies a tuple within a relation.
Candidate key: A super key such that no proper subset is a super key within the relation.
Composite key: A candidate key with more than one attribute
Primary key: A candidate key selected to uniquely identify tuples
Alternate key: A candidate key not selected as primary key
Stu No First Name Last Name Age CAP Programme Faculty
12 John Blue 23 4 BZA SOC
13 Lee White 44 4.95 ISC SOC
14 Judith Green 31 4.1 BIZACC BIZ
15 Robert Red 24 3.6 DEN DEN
16
HOW DO YOU DO THIS IN SQL?
17
ANOTHER EXAMPLE
Let’s capture an old paper invoice into digital data
18
NORMALISATION AND ER MODELLING
Figuring out what goes into a relation, optimised
Normalisation, a bottom up approach
Aims to remove redundancy of data, therefore reducing storage space
required and speeding up searches
Occurs progressively using a set of rules (normal forms)
Entity-relationship modelling, a top-down approach
Identifies objects/entities in a system, then maps their relationships
Both lead to a design that can be made into a physical database
Then the tables can be created in the database and populated with data
19
NORMALISATION: 1NF
First Normal Form
The domain of an attribute must include only atomic values, and the value of
any attribute in a tuple must be a single value from the domain of that attribute.
ie. disallows multi-valued attributes
What to do if you’re not in 1NF
If a max number of values for an attribute is known, replace the attribute with
atomic attributes (eg. Qty1, Qty2 since only 2 items allowed). If not, remove the
violating attribute and place it in a separate relation, and propagate the primary
key
Every relation should at least be in 1NF.
20
NORMALISATION: 2NF
Second Normal Form
If a relation exists where a primary key contains multiple attributes,
all non-key attributes that are dependent on part of the primary key
should be separated.
What to do if you’re not in 2NF
Decompose the relation, and create a new relation with the partial
key and its dependent attributes. Keep a relation with the primary
key and attributed dependent on it.
21
NORMALISATION: 3NF
Third Normal Form
If a relation exists where a non-key attribute can be determined by
another non-key attribute (or set of non-key attributes), it is not in
3NF.
What to do if you’re not in 3NF
Decompose the relation, and separate the non-key dependency in
another relation
Also remove any attributes that can be computed
22
TRY AGAIN
We have a group of suppliers under contract. Each supplier has a
name, though that’s not necessarily unique, a rating on their
reliability and a headquarter city location. These suppliers ship us
parts. Each type of part has a name, a colour, a weight, and a city
location where it is currently stored. In a shipment, one supplier
ships one part in a quantity that is a multiple of 100. Design the
shipment database.