Faculty of computing and information management
DBIT 0301: DATABASE MANAGEMENT SYSTEMS
Lecturer: Deborah Mbula
Topic 3b: Database design
Definition
Conceptual design
Logical design
Physical design
Definition
The process of creating a design for a database that will support the enterprise’s operations and
objectives. Database design is performed in three phases (stages) namely conceptual design,
logical design and physical design.
Phase 1: Conceptual design
This is the process of constructing an abstract model of data to be included in the database. This
model should be independent of any database management system (DBMS) and other physical
considerations.
Steps followed
a) Identification of entities
b) Identification of relationships
c) Construction of conceptual data model.
a) Identify entities
It is a thing or object of importance about which data must be captured.
To identify entities in the requirements, map nouns to entities
Database entities appear in a data model as a rectangle with a title.
The title (in upper case) is the name of the entity.
DEBORAH
Types of entities
i. Weak entity: This is an entity that does not have sufficient attributes to form a primary
key (unique identifier). The existence of a weak entity is indicated by a double rectangle
in the ER diagram.
Weak entity
ii. Strong entity: This is an entity that has a primary key (unique identifier).A weak entity
is existence dependent.
Strong entity
iii. Recursive Entity: This is an entity in which a relationship can exist between
occurrences of the same entity. This occurs in a unary relationship. For instance, a doctor
can be a patient. They are represented as follows:
T
r
DOCTOR
e
at
iv. Composite Entities: An entity created to resolve Many to Many relationship to one-to-
many. It is also referred to as a bridge entity. Bridge entity is composed of the primary
keys of each of the entities to be connected. A composite entity is represented by a
diamond shape within a rectangle in an ER Diagram.
Remarks:
Generalization: The process of defining a more general entity type from a set of more
specialized entity types. BOTTOM-UP
Specialization: The process of defining one or more subtypes of the supertype and forming
supertype/subtype relationships. TOP-DOWN
b) Identification of relationships
A relationship is a meaningful association between entity types.
Classifying relationships:
i. Degree
ii. Cardinality
iii. Optionality
1. Classification according to degree
Degree: The number of entity types participating in a relationship.
i. Unary (recursive) relationship- One entity type participates in a relationship.
DOCTOR DOCTOR
Tr
e
DEBORAH at
ii. Binary relationship- Two entity types participate in a relationship.
DOCTOR PATIENT
Tr
iii. Ternary relationship:
e Three entity types participate in a relationship.
at
PHARMICST
PRESCRIPTION
Is
s
PATIENT
u
… e
iv. n-ary relationship- n entity types participate in a relationship.
2. Classification according to cardinality(connectivity)
In a relationship, cardinality refers to the number of entity occurrences involved in the
relationship. Also defined as minimum and maximum number of entity occurrences that can
participate in a relationship.
Examples
i). One-to-one relationship (1:1): Each entity occurrence in entity type A can only have one
matching entity occurrence in entity type B and vice versa.
a1 b1
a2 b2
a3 b3
a4 b4
ii). One-to-many relationship(1:m): An entity occurrence in entity type A can have several a
matching entity occurrences in entity type B, but each entity occurrence in entity type B
can only have a maximum of one entity occurrences in entity type A.
a1 b1
a2 b2
a3 b3
a4 b4
iii). Many-to-many relationship(m:n)
An entity occurrence in entity type A can have several matching entity occurrences in
entity type B and vice versa.
a1 b1
a2 b2
a3 b3
a4 b4
3. Classification according to optionality(participation)
DEBORAH
Optionality refers to the minimum number of entities in a relationship. Can be mandatory or
optional
Examples
Mandatory: There must be an entity occurrence participating in the relationship
Optionality: There may be an entity occurrence participating in the relationship.
To show optionality, put a circle or `0' at the `optional end' of the relationship.
c) Building conceptual data model
The data model that results after conceptual design is called conceptual data model
Choose a notation and design the data model. Examples of notations are chen’s notation , Rein85
notation, Crow’s foot notation and IDEF1X notations. Most of the differences concern how
relationships are specified and how attributes are shown. In almost all variations, entities are
depicted as rectangles with either pointed or rounded corners. The entity name appears inside.
Relationships can be displayed as diamonds or can be simply line segments between two entities.
For Relationships, there is need to convey relationship name, degree, cardinality and optionality
where necessary. Any notation selected must be used consistently.
Chen’s notation Crow’s foot Rein85 IDEF1X
Relationship Name: Relationship Relationship Relationship
Displayed just inside the name: name: name:
relationship diamond. Relationships name Relationships name Relationships
Cardinality: The indicated next to entity indicated next to name indicated
cardinality is indicated by above or below the relationship next to relationship
placing the appropriate link. diamond. diamond.
numbers beside the Cardinality: Cardinality: Cardinality:
entities using the format Crow's feet are used to Not shown except Not shown except
(x,y), which represent the show a many side of a the connectivity. the connectivity.
minimum and maximum relationship. A single Optionality: Optionality:
value, respectively. line indicates a one Optional Optional
Optionality: side of the relationship. participation is participation is
Mandatory participation Optionality: shown with an open shown with an
indicated by one/ two Optional participation circle next to open circle next to
line(s) linking the entity to is shown with an open optional end. optional end.
the diamond next to circle next to optional Mandatory Mandatory
mandatory end. Optional end. Mandatory participation is participation is
participation indicated by participation is shown shown with shown with
a circle near optional end. with two vertical lines. continuous link line continuous link
next to mandatory line next to
end. mandatory end.
DEBORAH
Comparison of ERD notations.
a) Chen’s notation
Chen model moved conceptual modeling into the practical database design arena by establishing
basic building blocks: entities and relationships. Dominant player in the CASE tool market
during the 1980s and early 1990s.
b) Crow’s Foot notation
This model combines connectivity and cardinality information in a single symbol set.
Popularized by the Knowledgeware modeling tool. Cardinality is limited to 0,1 or N.
c) Rein85 notation
This model is based on the same modeling conventions as the Crow’s Foot model, it’s symbols
are different. It does not recognize cardinalities explicitly, relying on connectivities to lead to
logical cardinality conclusions
d) IDEF1X notation
This is a derivative of the integrated computer-aided manufacturing (ICAM) studies of the late
1970s. Became the source of graphical methods for defining the functions, data structures and
dynamics of manufacturing businesses. The integration of these methods became known as IDEF
(ICAM Definition). Hughes Aircraft developed the original version named IDEF1. Later, the
extended version was named IDEF1X.
Example 1: Chen’s notation
DOCTOR PATIENT MEDICATIO
1
tr ta N
(0, e (1, (0, ke (1,
N) at 1) N) s 1) M
The above model
1
will
be read as follows:
M
Each DOCTOR must treat one or more PATIENTS. Each patient must be treated by a doctor.
Each PATIENT may take one or more MEDICATIONS. Each MEDICATION is given to
one PATIENT.
Example1: Crow’s foot notation.
Treat Takes
DOCTOR PATIENT
MEDICATION
The above model will be read as follows:
DEBORAH
Each DOCTOR must treat one or more PATIENTS. Each patient must be treated by a doctor.
Each PATIENT may take one or more MEDICATIONS. Each MEDICATION is given to
one PATIENT.
Example1: Rein85
Treat
Takes
DOCTOR PATIENT
MEDICATIO
N
The above model will be read as follows:
Each DOCTOR must treat one or more PATIENTS. Each patient must be treated by a doctor.
Each PATIENT may take one or more MEDICATIONS. Each MEDICATION is given to
one PATIENT.
Example 1 : IDEF1X
The above model will be read as follows:
Each DOCTOR must treat one or more PATIENTS. Each patient must be treated by a doctor.
Each PATIENT may take one or more MEDICATIONS. Each MEDICATION is given to
one PATIENT.
Treats Takes
DOCTOR PATIENT
MEDICATION
The features of the conceptual data model include:
i. Important entities and the relationships among them.
ii. No attribute is specified.
iii. No primary key is specified.
Example: A supplier receives orders for products. A SUPPLIER may receive one or more
ORDERS. Each one or more ORDERS must be send to the SUPPLIER. Each order consists of
one or several products. But each PRODUCT is belonging to one ORDER.
Receives
SUPPLIER PRODUCT
has
ORDER
Figure 1: Conceptual data model
DEBORAH
From the figure above, the conceptual data model is the entities that describe the data and
the relationships between those entities.
Phase 2. Logical database design
The process of constructing a database based on specific database management system (DBMS)
and database model. For example a relational Database model. It considers how the data are
represented using the structures offered by the DBMS:
Steps followed:
Select database model.
Map Entity-Relationship Diagrams(from conceptual data model into logical model)
Identify the attributes
Normalize database
Design the ERD (logical data model) and validate with the users.
Create a data dictionary
i). Selecting database model
A database model is a collection of concepts and rules for the description of the structure of the
database. Some of the models one can choose are flat file database, hierarchical, network,
relational and object oriented.
ii). Map Entity-Relationship Diagrams
-Identify all the entities and relationships as captured in conceptual data model.
iii). Identify attributes
Attributes are explicit values such as a quality or characteristic associated with
someone or something (entity).
They represent characteristics or values about a database relation (an entity)
Types of attributes
i. Key attributes: Attributes which uniquely identify an instance of an entity (identifier).
These attributes are distinct for each individual entity. For example a student Number or
Car registration number etc.
ii. Non key attributes: These are attributes which describe a non-unique characteristics of
an entity instance (descriptors). Examples year, color, model.
iii. Required attributes: These are attributes which must have a value or a value must be
known for each entity occurrence. Example employee number.
DEBORAH
iv. Optional attributes: These are attributes which could have a value or a value may be
known for each entity occurrence. Example an employee; employees spouse is optional
since not every employee will have a spouse.
v. Composite attribute: This is an attribute which can be divided into smaller subparts.
These subparts represent basic attributes with independent meanings of their own. For
example, take Name attributes. We can divide it into sub-parts like First_name,
Middle_name, and Last_name.
vi. Simple attribute: This is an attribute that cannot be divided into subparts. It is therefore
said to be simple or atomic attribute. For example, Employee Number is a simple
attribute. Age of a person is a simple attribute.
vii. Single-valued attribute: It is an attribute that can have single value at a particular
instance of time. For instance, a person cannot have more than one age value.
viii. Multi-valued attribute: It is an attribute that can have more than one value at one
time. For instance, a person can have more than one degree at one time, therefore
attribute degree is said to be a multi-valued attribute.
ix. Stored attribute: An attribute that supplies a value to the related attribute.
x. Derived attribute: An attribute whose value is derived from a stored attribute. These
attributes are usually created by a formula or certain operation on other attributes.
Example: Age= [Current-year]- [year_of_birth]
xi. Candidate Key: An attribute or set of attributes that uniquely identifies individual
occurrences of an entity type.
xii. Primary Key: An attribute that is used to identify an entity type.e.g. entity type student
can be identified by student number.
xiii. Composite Key: A candidate key that consists of two or more attributes.
e.g. VoterID+NationalID+PassportID
iv). Normalization of the database
Data is organized into database tables by using normal forms rules or conditions. These rules
help to make a good database design. Usually, data is organized using first, second and third
normal forms.
Begin with a list of all of the fields that must appear in the database.
Do not include computed fields
You can get this information from a printed document used by the system e.g. reports.
Additional attributes besides those for the entities described on the document can be added to
the database.
v). Design the ERD (logical data model) and validate with the users.
In this step we use the normalized forms and standard notations to draw the ERD. Such notations
include Rein95, IDEFl1X, Chen, Crow’s foot among others.
DEBORAH
Chen Crow’s Foot Rein85 IDEFI1X
Entity
Relationship line
Relationship
Optional symbol
One(1) symbol 1
Many(M) symbol M
Composite entity
Weak entity
A logical data model describes the data in as much detail as possible, without regard to how they
will be physically implemented in the database. A logical data model is used to explore the
domain concepts, and their relationships, of the problem. This could be done for the scope of a
single project or for the entire enterprise. The logical data model depict the logical entity types,
typically referred to simply as entity types, the data attributes describing those entities, and the
relationships between the entities.
Features of a logical data model include:
Includes all entities and relationships among them.
All attributes for each entity are specified.
The primary key for each entity is specified.
Foreign keys (keys identifying the relationship between different entities) are specified.
Normalization occurs at this level.
PRODUCT
SUPPLIER
ProductID
SupplierID (PK)
(PK)
ProductName
SupplerName ORDER
Quantity
Address
OrderID (PK)
DEBORAH OrderID(FK)
Street
OrderName
DatePrepared
Street
SupplierID(FK)
Figure 2: Logical Data model using crow’s foot.
Logical data model versus Conceptual data model
In a logical data model, primary keys are present, whereas in a conceptual data model, no
primary key is present.
In a logical data model, all attributes are specified within an entity. No attributes are specified in
a conceptual data model.
Relationships between entities are specified using primary keys and foreign keys in a logical data
model. In a conceptual data model, the relationships are simply stated, not specified, so we
simply know that two entities are related, but we do not specify what attributes are used for this
relationship.
Phase 3: Physical database design
This is a process of constructing a model of information used in an organization based on
physical considerations like software and hardware. Physical design is concerned on how the
database is organized as files and what kind of structures to use for efficiency of database
processing
Steps:
• Select DBMS
• Select storage devices
• Determine access methods
• Design files and indexes
• Determine database distribution
• Produce the physical model and validate with the users
i). Select DBMS
Factors considered when selecting the DBMS
Costs
Features and Tools
DEBORAH
Underlying model
Portability
DBMS hardware requirements
Security
Usability
Main steps to selecting a DBMS:
Define terms of reference of study
Shortlist two or three products
Evaluate products
Recommend selection and produce report.
Examples of DBMS
Microsoft access
Microsoft SQL server
Oracle
Informix
MySQL
Functions of DMBS
Database creation (storing data in a defined database) using data definition language
Retrieval (query and reporting) using data manipulation language.
Update (Changing the contents of the database) using data manipulation language.
Programming user facilities for system development.
Database revision and restructuring using data manipulation language.
Database integrity control using data definition language
Performance Monitoring
Concurrency control; controlling simultaneous access to the database
ii). Select storage devices
When selecting storage devices consider:
Speed with which data can be accessed
Cost per unit of data
Reliability: loss on power failure or system crash and physical failure of the storage
device
Media
Cache: Fastest and most costly form of storage; volatile; managed by the computer system
hardware.
Main memory: Fast access but generally too small (or too expensive) to store the entire
database.
Flash memory: Data survives power failure. Data can be written at a location only once, but
location can be erased and written to again. Reads are roughly as fast as main memory. But
writes are slow (few microseconds), erase is slower.
Magnetic-disk: Data is stored on spinning disk, and read/written magnetically. It is a
primary medium for the long-term storage of data; typically stores entire database. Data must
be moved from disk to main memory for access, and written back for storage. It is possible
DEBORAH
to read data on disk in any order, unlike magnetic tape. Survives power failures and system
crashes. However, disk failure can destroy data.
Optical storage: This is a non-volatile, data is read optically from a spinning disk using a
laser. Reads and writes are slower than with magnetic disk.
Tape storage: Is a non-volatile storage, used primarily for backup (to recover from disk
failure), and for archival data. Allows sequential-access and thus much slower than disk.
Very high capacity.
iii). Determine access methods
In this step identify specific structures and access methods for the data to achieve optimum
performance for the database system. The access method depends on storage structure.
Some of the access methods are:
Sequential
Indexed sequential
Hashing
iv). Design files and indexes
Creating set of tables and constraints on these tables from the information given in logical data
model. The tables are designed and their field properties specified.
SUPPLIER TABLE
FieldName Data type Length
SupplierID Number 8
SupplierName Text 50
Address Text 50
Street Text 50
DEBORAH
ORDER TABLE
FieldName Data type Length
OrderID Text 50
OrderName Text 50
Dateprepared Date/Time
SupplierID Number 8
PRODUCT TABLE
FieldName Data type Length
ProductID Text 30
ProductName Text 60
Quantity Number 8
OrderID Text 50
v). Determine database distribution
There is need to consider location and number of users when deciding the database distribution.
For instance the database may single user database in which case the database can be stored in a
local computer, multiuser database may stored in a dedicated server where users can access
simultaneously. If the users are not within the same location and the organization has several
branches in different locations, then a distributed database may be used.
Physical data model
Physical data model represents how the model will be built in the database.
A physical database model shows all table structures, including column name, column data type,
column constraints, primary key, foreign key, and relationships between tables.
Features of a physical data model include:
i. Specification of all tables and columns.
ii. Foreign keys are used to identify relationships between tables.
iii. De-normalization may occur based on user requirements.
iv. Physical considerations may cause the physical data model to be quite different from
the logical data model.
v. Physical data model will be different for different RDBMS. For example, data type
for a column may be different between MySQL and SQL Server.
The steps for physical data model design are as follows:
DEBORAH
i. Convert entities into tables.
ii. Convert relationships into foreign keys.
iii. Convert attributes into columns.
iv. Modify the physical data model based on physical constraints / requirements
SUPPLIER
1 PRODUCT
SupplierID(PK): Number ProductID (PK): Text(30)
∞
SupplerName: ProductName : Text (60)
Text(50)
Quantity :Number
Address:
Text(50) OrderID(FK) :Text(50)
Street:
ORDER
Text(50)
OrderID (PK) : Text(50)
OrderName : Text(50)
DatePrepared : 1
Date/Time
∞
SupplierID(FK):Number
Figure 3: Physical data model
Logical data model versus Physical data model
Comparing the logical data model shown above with the logical data model diagram, we see the
main differences between the two:
i. Entity names are now table names.
ii. Attributes are now column names.
DEBORAH
iii. Data type for each column is specified. Data types can be different depending on
the actual database being used.
Revision questions
a) Describe four functions of a database management system (DBMS). [8 Marks]
b) Describe the three levels of database architecture. [6 Marks]
c) Discuss the logical and physical data independence. [4 Marks]
d) Define the following attributes as used in relational database design. [6 Marks]
i). Multi-valued attribute
ii). Composite attribute
iii). Derived attribute
e) Identify at least three attributes for each of the following entities below and choose one of
your attributes as the identifier for the entities.
i). Customer. [3 Marks]
ii). Organization. [3 Marks]
f) Discuss the differences in the following data modelling notations.
i). Chen
ii). IDEF1X
iii). Rein85
iv). Crow’s foot
g) Describe the process of physical database design. [8 Marks]
h) Briefly explain the following database storage devices.
i). Optical storage
ii). Magnetic disk
iii). Tape storage [6 Marks]
i) Explain the following database access methods.
i). Hash
ii). Indexed sequential access [6 Marks]
DEBORAH