
Unit 1 Tutorials: Relational Database Model
INSIDE UNIT 1

Utility

Database Purpose
Databases in the Real World
Databases vs. Flat Files
Parts of a Database System
Non-Relational Databases
Transactional and Analytical Databases

Common Database Models

Data Model Innovations


Hierarchical and Network Data Models
Relational Models
Object and Relational Models
Conceptual Design
Entity Relationship Model
Logical Design
Physical Design

Commercial Variety

Commercial Databases
Using ANSI SQL
Using SQLite
Using MySQL and MariaDB

Constraints

Considering Business Rules

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 1
Translating Business Rules
Storage Limitations
Query Processing Bottlenecks
Database Migration

Database Purpose
by Sophia

 WHAT'S COVERED

In this lesson, you will learn that databases are a core component of business, science, e-commerce,
and cloud computing. Databases are everywhere in our world today. This lesson explores a few of the
terms and definitions related to understanding databases, in two parts. Specifically, this lesson will
cover:
1. Databases Are Everywhere
2. Data, Metadata, Information, and Knowledge

1. Databases Are Everywhere


Organizations employ databases to efficiently and effectively manage vast amounts of data. A database is a
structured and organized collection of data that is stored electronically. For instance, businesses use decision
support systems (DSS), backed by databases, to process extensive data and gain insights into product and
service trends. The same concept applies when overseeing inventory for a store or analyzing statistics for
fantasy football players.

EXAMPLE Amazon.com illustrates this approach. They utilize their sales data to make
informed decisions about the products they should offer directly. Additionally, by leveraging third-party
sales data, Amazon can uncover new products and ideas it might not have discovered otherwise.

 BIG IDEA

The database provides a structured and centralized method for storing and organizing large amounts of
data. Information can be managed and retrieved efficiently using databases, as they store a wide range of
data types, including text, numbers, images, and multimedia. Querying and searching the database can
retrieve specific data quickly and efficiently. One common method for analyzing and contextualizing data is
to use Structured Query Language (SQL), a programming language used for displaying, sorting, filtering,
and summarizing data in a database. Collectively, all those activities are known as querying. Querying a
database using SQL is one of a database administrator's key responsibilities. Later in this course you will
learn how to use SQL and practice it using a sample database.
In order to maintain data integrity, databases enforce constraints and rules that prevent inconsistent or invalid
data from entering the database. The anomalies prevented can include data duplications, losses, or conflicts.
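To make the idea of constraints concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column names, and sample values are invented for illustration, not taken from any particular product.

```python
import sqlite3

# Hypothetical schema: constraints declare the rules the database enforces.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE,      -- no blanks, no duplicates
        age         INTEGER CHECK (age >= 0)   -- no invalid values
    )
""")
conn.execute("INSERT INTO customer (email, age) VALUES ('ana@example.com', 30)")

# A duplicate email violates the UNIQUE constraint, so the database
# rejects the row instead of letting inconsistent data in.
try:
    conn.execute("INSERT INTO customer (email, age) VALUES ('ana@example.com', 25)")
except sqlite3.IntegrityError as e:
    print("Rejected:", e)
```

The key point is that the rejection happens inside the database engine itself, so every application that uses the database gets the same protection.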

Collaboration and data sharing are possible through databases since they allow multiple users to collaborate
and share data. Multiple users can work on the database and update it simultaneously since they provide
concurrent access to data. This enhances productivity in organizations, encourages teamwork, and reduces
data redundancy.

Databases provide various features and mechanisms to enforce data security and protect sensitive information.
Encryption, access control mechanisms, and auditing capabilities help ensure that only authorized individuals
can view or modify data. Backup and recovery mechanisms provided by databases also protect against data
loss and system failure.

In large organizations with many data sources, a database system can serve as a centralized repository that
enables the organization to ensure that all relevant data is available from one information platform. It can also
enforce data consistency and integrity across the enterprise.

Software developers often need to write programming code that can get data from a database, perform actions
on it, and then write it back to the database. Because that is such a common need across diverse programming
projects, developers often rely on application programming interfaces (APIs), which are sets of rules, protocols,
and tools that allow different applications to communicate with each other.
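As one hedged example of such an API, Python's standard sqlite3 module follows the language's DB-API convention for talking to a database; the product table and figures below are made up for this sketch.

```python
import sqlite3

# Hypothetical product table for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("pen", 1.50), ("notebook", 4.00)])

# 1. Get data from the database through the API.
rows = conn.execute("SELECT name, price FROM product").fetchall()

# 2. Perform an action on it in application code (a 10% discount).
discounted = [(round(price * 0.9, 2), name) for name, price in rows]

# 3. Write the results back to the database.
conn.executemany("UPDATE product SET price = ? WHERE name = ?", discounted)
conn.commit()

print(conn.execute("SELECT name, price FROM product ORDER BY name").fetchall())
# [('notebook', 3.6), ('pen', 1.35)]
```

The same get-act-write pattern applies whatever the database or language; only the API details change.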

 REFLECT

Think about your everyday experience and how often you interact with companies that use databases, like
banks, the grocery store, the local pet store, Amazon, and schools, as well as social media sites like
Instagram and TikTok. Also, consider how often you personally consume and produce data by uploading
content, shopping online, or depositing money from your paycheck.

 KEY CONCEPT

Data management is a discipline that focuses on the creation, storage, and retrieval of data.
There's a lot of planning involved to ensure that data and information are accurate and usable. Most
organizations and companies place a strong emphasis on data management because handling data correctly is
critically important to operations. As a database administrator, you will need to know how to manage data and
ensure that the data is suitable for its intended use.

Database management systems are collections of programs that help manage database structures and control
access to the data in the database. To make them easier to manage, some databases offer a graphical user
interface (GUI), a type of user interface through which users interact with electronic devices via visual indicator
representations. For example, PostgreSQL administrators can use pgAdmin to help manage and run their databases.

 TERMS TO KNOW

Database
A computer structure that holds a collection of related data.

Structured Query Language (SQL)


A specialized programming language used for managing and manipulating relational databases.

Application Programming Interface (API)

A set of rules, protocols, and tools that allow different software applications to communicate with each
other.

Graphical User Interface (GUI)


A type of user interface through which users interact with electronic devices via visual indicator
representations.

2. Data, Metadata, Information, and Knowledge


Data refers to raw, unprocessed facts and figures, typically organized in a structured format within a database.
Data on its own lacks context and meaning. It represents the most basic level of information.

A database organizes data using metadata, which is data about data. Examples of metadata include the table
definitions, the field names and characteristics within each table, and the relationships between tables.

Information is data that has been processed and given context, making it more meaningful and useful. SQL
constructs such as SELECT ... FROM and ORDER BY help database administrators contextualize data to create
information.
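As a small sketch of that idea, the query below uses ORDER BY to turn raw rows into ranked information; the sales table and numbers are invented for illustration.

```python
import sqlite3

# Raw, unordered facts (data) in a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, total INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 120), ("South", 340), ("East", 90)])

# SELECT ... ORDER BY adds context: the same facts now answer the
# question "which regions sell the most?" -- that is information.
for region, total in conn.execute(
        "SELECT region, total FROM sales ORDER BY total DESC"):
    print(region, total)
# South 340
# North 120
# East 90
```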

Knowledge is a higher-level abstraction that goes beyond information. It represents the understanding, insights,
and experience derived from information and data. Knowledge may involve interpreting the meaning of
information, recognizing patterns, and making strategic decisions based on the information available.

 TERMS TO KNOW

Data
Raw, unprocessed facts and figures, typically organized in a structured format within a database.

Metadata
Data about data, which describes characteristics or relationships of the data.

Information
Data in a context that gives it meaning.

Knowledge
The understanding, insights, and experience derived from information and data.

 SUMMARY

In this lesson, you learned that databases are everywhere and exist throughout our modern world,
used by various companies and organizations. Data, metadata, information, and knowledge are
important terms to know when it comes to databases. You learned that data comprises the raw facts
stored in a database. Information arises when that raw data is given context that adds meaning.

Knowledge consists of understanding, insights, and predictions made by thinking critically about the
available information.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Application Programming Interface (API)


A set of rules, protocols, and tools that allow different software applications to communicate with
each other.

Database
A structured and organized collection of data that is stored electronically.

Graphical User Interface (GUI)


A type of user interface through which users interact with electronic devices via visual indicator
representations.

Information
Data in a context that gives it meaning.

Knowledge
The understanding, insights, and experience derived from information and data.

Metadata
Data about data, which describes characteristics or relationships of the data.

Structured Query Language (SQL)


A specialized programming language used for managing and manipulating relational databases.

Databases in the Real World
by Sophia

 WHAT'S COVERED

In this lesson, you will explore real-life scenarios of how databases are used in businesses. People and
companies interact with many databases every day. Customers interact with databases to get information
about products or services, often unaware of the systems they are using. Businesses interact with databases
to provide data, get information, make decisions, and keep track of payments in and out. When rolling out a
new technology, a business must decide whether to create its own database, expand an existing one, or avoid
databases altogether. You will learn about interacting with a database, in two parts. Specifically, this lesson
will cover:
1. Daily Interactions With Databases
2. To Build or Not to Build

1. Daily Interactions With Databases


You interact with data and databases throughout your life. If you have ever ordered a product online, used a
streaming service to watch a movie, played a video game, used cloud storage to save or back up files, played
fantasy football, interacted with a bank, or made payments using a debit or credit card, you have interacted
with a database. AI (artificial intelligence), social media, and e-commerce are among the biggest systems
using databases today.

Let’s walk through a day with Carlos and see how he uses a database in just half of a day.

IN CONTEXT

Carlos wakes up in the morning and asks Alexa to play his morning song list. (Alexa uses machine
learning to learn what Carlos likes and stores that data in the cloud for later retrieval; it also breaks
down each of Carlos's words and matches them against a database of sounds so it knows what he is
saying.)

Carlos opens his phone and checks his email (email is stored in a type of database) and his TikTok
channel (TikTok uses MySQL, Mongo, ByteGraph, and other databases) for new information. He looks
at his Twitch channel (PostgreSQL and others) to see who did the last game run and how it went. Then
he gets ready to go to work.

Carlos goes to his car, looks up his driving route to see what traffic looks like on Google Maps
(Bigtable) and then heads off to work. On the way to work, he asks Siri to order coffee for him at the
local coffee shop (Oracle Exadata database) so he can swing through the drive-through and get his
favorite coffee. Google Maps will automatically update his route to work for him and give him a new
traffic map.

Once at work, Carlos starts working on customer orders (the company uses PostgreSQL) and
coordinating global shipping for those orders. On his cell phone, he gets a reminder that he has an online
test at 11:00 AM for his math class using the school's learning management system (LMS). He
acknowledges the alert, saying he will be online then and ready for the test (PostgreSQL).

At 10:50, he logs into the school and goes to the class to prepare for the math test. He told his boss he
needed to take the test (Outlook Calendar database) by blocking off the hour. The LMS takes his login
and lets him know the test is ready early. Does he want to take it now?

At 11:55, he completes the test and gets instant feedback that he passed the test with a 92%
(PostgreSQL). Carlos takes a picture of the passing grade and posts it to Instagram (PostgreSQL,
Redis, and Memcache) to let his friends and family know he passed the test.

This is just part of Carlos’s day, and he has interfaced with many different kinds of databases throughout his
morning. Databases and their design are structured by the companies that make the software and provide the
service to people. Carlos's typical morning reflects how databases have influenced many systems we use and
how we interact with others globally.

 THINK ABOUT IT

As a fun exercise, try to map out where you may have interacted with a database today. Are there apps that
you use or games that you play that track your progress through the game, or apps that track your progress
through their program? How does your news get delivered to you? What about the messages that you send
to your friends and family? You’ll probably be surprised at how many different kinds of databases you work
with every day.

2. To Build or Not to Build


Businesses must consider many things when building a database. They must evaluate the desired
performance of the database and its capability to solve the business problem. They must also determine when
to build a database and when one is unnecessary. In our example, we showed that Carlos uses many
databases; however, some of those same systems use cloud storage for media alongside databases that store
other information. A database should be considered when many users will interact with it, especially if some
are customers rather than employees, because databases have very effective security and privacy controls
built into them. Some users may only need to read data in the database, while others may also have to insert,
update, and delete data from the business database.

If a company already has a database, a separate new database may not be needed. However, the existing
database must have the hardware and software capacity for the addition.

EXAMPLE If a company had an e-commerce site with a database that stores user names and email
addresses, it would not make sense to build a separate new database to track newsletter signups from its
customers. Adding a field or table to the existing database would make more sense to collect email
addresses as part of the newsletter signup process. That way, they could track the users who sign up for
the newsletter and those who do not.
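A minimal sketch of that approach, assuming a SQLite-style database and an invented customer table: a single ALTER TABLE statement extends the existing schema rather than creating a new database.

```python
import sqlite3

# Hypothetical existing customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, email TEXT)")
conn.execute("INSERT INTO customer VALUES ('Ana', 'ana@example.com')")

# Add a single column to track newsletter signups in place.
conn.execute("ALTER TABLE customer ADD COLUMN newsletter INTEGER DEFAULT 0")
conn.execute("UPDATE customer SET newsletter = 1 "
             "WHERE email = 'ana@example.com'")

print(conn.execute("SELECT name, newsletter FROM customer").fetchall())
# [('Ana', 1)]
```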

Although the company in that example could certainly add its newsletter mailing list to its existing database,
that might not be the best solution for all companies. For example, if the current database is nearing its
maximum capacity in terms of data storage space or processing power, or if it cannot be modified to meet the
new use, setting up a separate new database in parallel to the original one may make sense. The new database
could be set up to pull data from a certain table in the existing database to avoid redundancy.

A company that does not currently have a database—or a database administrator to create one—may prefer to
outsource its newsletter operations to a third-party service that can manage recipient names and addresses
and send the newsletters out to them. The cost to develop a database, or to add additional capabilities to an
existing database, may be greater than the cost of a subscription to a third-party service.

Depending on the size and scope of the project, a database may not be required at all. For a very small mailing
list, a company might get along just fine maintaining the list in a spreadsheet and using the Mail Merge feature
in an email app such as Microsoft Outlook to mail out a monthly newsletter. Such a spreadsheet would be
considered a flat file, and that's the subject of the next lesson.

Let's recap.

Reasons for a business to build a new database include:

They cannot change their existing database to suit their new needs.
The business needs to construct a new application that cannot connect to an existing database.
The current database is at capacity and cannot take on additional roles without upgrading hardware, which
can be expensive.

Reasons to avoid building a new database include:

The existing database can be adapted by adding a table or field to satisfy the need.
A simpler or cheaper tool can satisfy the business need.
A third-party business can provide a service to fill the need.
The business needs to access the existing data in a new way rather than to store new data.

 SUMMARY

In this lesson, you learned that people interact with business databases many times in their daily
routines. There are many different scenarios in which a business may or may not want to use a
database, and you saw some of the considerations that go into that decision. You also got to see how
many different kinds of databases you and others interact with every day as you go about your normal
lives. These insights will help you determine whether or not to build a database. Next time, you will learn
about the differences between a database and another means of storing data, the flat file.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

Databases vs. Flat Files
by Sophia

 WHAT'S COVERED

In this lesson, you will explore the differences between flat files and databases regarding flexibility, data
consistency, security, and data redundancy, in three parts. Specifically, this lesson will cover:
1. Databases
2. Flat File History
3. Weaknesses of Flat Files

1. Databases
Databases have evolved from flat files to relational databases over time to meet the needs of more complex
applications.

Flat files are simple databases that store data in a single table. A more recent example of a flat file is a
spreadsheet. Flat files that stored plain-text data in a table format were originally developed by IBM in the
1970s. There were no indexes, keys, or relations between rows; each row was its own self-contained record.
For example, in a flat file that stored customer orders, each order would be in a separate row. The fields
(such as Name, Address, City, and so on) would be separated by a delimiter character such as a comma or a
semicolon: a sequence of one or more characters that specifies the boundary between separate, independent
regions in plain text, mathematical expressions, or other data streams. As the name implies, a delimiter
character defines the boundary of a field.
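The delimiter idea can be sketched with Python's csv module; the file contents here are invented sample data.

```python
import csv
import io

# A plausible comma-delimited flat file: one order per row, fields
# separated by the delimiter character.
flat_file = io.StringIO(
    "Name,Address,City\n"
    "Ana Lopez,12 Oak St,Springfield\n"
    "Ben Kim,4 Elm Ave,Riverton\n"
)

# The reader splits each row wherever it finds the delimiter.
reader = csv.DictReader(flat_file, delimiter=",")
for row in reader:
    print(row["Name"], "-", row["City"])
# Ana Lopez - Springfield
# Ben Kim - Riverton
```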

A flat file can be efficient for small datasets, but it becomes difficult to manage and query as the data grows.

 DID YOU KNOW

Flat files also do not support relationships between different pieces of data, which can limit their usefulness
for complex applications.
A flat file database can potentially contain a lot of duplicated information, and back in the 1970s, data storage
space was limited. All that duplicated information took up valuable processing time and storage space, so a
new approach was needed that would be more robust for searching and would also economize on data
storage space.

Spreadsheet applications such as VisiCalc (1979) and Lotus 1-2-3 (1983) were developed to manage data more
easily in structured rows and columns, and such applications, including Microsoft Excel, remain very popular today.

Although spreadsheets make the data much easier to view and query, they do not solve the fundamental
problems of redundancy and connectivity. A better way to manage complex, related datasets was needed.

 BIG IDEA

Relational databases are a more advanced type of database that stores data in multiple tables. Each table
has a unique set of fields (columns), and the tables are linked together by common values in certain
columns. This allows for more complex queries and relationships between data, making relational
databases more suitable for large and complex applications.
The evolution from flat files to relational databases was driven by the need for more efficient and scalable data
storage and retrieval. A British computer scientist named Edgar Codd wrote a paper in 1970 that introduced the
fundamental concepts of what we now know today as the relational database model, and shortly after that, IBM
developed the SQL language as a means of interacting with such a database. Oracle was one of the most
influential early commercial relational database products. Introduced in 1979, it helped popularize the relational
database model. The relational database solved processing and storage issues, allowing applications to
become more sophisticated. It addressed many limitations of flat files that had become increasingly apparent
as data use and needs grew more complex, and it has since become the standard for database management
in most organizations.

Flat files are still popular today, mostly via spreadsheet apps such as Microsoft Excel and Google Sheets. Often
you will see flat and relational databases used in conjunction with each other, as each has its own strengths and
weaknesses depending on the business problem that needs to be solved. Since 1970, both types of databases
have grown, become common, and are widely used today, but there are core benefits that relational databases
have over flat file databases.

 KEY CONCEPT

Relational databases are:


More efficient at storing and retrieving data than flat files, because they use a more compact data
representation and can take advantage of indexes to speed up queries.
More scalable than flat files, because they can easily be partitioned to store large amounts of data on
multiple servers.
More flexible than flat files, because they can support complex relationships between data, which makes
them suitable for a wider range of applications.
More economical with storage than flat file databases; relations between tables allow data to be stored
once but used in any number of complex relationships with other data.
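The claim about indexes can be sketched in SQLite; the table and index names are invented, and EXPLAIN QUERY PLAN output varies slightly between SQLite versions, so only the gist is shown.

```python
import sqlite3

# Hypothetical orders table with an index on the lookup column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_email TEXT)")
conn.execute("CREATE INDEX idx_orders_email ON orders (customer_email)")

# Ask the engine how it would run the query: it reports a SEARCH using
# the index instead of a full scan of the table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_email = ?",
    ("ana@example.com",),
).fetchall()
print(plan[0][3])
```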

Relational databases have become the standard for database management in most organizations. They are
used by a wide variety of applications, including enterprise resource planning (ERP) systems, customer
relationship management (CRM) systems, and data warehouses.

Flat file databases are still used and have certain advantages over relational databases in some situations.
Similarly, NoSQL databases can store data in a simple, unstructured, or semi-structured format, often in the
form of key-value pairs or documents. MongoDB is a popular example of a NoSQL database that stores data
as flexible, document-oriented records.

Over time, both relational and flat file databases have shown that they can be used in conjunction with each
other; it is really just a matter of what is the most efficient way of storing and retrieving data. A NoSQL database
is a good choice where rapid read and write operations are important or when storing unstructured or semi-
structured data. However, relational databases remain the most popular type of database for general purpose
use.

 TERMS TO KNOW

Relational Database
A database that has multiple tables, with shared common columns between tables.

Flat File
A data file that is not related to, and does not contain any linkages to, any other file.

Spreadsheet
A row-and-column grid in which structured data tables can be created.

Delimiter
A sequence of one or more characters for specifying the boundary between separate, independent
regions in plain text, mathematical expressions, or other data streams.

MongoDB
A popular NoSQL database that stores data as flexible, document-oriented records.

2. Flat File History


For thousands of years, people kept track of data on clay tablets, papyrus, vellum, and finally paper.
Recording information by writing down an identity and then a checkmark is nearly as old as civilization itself.
Physical file systems are where the idea of databases began; as computer systems were being designed in
the 1950s and 1960s, people began trying to work out the best way to store and process data.

A mathematician named John Backus, working for IBM in 1959, proposed the creation of a specialized database
and programming language that would help people find data more easily. Before this, the "ledger," or book, was
the way accounting, taxes, census records, and other information were handled. All of it was collected and
maintained with paper and pencil.

From a computer's viewpoint, the ledger style of recordkeeping was very hard to process and, once digitized,
took up more hard drive space than was available. These paper systems were organized through a system of
file folders and filing cabinets. Typically, there wasn't a significant amount of data collection required, and
reporting was quite limited. This type of system worked well as a data repository. However, as companies
became larger and needed to report on their data, managing the data in these physical file systems became
too difficult and too costly.

Generating reports from physical file systems was difficult and error prone.

 REFLECT

Can you imagine being in an aircraft hangar-sized warehouse like the picture above?
Because people were used to the physical processing of data, files in early computerized file systems were
stored similarly to physical files. There were “boxes” of “cards,” with each one of those “boxes” being a flat file
and each “card” being a row in the flat file database, much like in spreadsheets today. If a business user
needed information, they turned to data retrieval specialists who handled all of the data processing requests for
the company. These early computerized flat file systems became just as much of a burden to manage due to
related and overlapping data, with no means of controlling or managing the data across each of the files.

The shortcomings of the physical recordkeeping and the difficulty in duplicating it on early computerized file
systems were the impetus for developing the first computerized database systems. Those first database
systems took time to develop. It was not until the 1970s that more advanced programs were created that could
apply the new relational mathematics. Even then, it was still difficult to create databases. The early databases,
both relational and flat, had trouble returning quick answers to questions and providing answers that humans
could make sense of. Those old systems also required very complex systems administration and lacked security.

3. Weaknesses of Flat Files

You are just learning how to use a database, so let's set up a link for you to meet your first database in this
class.

 TRY IT

Click on this link to launch your first database:

You’ll learn a lot of commands in this class, but for right now, you can just follow along and learn to move
around a database. You will learn a lot more about these commands as you progress through the class.

Let’s look at two tables, the invoice table and the customer table. To see all the information in each table, run a
query against one table at a time. Type the text shown below and then click the Run button (the right-pointing
triangle) in the upper right corner of the screen.

SELECT * FROM invoice;


This will show you all the records in the invoice table. Then repeat the process using the following command:

SELECT * FROM customer;


This will show you all the records in the customer table.

Notice what these two tables have in common: They both have a customer_id field. In this database, invoices
are stored in the invoice table and related to the customer through the customer_id. If you had to use a flat file
database, you could only have a single table in which to store both the customer and invoice data. All of the
information would be stored in a single record. The customer’s information, like the name, address, phone, and
email, would be stored in the same row as the invoice information. If the customer makes more than one
purchase, all of that customer information would have to be reentered for each item they order.
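A minimal sketch of the shared-column idea, using invented stand-ins for the sample database's customer and invoice tables: the customer's details are stored once, and a JOIN on customer_id reunites them with each invoice.

```python
import sqlite3

# Simplified stand-ins for the tutorial's customer and invoice tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, "
             "name TEXT)")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, "
             "customer_id INTEGER, total REAL)")
conn.execute("INSERT INTO customer VALUES (1, 'Carlos')")
conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)",
                 [(101, 1, 19.99), (102, 1, 5.50)])

# The customer's name is stored once but appears with every invoice.
rows = conn.execute("""
    SELECT c.name, i.invoice_id, i.total
    FROM invoice AS i
    JOIN customer AS c ON c.customer_id = i.customer_id
    ORDER BY i.invoice_id
""").fetchall()
print(rows)
# [('Carlos', 101, 19.99), ('Carlos', 102, 5.5)]
```

In a single flat file, the name 'Carlos' would instead be repeated on every order row.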

 REFLECT

Even today, that would waste storage space and processing power on a huge, ever-growing file. Could you
imagine Amazon or eBay working from a single table for all their customers and third-party sellers?
Eventually, the table would become too big to process, so having a shared column between two tables makes
processing and storage much easier and quicker.

Imagine if you had to enter all your address and user information every time you made an online purchase
from a company. If you ran a query to pull up a list of your orders, you would get many duplicate records back
if you ordered from that company often. That might make it very hard to find your current order and see
whether it has been processed. The whole ordering process would become more cumbersome each time you
placed an order. What other problems can you foresee as the invoice table keeps getting bigger and bigger,
or the store becomes very popular?
Keeping customer data in flat files creates a lot of data redundancy, as a company would have the same data
stored multiple times, not only in the same flat file but perhaps in other flat files for other purposes.

Data redundancy also leads to poor data security. Having multiple copies of the same data increases the
chance of unauthorized access. With flat files, the operating system, rather than the database, has to handle
security, which could open the data or the operating system to misuse if someone is granted the wrong
permissions.

Potential data inconsistency would also exist, like the address scenario above. You could have different and
conflicting versions of data appearing in various locations in your records. Data inconsistency can further
increase if you have data entry issues that go undetected.

Finally, since you would have repeated information, the data file would become much larger and more difficult
to manage and maintain. You would also lack the flexibility to manage the data in a flat file, as you could only
enter data that fits the data structure. In the example, you would only have one customer and one invoice
inserted together in the same flat file table. If you later wanted to record other information about customer
transactions, you might be unable to change the available columns to include those options.
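The shared-column idea can be sketched with Python's built-in sqlite3 module. The table names, column names, and customer data below are invented for illustration; this is a minimal sketch, not a production schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Flat-file style: the customer's details are re-entered on every invoice row.
cur.execute("""CREATE TABLE flat_invoices (
    invoice_id INTEGER, customer_name TEXT, address TEXT, item TEXT)""")
cur.executemany("INSERT INTO flat_invoices VALUES (?, ?, ?, ?)", [
    (1, "Ana Lopez", "12 Oak St", "Lamp"),
    (2, "Ana Lopez", "12 Oak St", "Desk"),   # same details repeated
    (3, "Ana Lopez", "12 Oak St", "Chair"),  # ...and repeated again
])

# Relational style: the customer is stored once, and the invoices table
# shares the customer_id column with the customers table.
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
cur.execute("CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT)")
cur.execute("INSERT INTO customers VALUES (1, 'Ana Lopez', '12 Oak St')")
cur.executemany("INSERT INTO invoices VALUES (?, 1, ?)", [(1, "Lamp"), (2, "Desk"), (3, "Chair")])

# The shared column reunites the data on demand, with the address stored once.
rows = cur.execute("""SELECT i.invoice_id, c.name, c.address, i.item
                      FROM invoices i JOIN customers c USING (customer_id)""").fetchall()
print(rows)
```

Updating the customer's address now means changing one row in the customers table, instead of hunting down every copy in the flat file.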

 SUMMARY

In this lesson, you learned that the overall style and structure of relational databases have evolved
from physical file systems and computerized flat file systems. Compared to those earlier methods of
storing data, relational databases offer improved flexibility, data consistency, security, and reduced data
redundancy. Finally, you learned about key weaknesses of flat files. Next time, you will learn more
about various parts of a database system.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Delimiter
A sequence of one or more characters for specifying the boundary between separate, independent
regions in plain text, mathematical expressions, or other data streams.

Flat File
A data file that is not related to or does not contain any linkages to another file.

MongoDB
A popular example of a NoSQL flat file database that stores data in an object-oriented way.

Relational Database
A database that has many tables, with shared common columns between tables.

Spreadsheet

A row-and-column grid in which structured data tables can be created.

Parts of a Database System
by Sophia

 WHAT'S COVERED

In this lesson, you will explore modern database systems that have many pieces working together to
support business needs. This lesson explores the five main parts of a database system—hardware,
software, people, procedures, and data—in two parts. Specifically, this lesson will cover:
1. Main Parts of a Database
2. Components of Database Systems
2a. Hardware
2b. Software
2c. People
2d. Procedures
2e. Data

1. Main Parts of a Database


As you learned in the prior lesson, the issues with flat file systems make the use of a relational database much
more effective and efficient. Where flat file systems contain separate and unrelated files, a relational database
stores logically related data in separate tables within the same system. This structure helps organize how data
is stored, accessed, and managed, all in one place. A database system has five main parts: hardware, software,
people, procedures, and data.

2. Components of Database Systems


2a. Hardware
Hardware refers primarily to the CPU, the RAM, and the disk drives on the physical or virtual system on which
the database runs. In some designs, hardware can encompass all the physical devices that the database system
interacts with. This can include the servers that the database runs on as well as connected systems like
desktops, workstations, tablets, and mobile devices that connect to the database. Hardware can also include
the network components, data storage devices, and other devices that may be unique to a database system,
such as a digital key reader or automated teller machine.

 TERM TO KNOW

Hardware
The physical components such as the CPU, RAM, and storage devices on which a database runs.

2b. Software
The software in the database system can consist of the programs that organize the database system and
enable user interaction. The operating system software helps to manage the hardware components and allows
the other key software to run on the servers and computers. Some common operating systems include
Windows, Linux, macOS, UNIX, Android, and iOS. The database management system (DBMS) software
manages the database within the entire database system. Some common DBMS software includes PostgreSQL,
Oracle, MySQL, and MS SQL Server.

The other types of software in a database system are the applications or utilities used to access and manipulate
data. These are the programs that are used to interact with the DBMS to insert data, query data, and create
reports. Various tools and utilities are also used to help create the database structures and control access. One
such utility is the web interface that you use in this course to access the PostgreSQL database to practice SQL
commands.

 TERMS TO KNOW

Software
The operating system and the applications that manage the database and enable people to interact
with it.

Database Management System (DBMS)
Software that manages the database within the entire database system.

2c. People
The people include the various types of users in the database system. There are generally five types of users,
defined based on their job functions within the database system.

System administrators oversee the entire database/information system to ensure that everything is operating
optimally.

Database designers design the database structure and architect the database to ensure that the database
design not only fits the business needs but also functions optimally.

Database administrators, or DBAs, are the users that manage the database management system and ensure
that the database is running correctly. DBAs control access and optimize the queries.

System analysts and programmers design and implement the application programs that interact with the
database management systems. They create the web applications, applications, and reporting systems that end
users use to interact with the data.

Lastly, you have end users, who are the individuals that use the applications to run the day-to-day operations of
the organization. Different end users will interact with different levels of the information from the database.

 DID YOU KNOW

Sales, marketing, customer service, directors, or even the CEO of a company would be considered an end
user of a database system.

 TERMS TO KNOW

System Administrator
A person who oversees the entire database/information system to ensure that everything is operating
optimally.

Database Designer
The architect of the database, who ensures that the design fits the business needs and functions
optimally.

Database Administrator (DBA)
A user of the database management system who ensures that the database is running correctly.

System Analyst
A person who designs and implements the application programs that interact with the database
management systems.

End User
A person who uses the applications to run the day-to-day operations of the organization.

2d. Procedures
Procedures are the rules or instructions that define how the database is designed and how it is used. They are
a key aspect of the database design process, as they ensure that the specific way that a business operates is
correctly reflected in the database design and setup. These procedures can also include methods to monitor
and audit the data based on business processes or laws, and can specify what data to capture, how to back it
up, where it is stored, and how access and security are handled. Some procedures will be required by law,
regulation, or industry standards; others will be designed for one-off purposes unique to a particular event or
process.

 TERM TO KNOW

Procedures
Rules or policies about how the database is designed and used.

2e. Data
Lastly, you have data, comprising the raw facts that are used to build information. You learned about data and
information in the previous lesson.

The entire database system has to fit within an organization’s operational structure. Depending on the business
and the size and complexity of the organization, there may be varying amounts of each part of the database
system. Data also has its own format that the database designer can work with. An e-commerce order will have
a specific format returned from the order system, while a web log or firewall log will have a different structure.
The built-in formats for some data types can simplify the process of designing, optimizing, and querying
tables that include them.

 TERM TO KNOW

Data
Known as “raw facts,” or the data we collect, like phone numbers or addresses.

 SUMMARY

In this lesson, you learned that there are five main parts of a database system. The components of a
database include hardware, software, people, procedures, and data. Hardware covers the physical
objects used. Software involves the programs for interacting with the data, or raw facts used to build
information, by performing certain procedures that define how the database is used. You also learned
that a variety of people in different roles are also part of a database management system, including
system administrators, database administrators, database designers, programmers, and end users.
Finally, you learned that all of these people work together to create and maintain the database system
of a company or organization. Next time, you will examine the differences between relational databases
and non-relational databases.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Data
Known as “raw facts,” the data we collect, like phone numbers or addresses.

Database Administrator (DBA)
A user of the database management system who ensures that the database is running correctly.

Database Designer
The architect of the database, who ensures that the design fits the business needs and functions
optimally.

End User
A person who uses the applications to run the day-to-day operations of the organization.

Hardware
CPU, RAM, and disk drive speed all have an influence on how well a database runs on a computer
system.

People
Database administrators, database designers and users are the people who manage, design, and
use databases. Each type of person has specific needs and responsibilities when working with a
database.

Procedures
Rules on how a database is used or designed. Think of this as similar to how you would design a
business process, but here, how and what business data is stored, how it is used, who accesses it,
and how it can be accessed.

Software
The operating system and RDBMS (Relational Database Management System) that influence
performance and management/administration of a database.

System Administrator
A person who oversees the entire database/information system to ensure that everything is
operating optimally.

Non-Relational Databases
by Sophia

 WHAT'S COVERED

In this lesson, you will explore the similarities and differences between relational databases and non-
relational databases like NoSQL. Specifically, this lesson will cover:
1. Differentiating Between Relational and Non-Relational Databases
2. Non-Relational Databases
2a. When to Use a Non-Relational Database
2b. Main Types of Non-Relational Databases

1. Differentiating Between Relational and Non-Relational Databases
In previous lessons, you learned about the tradeoffs involved in designing a database. Depending on what
you are building, a relational database might not be the right answer for everything. Non-relational databases
are databases that do not use the traditional tabular relational model that is used in relational database
management systems. You learned about NoSQL, one type of non-relational database, in the previous lesson.

Many companies use a blend of relational databases and non-relational databases to solve business problems.
For example, Amazon uses NoSQL databases to deliver information on over a billion products, and when they
were building their latest site, they needed a database that could deliver 10,000 pages a second and be easily
scalable in capacity. A NoSQL database was their chosen solution for their product pages. For customer data,
customer orders, and other information, a relational database is what they chose for that side of their business
process flow.

 REFLECT

Are the terms "non-relational" and "NoSQL" synonymous?

NoSQL databases are a subset of non-relational databases that have gained prominence for their ability to
address specific challenges related to scalability, flexibility, and performance, especially in modern web
applications and big data scenarios. Non-relational databases include a broader range of database
systems, some of which may not be associated with the characteristics and design principles of NoSQL
databases.

Among the key differences between relational and non-relational databases are:

Tables provide the structure of relational databases, where data is organized into columns and rows. Each
table has a fixed schema; relationships between tables are established with primary and foreign keys.
Non-relational databases instead use a variety of data models, such as key-value, document, columnar, or
graph, which gives them the flexibility to handle unstructured or semi-structured data.
In relational databases, scaling is typically achieved by upgrading hardware (e.g., adding more memory or
processing power) to handle increased loads. In contrast, non-relational databases are designed to scale
horizontally. Data can be distributed horizontally across multiple servers or clusters, allowing for seamless
handling of large-scale data and high traffic loads.
Relational databases have a rigid schema requiring predefined tables and fields. A schema change can be
challenging and involve downtime or complex migration procedures. A non-relational database, on the
other hand, offers schema flexibility. Schema changes can be made on the fly without disrupting the system
significantly.
Relational databases use SQL to retrieve and manipulate data. With SQL, you can write powerful queries
that perform complex joins, aggregations, and transactions. The query languages used by non-relational
databases are often specific to their respective data models. While less expressive than SQL, these query
languages are designed to efficiently handle a database's particular data structure and characteristics.
ACID refers to four database characteristics that ensure database integrity by controlling data changes:
Atomicity, Consistency, Isolation, and Durability. You'll learn about them in Unit 5 of this course. Relational
databases apply ACID principles strictly, which has a side effect of creating extra work that can affect
performance and scalability. Some non-relational databases sacrifice strict ACID properties to achieve
performance and scalability gains.
A relational database is typically used for applications that require structured data with well-defined
relationships, complex querying, and transactional integrity. Suitable applications include e-commerce
platforms, financial systems, and inventory management. Unlike SQL databases, non-relational databases
can handle large volumes of rapidly changing, unstructured, or semi-structured data. These systems are
often used in real-time analytics, content management systems, social media platforms, and IoT data
storage platforms.

Implementations of relational and non-relational databases can differ based on their purposes. There are many
factors to consider when deciding between the two, notably the application requirements, the scalability needs,
the data characteristics, and the expected workload.

This course focuses on relational databases, where data is organized into tables in a structured manner. There
are relationships and dependencies between the tables. You use SQL to query the data in a specific format
through the tables, columns, and rows. Each table contains data pertaining to a certain subject, such as
Inventory or Customers. The columns represent the fields, which are individual pieces of data to be stored, such
as Name or City. The rows represent the records, which are the instances being described, such as a specific
inventory item or customer. This type of relational database structure makes it easy to sort, filter, and compute
various calculations with expressive query languages. But you might run into companies using both databases,
so it is good to be aware of non-relational databases.
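As a small, hypothetical illustration of tables, fields, and records (the Customers table and its contents below are invented), sorting and filtering with SQL might look like this, using Python's built-in sqlite3 module:

```python
import sqlite3

# Each column (name, city) is a field; each inserted row is a record.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (name TEXT, city TEXT)")
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [("Ava", "Denver"), ("Ben", "Austin"), ("Cal", "Denver")])

# Filtering (WHERE) and sorting (ORDER BY) are built into the query language.
denver = cur.execute(
    "SELECT name FROM customers WHERE city = 'Denver' ORDER BY name").fetchall()
print(denver)  # [('Ava',), ('Cal',)]
```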

One of the most common examples of a non-relational database is a NoSQL database, which uses an approach
to database design that enables the storage and querying of data outside the traditional structures found in
relational databases. NoSQL databases have a flexible data model, which makes it easy to store and combine
data of any structure and allows you to modify the schema without any downtime dynamically. In contrast,
making changes to the data model in a relational database can take time and planning to implement. Since the
NoSQL databases are built on performance and scalability, they are set up to quickly scale out, offering
unlimited growth compared to relational databases. They are also designed for highly available systems to
provide a consistent, high-quality experience for all users worldwide.

Relational databases, on the other hand, use expressive query languages to access and modify data. They also
maintain indexes within the database to provide efficient access to data, rather than relying on indexes
maintained in application code, as is common with non-relational databases. The relational model also offers
much stronger consistency through primary and foreign keys, so applications can read what has been written
to the database immediately.

 TERMS TO KNOW

Schema
The structure of a database, including its tables and the relationships between them.

Non-Relational Databases
Databases that do not use the traditional tabular relational model that is used in relational database
management systems.

2. Non-Relational Databases
Non-relational databases are databases that do not use the traditional tabular relational model that is used in
relational database management systems. In this section, you will learn about the main types of non-relational
databases and when to use them.

2a. When to Use a Non-Relational Database


If you're working with data that isn’t clearly defined and organized, and you can't establish defined tables and
relationships in the dataset, you might want to consider a non-relational database. The information gathered in
a non-relational database doesn't need to be segmented into tables and defined relationships.

EXAMPLE A messaging application like Facebook Messenger is an excellent example of a use case for a
non-relational database. With so much unstructured information included in each message, storing the
messages in a non-relational database is more efficient.

2b. Main Types of Non-Relational Databases


There are four main types of non-relational databases: document-oriented, key-value stores, wide-column
stores, and graph stores.

Document-oriented databases focus on pairing a key with a complex data structure. This is similar to storing
information in a Word document and adding more information into that document; to find something, you
must search through the document for the information you are looking for.

Key-value stores are databases that use different keys, where each key is only associated with one value. An
example is a dictionary where the word is the key, and the value is the word's definition. This is one of the
simplest types of non-relational databases. The Registry in Microsoft Windows stores system settings in this
type of database.

Wide-column stores use tables, rows, and columns similar to a relational database, but the names and format of
the columns can vary from row to row in the same table.

Graph stores use graph structures to link data through edges, nodes, and properties. It can be one of the more
complex types of non-relational databases.
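A toy sketch of the first two types, using plain Python structures as stand-ins rather than any real NoSQL product; the keys and documents are invented for illustration:

```python
import json

# Key-value store: each key maps to exactly one value.
kv = {}
kv["session:42"] = "alice"            # put
print(kv.get("session:42"))           # get; prints "alice"

# Document store: a key paired with a complex, nested structure,
# often serialized as JSON. The document shape is entirely up to the app.
doc = {"_id": "cust1",
       "name": "Ana",
       "orders": [{"item": "Lamp", "qty": 1}, {"item": "Desk", "qty": 2}]}
docs = {doc["_id"]: json.dumps(doc)}
loaded = json.loads(docs["cust1"])
print(loaded["orders"][1]["item"])    # prints "Desk"
```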

 TERMS TO KNOW

Document-Oriented Database
A non-relational database that pairs a key with a complex data structure.

Key-Value Store
A simple non-relational database that pairs each key with a single value.

Wide-Column Store
A non-relational database that allows the names and format of columns of its tables to vary from row to
row.

Graph Store
A non-relational database that links data through edges, nodes, and properties.

 SUMMARY

In this lesson, you learned that there are different uses for non-relational and relational databases
depending on the organizational need. Organizations need to decide which database design is needed
for their specific purpose. Relational databases use structured data tables and a formal query language
to offer consistency and efficiency, while non-relational databases like NoSQL have unstructured data
storage that provides better flexibility and scalability than relational databases can usually achieve. You
learned when to use a non-relational database and explored the four main types of non-relational
databases: document-oriented, key-value, wide-column store, and graph store.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Document-Oriented Database
A non-relational database that pairs a key with a complex data structure.

Graph Store
A non-relational database that links data through edges, nodes, and properties.

Key-Value Store
A simple non-relational database that pairs each key with a single value.

NoSQL Database
A database that can work with semi-structured or unstructured data by enabling the storing and
querying of data outside the traditional structures found in relational databases.

Non-Relational Databases
Databases that do not use the traditional tabular relational model that is used in relational database
management systems.

Schema
The structure of a database, including its tables and the relationships between them.

Wide-Column Store
A non-relational database that allows the names and format of columns of its tables to vary from
row to row.

Transactional and Analytical Databases
by Sophia

 WHAT'S COVERED

In this lesson, you will explore the similarities and differences between analytical databases and
transactional databases. Specifically, this lesson will cover:
1. Introduction
2. Transactional Databases
3. Analytical Databases

1. Introduction
An organization's data management and decision making depend heavily on understanding the difference
between transactional databases and analytical databases. Everyday operations require transactional
databases, such as handling financial transactions, managing inventory, and dealing with customers. These
applications provide real-time transaction processing, ensure data integrity, and enforce business rules.
Transactional databases ensure operational data accuracy, consistency, and reliability, with streamlined
operations and satisfied customers as a result.

Large volumes of data can be transformed into meaningful insights using analytical databases. They enable
businesses to identify trends, patterns, and correlations in their data through complex data analysis, reporting,
and advanced analytics tasks. In order to make better decisions, organizations rely on analytical databases that
provide insight into their business performance, customer behavior, and market trends. Organizations can
harness data-driven insights through analytical databases to optimize operations, improve decision making, and
gain a competitive advantage.

 TERMS TO KNOW

Transactional Database
A database system designed for managing the day-to-day operations of an organization.

Analytical Databases
A specialized type of database designed for storing, retrieving, and analyzing large volumes of data.

2. Transactional Databases

Transactional databases, also called online transaction processing (OLTP) databases or operational databases,
are used for handling real-time transactions daily. The system is designed to efficiently process many individual
transactions, such as creating, retrieving, modifying, and deleting data.

Transactional databases are traditionally relational ones that use tables, rows, and columns as their basic
building blocks. Schemas define the structure of the tables, the data types, and the relationships between them.
A structured approach facilitates the organization of data, the indexing of data, and the processing of queries.

Because of the many frequent changes made in a transactional database, software and processes must be in
place to ensure that each change is successfully made, even when the unexpected happens. For example,
what if the power goes out as a change is being written to the database?

Transactional databases maintain data integrity and reliability by adhering to the properties of ACID (Atomicity,
Consistency, Isolation, Durability). Using ACID properties ensures that transactions are treated as indivisible
units, guaranteeing the successful completion of all transaction operations. These measures ensure data
consistency and prevent data loss or corruption.

Transactional databases enable multiple users to simultaneously access the data, and that can potentially
create situations where multiple sources or users are updating the same record (differently) at the same time. To
avoid such problems, transactional databases implement data integrity mechanisms such as:

Locks: Mechanisms used to prevent multiple transactions or processes from simultaneously accessing or
modifying the same data.
Concurrency control: Processes for enabling and releasing locks and resolving conflicts when multiple
transactions contend for the same data.
Isolation levels: Rules about the degree to which concurrently running transactions are isolated from each
other.

Because transactional databases tend to be used in real-time activities such as sales and banking, they are
optimized for fast response times and low latency. These systems are designed to handle high transaction
volumes and prioritize quick data retrieval and updating.

 KEY CONCEPT

An important feature of transactional databases is transaction management. Transactions are initiated,
committed (that is, written to the database), rolled back, and recovered using well-defined procedures
designed to ensure data integrity if a failure, error, or system crash occurs.
A transactional database is well suited for OLTP workloads, which typically involve frequent and short-duration
transactions. A database like this is ideal for storing, updating, and frequently querying data.
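A minimal sketch of commit-and-rollback behavior, using Python's sqlite3 module as a stand-in for a full OLTP system; the account names, balances, and simulated failure are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('ana', 100), ('ben', 50)")
conn.commit()

# Transfer funds atomically: either both updates are kept, or neither is.
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 70 WHERE name = 'ana'")
        raise RuntimeError("simulated crash mid-transaction")
        conn.execute("UPDATE accounts SET balance = balance + 70 WHERE name = 'ben'")  # never reached
except RuntimeError:
    pass  # the transaction was rolled back before the error propagated

# The first update was undone, so the balances are unchanged.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'ana': 100, 'ben': 50}
```

This is the atomicity half of ACID in miniature: the partial transfer never becomes visible.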

Many applications and industries use transactional databases, including e-commerce, banking, inventory
management, airline reservation systems, and healthcare. Data consistency, integrity, and responsiveness are
based on this kind of database, ensuring efficient and reliable transaction processing.

 TERMS TO KNOW

Online Transaction Processing (OLTP) Databases
Operational databases, which are used for handling real-time transactions daily.

ACID
An acronym that stands for atomicity, consistency, isolation and durability of data.

3. Analytical Databases
The purpose of analytical databases is to store and analyze large amounts of data. They are optimized to
respond to queries quickly and perform advanced analytics. Analytical databases have several key
characteristics that help define the database as analytical rather than transactional.

Analytical databases typically store data in columns rather than in rows; this is called columnar data storage.
Each column of a table is stored separately, with all the values for that column grouped together. This
arrangement enables queries that scan large data pools for specific columns to run faster and more efficiently.
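A rough sketch of the difference between the two layouts, using plain Python lists with invented sales data; real columnar engines add compression and vectorized execution on top of this idea:

```python
# Row-oriented layout: each record's fields are stored together.
rows = [
    ("2024-01-01", "Lamp", 20),
    ("2024-01-02", "Desk", 150),
    ("2024-01-03", "Chair", 75),
]

# Column-oriented layout: all values of each column are stored together.
columns = {
    "date":  ["2024-01-01", "2024-01-02", "2024-01-03"],
    "item":  ["Lamp", "Desk", "Chair"],
    "price": [20, 150, 75],
}

# An analytic query touching one column scans a single contiguous list...
total = sum(columns["price"])
# ...while the row layout must visit every record to reach that one field.
total_rows = sum(r[2] for r in rows)
print(total, total_rows)  # 245 245
```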

Query performance is further improved by in-memory processing, which stores and manipulates data entirely in
RAM rather than on traditional disk storage. This enables much faster query execution.

Performance can be further enhanced by parallel processing in analytical databases. Parallel processing refers
to the simultaneous execution of multiple operations across multiple CPU cores in a distributed computing
environment to improve performance. This is especially useful for large-scale analytics, where a single server
cannot handle the data.

Analytical databases are designed to handle complex queries efficiently, including those that involve joining
tables, summarizing and grouping data, and embedding queries within other queries. These capabilities help
users extract useful information from the data without long delays.

The ability to scale an analytical database is essential if it must be able to handle large amounts of data and
complex queries. Distributed architectures are typically used for this purpose. A distributed architecture is a
design in which data is stored across multiple interconnected servers, to avoid bogging down any individual
server.

Analytical databases can compress data, saving space and improving performance over time. Some of the
techniques used to achieve these benefits include:

Data deduplication: eliminating duplicate values.
Data pruning: eliminating outdated or irrelevant data.
Data compression: running data through a compression algorithm that reduces the space it occupies.
Indexing: creating a list of values and references to the locations where they can be found.
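Two of these techniques can be sketched in a few lines of Python; the values are invented, and real analytical databases implement far more sophisticated versions:

```python
# Dictionary-style deduplication: store each distinct value once and
# replace the column's values with small references into that dictionary.
values = ["red", "blue", "red", "red", "blue"]
dictionary = sorted(set(values))                 # ['blue', 'red']
encoded = [dictionary.index(v) for v in values]  # [1, 0, 1, 1, 0]

# Indexing: map each value to the row positions where it can be found,
# so a lookup no longer scans the whole column.
index = {}
for pos, v in enumerate(values):
    index.setdefault(v, []).append(pos)
print(encoded, index["red"])  # [1, 0, 1, 1, 0] [0, 2, 3]
```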

 KEY CONCEPT

Analytical databases are invaluable to businesses whose data must be stored and analyzed in large
quantities. Companies can benefit from them by making better decisions, improving efficiency, and
identifying new opportunities. This kind of database supports business analytics, decision support systems,
and group decision support systems.
Analytical databases generally consist of two main components: the data warehouse and the online analytical
processing (OLAP) front end. The data warehouse is a specialized database that stores the underlying data in a
format optimized for decision support rather than transactions. Data is typically pulled from transactional
databases or other sources mostly with SELECT statements in SQL. It’s rare to see UPDATE, INSERT, or
DELETE statements, because the data was already verified and validated when it entered the transactional
database. Online analytical processing (OLAP) refers to a set of tools used to process data from the data
warehouse so that it can be used for analytics. This includes formatting, cleaning, and removing data that does
not fit specific patterns for analysis.
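The flow from transactional database to warehouse can be sketched in a few lines. This illustration uses Python's built-in sqlite3 module with invented table names as a stand-in for a real OLTP system and data warehouse: data moves into the summary table with a single INSERT ... SELECT, and analysis then runs against the summary rather than the live transactional table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A transactional (OLTP) table: individual orders recorded as they happen.
cur.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "east", 100.0), (2, "east", 50.0), (3, "west", 75.0)])

# Warehouse load: data flows in via a SELECT, not per-row edits.
cur.execute("CREATE TABLE sales_summary (region TEXT, total REAL)")
cur.execute("""
    INSERT INTO sales_summary
    SELECT region, SUM(amount) FROM orders GROUP BY region
""")

# An analytical query against the summary table.
rows = cur.execute("SELECT region, total FROM sales_summary ORDER BY region").fetchall()
print(rows)  # [('east', 150.0), ('west', 75.0)]
conn.close()
```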

Analytical databases are used to help with business intelligence by generating information that is used for
business decision making. The volume of data in these analytical databases can be massive due to the
historical data and the data sets that may be added from third parties. Since the data does not often change,
space is not as much of a concern.

 TERMS TO KNOW

Columnar Data Storage


Storing data in columns rather than in rows.

Parallel Processing
The simultaneous execution of multiple operations across multiple CPU cores in a distributed
computing environment to improve performance.

Online Analytical Processing (OLAP)


Software technology that is used to analyze business data from different points of view.

Business Analytics
The practice of using data analysis and statistical techniques to drive informed business decision
making.

Data Warehouse
A backend enterprise-level system used for storing data that is used for analysis and reporting.

Front End
A user interface or application that enables users to interact with a database without directly interacting
with the underlying DBMS.

Back End
The part of the database system responsible for managing and storing data, including the DBMS and
the physical storage devices. The back end is not directly accessible to end users.

 SUMMARY

In this lesson, you learned about the different purposes and characteristics that distinguish
transactional databases from analytical databases. An OLTP (online transaction processing) database
handles and manages daily operational transactions in real time. Data integrity, concurrency, and high-
speed transactions are prioritized, ensuring efficient data entry, retrieval, and modification. It's important
to note that transactional databases are optimized for handling multiple concurrent transactions with
ACID properties (Atomicity, Consistency, Isolation, Durability), ensuring reliable and consistent data.

You also learned that a database that enables complex analytical queries and data analysis is known as
an analytical database, also called an OLAP (online analytical processing) database. These systems are
designed to handle large volumes of data and provide fast query response times. Businesses can gain
insights and make informed decisions through data exploration, reporting, and complex calculations.
An analytical database often involves advanced analytics techniques, facilitating in-depth analysis and
understanding of data trends.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

ACID
An acronym that stands for atomicity, consistency, isolation and durability of data.

Analytical Databases
A specialized type of database designed for storing, retrieving, and analyzing large volumes of
data.

Back End
The part of the database system responsible for managing and storing data, including the DBMS
and the physical storage devices. The back end is not directly accessible to end users.

Business Analytics
The practice of using data analysis and statistical techniques to drive informed business decision
making.

Columnar Data Storage


Storing data in columns rather than in rows.

Data Warehouse
A backend enterprise-level system used for storing data that is used for analysis and reporting.

Front End
A user interface or application that enables users to interact with a database without directly
interacting with the underlying DBMS.

Online Analytical Processing (OLAP)


Software technology that is used to analyze business data from different points of view.

Online Transaction Processing (OLTP) Databases


Operational databases, which are used for handling real-time transactions daily.

Parallel Processing
The simultaneous execution of multiple operations across multiple CPU cores in a distributed
computing environment to improve performance.

Transactional Database
A database system designed for managing the day-to-day operations of an organization.

Data Model Innovations
by Sophia

 WHAT'S COVERED

In this lesson, you will explore the major innovations and data model evolutions from the 1960s to the
2020s. Specifically, this lesson will cover:
1. Introduction
1a. Databases of the 1960s
2. Hierarchical Data Models of the 1970s
3. Relational Models of the 1970s
4. Object-Oriented in the 1980s
5. XML Hybrids in the 1990s
6. NoSQL in the 2000s and Beyond

1. Introduction
Databases have evolved significantly over the past 60 years in technology, scale, and capabilities. When
databases were first developed in the 1960s, they were based on hierarchical and network models, where data
was arranged as a tree or an interconnected network. Their main use was storing and retrieving structured data,
but they lacked scalability and flexibility. The relational database model was introduced in the 1970s,
revolutionizing the field. During the development of relational databases, SQL (Structured Query Language)
became the standard language for querying and manipulating data. Over the course of several decades,
relational databases grew rapidly to become the dominant database model.

The database landscape has undergone further transformations in recent years. Cloud computing, big data, and
distributed systems have led to new database technologies. As unstructured and semi-structured data volumes
increase, NoSQL (non-relational) databases, such as key-value stores, document stores, columnar databases,
and graph databases, have become increasingly popular. As an evolution of traditional relational databases,
NewSQL databases offer superior scalability and performance while maintaining ACID compliance. As
specialized databases, such as time-series, graph, and in-memory databases, have proliferated, organizations
can leverage highly optimized data management and analysis solutions. As cloud-based databases and
database-as-a-service (DBaaS) offerings have evolved, deployment, management, and scaling have become
simpler.

As we saw with the shift from a manual file system to a computerized file system, there is always a focus on
finding better ways to manage data. There have been many changes in computerized file systems, with each
model trying to fix some of the shortcomings of the previous model. You will find that many of the newer
database concepts have a significant resemblance to some of the older data models and concepts.

1a. Databases of the 1960s


In the 1960s, databases were in their infancy, and data management was still emerging. Most databases at that
time followed a hierarchical model or network model. Hierarchical databases organize data in a tree-like
structure, with parent-child relationships between records. This model was commonly used in early mainframe
systems and allowed efficient data retrieval. Network databases, on the other hand, employed a more complex
interconnected structure, where data was linked through pointers. This model provided more flexibility in
representing relationships but took more work to manage. Databases in the 1960s were typically implemented
on large-scale mainframe computers and used for managing structured data, such as financial records and
inventory management.

Storage capacity and processing power were limited compared to today's standards, influencing database
design and functionality in the 1960s. During this era, data was stored primarily on magnetic tapes and disks.
Accessing and manipulating data required specific programming languages and interfaces. Common Business-
Oriented Language (COBOL) was a popular programming language for data processing, and systems were
designed to handle batch processing rather than real-time transactions.

 TERMS TO KNOW

Hierarchical Models
Databases that store data as records and organize them into a tree-like structure where one parent
node has many child nodes connected to it through links.

Network Model
The database in the network model represents objects and their relationships in a flexible manner. The
schema differs from other hierarchical or lattice models because it is viewed as a graph with nodes and
arcs representing objects and relationships.

Common Business-Oriented Language (COBOL)


A popular programming language for data processing; systems were designed to handle batch
processing rather than real-time transactions.

2. Hierarchical Data Models of the 1970s


In the 1970s, hierarchical databases continued to be a popular method of managing and organizing data.
Hierarchical databases arrange data in a tree-like structure in which parent and child relationships are formed.
Hierarchical databases have a primary key for each record, and hierarchical
links represent relationships between records. A parent record can have more than one child, but a child record
can have only one parent. Data with clear hierarchical relationships, such as files or organizational structures,
can be represented using this model.

A hierarchical database provides efficient retrieval and navigation of data since child records can be accessed
only through the parent record. However, it is less suitable for representing more diverse and dynamic data
structures because the hierarchical database model is not flexible when dealing with complex or changing
relationships. In spite of this, the hierarchical model played a significant role in the early stages of database
development, laying the foundation for subsequent database models.

3. Relational Models of the 1970s
British computer scientist Edgar F. Codd is widely regarded as the father of relational databases. Codd's 1970
paper, "A Relational Model of Data for Large Shared Data Banks," established the relational model. Codd
introduced an innovative way to manage and manipulate data by challenging hierarchical and network
database models.

In a relational database, data is stored in tables consisting of rows and columns, with each table representing a
relation in the relational model. Some of the key principles of relational databases include simplicity, data
independence, and a mathematical foundation. A relational database is manipulated using formal
mathematical tools called relational algebra and relational calculus.

Relational models have significant advantages, such as data independence, flexibility in retrieval, and the ability
to establish relationships between tables using primary and foreign keys. In addition to facilitating efficient
querying and analysis, they provide a logical and conceptual framework for managing data.

Codd's work led to the creation of Structured Query Language (SQL), an interface for working with relational
databases. SQL became the de facto language for querying and manipulating relational databases. Having a
common language made it easier to retrieve and manipulate data consistently and intuitively across different
relational database systems.

Data management was transformed by the relational model, which is the foundation of most modern databases.
It paved the way for commercial relational database management systems (RDBMS) such as Oracle,
IBM DB2, and Microsoft SQL Server. The use of these systems has become widespread in a variety of
industries.

4. Object-Oriented in the 1980s


In the mid-1980s, there was a growing interest in combining object-oriented programming principles with the
relational model of databases, leading to the emergence of object-relational and object-oriented databases.

 KEY CONCEPT

Object-oriented programming (OOP) is a way of organizing computer programs around objects, which are
self-contained units that represent real-world entities or concepts. These objects can have data (attributes)
and behaviors (methods).
Object-relational databases (ORDBMS) bridge the gap between traditional relational databases and the object-
oriented paradigm. They extend the relational model by supporting complex data types such as arrays, nested
tables, and user-defined types. This allows for more flexibility in storing and querying structured and semi-
structured data. ORDBMS also incorporate object-oriented concepts like inheritance and encapsulation,
enabling the modeling of real-world entities as objects with associated attributes and behaviors.

In parallel, object-oriented databases (OODBMS) are database management systems (DBMS) that
support the modeling and creation of data as objects. OODBMS focus on directly integrating object-oriented
programming principles into database management systems. OODBMS store data as objects and offer
encapsulation, inheritance, and polymorphism mechanisms. They provide a seamless and transparent mapping
between the object-oriented programming language and the underlying database, offering benefits such as
improved performance, data integrity, and support for complex relationships.

Both object-relational and object-oriented databases aim to provide a more natural and flexible way of
representing and managing data, particularly for applications dealing with complex and evolving data
structures. Over time, these database models gained attention and saw some adoption, but they didn't
undermine the dominance of relational databases. Relational databases continue to be widely used due to their
maturity, standardization, and the extensive ecosystem built around them.

EXAMPLE Versant and Objectivity/DB are well-known object-oriented databases, while Oracle Database
12c is an object-relational database.

 TERMS TO KNOW

Object-Relational Database (ORDBMS)


A type of database similar to a relational database but with an object-oriented database model; objects,
classes and inheritance are directly supported in database schemas and in the query language.

Object-Oriented Programming (OOP)


A way of organizing computer programs around objects, which are self-contained units that represent
real-world entities or concepts.

Object-Oriented Databases (OODBMS)


A database model that directly integrates object-oriented programming principles into database
management, storing data as objects.

5. XML Hybrids in the 1990s


In the mid-1990s, XML hybrid databases emerged as a way to integrate XML (Extensible Markup Language)
data into traditional database systems. XML is a widely used markup language for structuring and exchanging
data, similar to HTML in web design. However, it doesn't naturally fit into the relational data model. XML hybrid
databases combine relational databases' power with XML's flexibility. They store XML data in a structured
manner, allowing querying and indexing of XML elements and attributes. These databases typically provide a
mapping between XML and relational structures. This enables efficient storage and retrieval of XML data while
leveraging the benefits of relational database management systems.

XML hybrid databases employ technologies like XML Schema Definition (XSD) to define XML documents'
structure and validity constraints. They may support XML-specific query languages such as XQuery or XPath for
querying XML data within the database. Additionally, they often provide mechanisms to handle XML updates,
versioning, and integration with other data formats.

XML hybrid databases have found applications in various domains, such as content management systems, web
services, and data integration scenarios where XML plays a significant role. They bridge the relational and XML
worlds. This allows organizations to effectively manage and leverage XML data within their existing relational
database infrastructure, facilitating interoperability and data integration across different systems and
technologies.

 TERM TO KNOW

XML Hybrid Database


A database type that bridges the structured formatting of XML and the relational model of modern
databases.

6. NoSQL in the 2000s and Beyond
The current generation of emerging data models from the early 2000s to the current time focuses on NoSQL.
“NoSQL” stands for “not only SQL” and refers to a class of databases that are non-relational, as you learned
earlier in this course. They enable the storage and retrieval of large volumes of unstructured and semi-
structured data in a flexible and scalable manner. In contrast to relational databases, which are always
structured as tables with relationships between them, NoSQL databases can employ a variety of data models,
such as key-value, document, columnar, and graph. With each model optimized for a specific use case,
organizations can select the most appropriate model based on their data needs.
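The contrast between these models and a fixed relational schema can be made concrete with a rough illustration of the document/key-value idea. The sketch below uses plain Python dictionaries rather than a real NoSQL engine, and all names are invented:

```python
# Two "documents" in the same collection can carry different fields,
# unlike rows in a relational table with a fixed set of columns.
users = {
    "u1": {"name": "Ada", "email": "ada@example.com"},
    "u2": {"name": "Grace", "interests": ["databases", "compilers"]},
}

# Key-value access: fetch an entire document by its key.
print(users["u2"]["name"])  # Grace
```

A document store such as MongoDB works on this principle at scale, adding persistence, indexing, and query features on top of the flexible-schema idea.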

Flexible data models, horizontal scalability, and fast performance make NoSQL databases ideal for handling big
data, real-time data streams, and distributed systems. To support high scalability and data availability, NoSQL
databases offer automated sharding, replication, and fault tolerance features. Data can be distributed across
multiple servers or clusters to provide horizontal scaling, so the system can handle rapidly growing workloads
and efficiently process large datasets.

In spite of their flexibility and scalability, NoSQL databases may sacrifice some features of traditional relational
databases, such as strong consistency guarantees and complex querying capabilities. NoSQL databases,
however, have proved useful for many modern applications, including web applications, social networking
platforms, IoT (Internet of Things), and real-time analytics. They cater to the evolving needs of handling diverse
and massive data sources by offering a different paradigm for data management.

EXAMPLE There are many available options with this model, including SimpleDB from Amazon,
Bigtable from Google, Cassandra from Apache, and MongoDB.

 TERM TO KNOW

NoSQL
Not only SQL; a type of data system designed around unstructured data that needs to be processed
using a high availability and scalable process.

 SUMMARY

In this lesson, you learned that database technology has undergone significant advancements and
transformations from the 1960s to today. In the 1960s, databases mainly used hierarchical and
network data models, with data organized in tree-like or interconnected structures. A relational
model was introduced in the 1970s by Edgar F. Codd, and it revolutionized the field by introducing
tables with rows, columns, and relationships between them. As SQL became the standard data
manipulation and querying language, relational databases became the dominant model.

New database technologies emerged with the rise of the internet, big data, and distributed computing
in the late 20th and early 21st century. This included object-oriented databases in the 1980s and XML hybrids in
the 1990s. Data models such as key-value, document, columnar, and graph were developed for NoSQL
databases to handle unstructured and semi-structured data. These databases were scalable,
performant, and capable of handling a wide variety of data types. Cloud computing has transformed the
database landscape, enabling on-demand database services, flexible scaling, and greater accessibility.

The current generation of databases focuses on NoSQL, a class of non-relational databases that
enable storage and retrieval of large volumes of unstructured and semi-structured data in a flexible and
scalable manner. Despite their benefits, NoSQL databases may sacrifice some features of relational
databases, such as complex querying and strong consistency guarantees.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Common Business-Oriented Language (COBOL)


A popular programming language for data processing; systems were designed to handle batch
processing rather than real-time transactions.

Hierarchical Models
Databases that store data as records and organize them into a tree-like structure where one parent
node has many child nodes connected to it through links.

Network Model
The database in the network model represents objects and their relationships in a flexible manner.
The schema differs from other hierarchical or lattice models because it is viewed as a graph with
nodes and arcs representing objects and relationships.

NoSQL
Not only SQL; a type of data system designed around unstructured data that needs to be
processed using a high availability and scalable process.

Object-Oriented Databases (OODBMS)


A database model that directly integrates object-oriented programming principles into database
management, storing data as objects.

Object-Oriented Programming (OOP)


A way of organizing computer programs around objects, which are self-contained units that
represent real-world entities or concepts.

Object-Relational Database (ORDBMS)


A type of database similar to a relational database but with an object-oriented database model;
objects, classes and inheritance are directly supported in database schemas and in the query
language.

XML Hybrid Database


A database type that bridges the structured formatting of XML and the relational model of modern
databases.

Hierarchical and Network Data Models
by Sophia

 WHAT'S COVERED

In this lesson, you will distinguish between the hierarchical and network data models created in the
1970s. Specifically, this lesson will cover:
1. Hierarchical Model
2. Network Model

1. Hierarchical Model
The hierarchical model was one of the first widely used data models in the 1970s. It was created to manage
large amounts of data for complex manufacturing projects.

The logical structure of the hierarchical model looks like an upside-down tree, where the model contains levels
or segments. A segment in the hierarchical model is similar to a file system’s record type. In a hierarchy, the
higher layer is the “parent” of the “child,” or the segment below it. The hierarchical model is basically a set of
one-to-many relationships, where each parent segment can have many children, but each child segment can
only have one parent.
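The single-parent rule can be sketched in a few lines of Python. This is a toy illustration with invented names, not how a hierarchical DBMS is actually implemented: each record stores exactly one parent link, so the whole structure forms a tree that is navigated upward or downward along those links.

```python
# Each node records its single parent, so the structure forms a tree.
tree = {
    "company": {"parent": None},
    "sales":   {"parent": "company"},
    "support": {"parent": "company"},
    "west":    {"parent": "sales"},
}

def path_to_root(node):
    """Navigate upward through parent links, as a hierarchical DBMS would."""
    chain = [node]
    while tree[node]["parent"] is not None:
        node = tree[node]["parent"]
        chain.append(node)
    return chain

print(path_to_root("west"))  # ['west', 'sales', 'company']
```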

Advantages of the Hierarchical Model

It is easy to understand and visualize.
It is efficient for storing and retrieving data that is naturally hierarchical.
It is relatively easy to implement.

Disadvantages of the Hierarchical Model

It is not as flexible as other data models.
It can be difficult to update data in the model.
It is not as scalable as other data models.

Here are some examples of hierarchical databases that you might still run into in the workplace.

IMS (Information Management System) was one of the first hierarchical databases developed. It was
developed by IBM in the 1960s and was widely used in the mainframe environment. This system helped us
get to the moon.
Hierarchic Data Store (HDS) was another early hierarchical database developed by IBM. It was designed for
use in the distributed environment.
IMS itself remains in use today in some enterprise environments, running on modern IBM mainframes
decades after its creation.

 TERMS TO KNOW

Segment
A structured set of data elements.

Hierarchical Model
A database model that consists of a series of one-to-many relationships, with each parent item having
one or more child items.

2. Network Model
The network model was created in the 1960s to help deal with some of the issues of complex data relationships
that could not be represented in the hierarchical model. It was designed to help improve performance in the
database and create a database standard.

The network model represents data as a network of interconnected records. It uses pointers to establish
relationships between data elements, allowing for more complex and flexible data structures compared to the
hierarchical model. Like the hierarchical model, the network model also used the one-to-many relationship, but
a record in the network model can have more than one parent. Even though network data models are no longer
used today, a lot of the key concepts that came from the network model are still used. These include the
schema, subschema, data manipulation language, and data definition language. Network models are useful for
non-hierarchical data, such as company organizational charts and family trees that may have multiple parents or
supervisors for a single person.
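The multiple-parent idea can be sketched the same way. Again, this is a toy illustration with invented names rather than a real network DBMS: instead of a single parent link, each record holds a list of pointers to its parents.

```python
# In a network model, a record can point to more than one parent (owner).
records = {
    "alice": {"parents": []},
    "bob":   {"parents": []},
    "carol": {"parents": ["alice", "bob"]},  # two supervisors for one person
}

print(records["carol"]["parents"])  # ['alice', 'bob']
```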

As information needs grew, the network model became a bit too cumbersome. Because there were no built-in
query capabilities, programmers had to create the code to run even the simplest reports. The commands
needed to query a network model database were a lot more complex than the SQL commands that you will be
using to query the data in our relational database. In addition, if there was any change in the database structure,
it would create problems across all of the applications that accessed the data, as they heavily depended on the
data model structure to match.

Advantages of the Network Model

It is more flexible than the hierarchical model.
It can represent complex relationships between data.
It is relatively easy to implement.

Disadvantages of the Network Model

It is not as easy to understand and visualize as the hierarchical model.
It can be difficult to update data in the model.
It is not as scalable as the relational model.

Here are some examples of network databases you might still find in the workplace:

IDMS (Integrated Data Management System) was one of the first network databases developed. Cullinane
Corporation developed it in the 1970s, and it was widely used in the mainframe environment. There are still
companies that use this system.
ICL DMS was another early network database developed by ICL in the 1970s. It was designed for use in the
distributed environment.
Adabas is a network database that is still in use today. It is designed for use in the enterprise environment.

Some new concepts were introduced in the network model that were foundational to later models, including the
first steps toward a query language like SQL. These new concepts included:
The schema, which is the conceptual organization of the database. This is the view seen by a database
administrator, who is able to see the complete database.
The subschema, which is the part of the database that the applications can see, rather than the entire
database.
Data manipulation language (DML) represents the commands used to work with the data. In modern
relational databases, this includes SQL statements such as SELECT, UPDATE, INSERT, and DELETE.
Data definition language (DDL) allows database administrators to create and remove the various schema
components in the database. In modern relational databases, this includes the SQL statements CREATE,
ALTER, and DROP.
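The DML and DDL statements named above can be seen together in one short session. The sketch below uses Python's built-in sqlite3 module with an invented table for illustration; the same statements would run on PostgreSQL and most other relational databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create a schema component.
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")

# DML: work with the data itself.
cur.execute("INSERT INTO employee (name) VALUES ('Avery')")
cur.execute("UPDATE employee SET name = 'Avery B.' WHERE id = 1")
names = cur.execute("SELECT name FROM employee").fetchall()
print(names)  # [('Avery B.',)]

# DDL again: remove the schema component.
cur.execute("DROP TABLE employee")
conn.close()
```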

 TERMS TO KNOW

Data Definition Language (DDL)


Commands that create and remove schema components in a database.

Data Manipulation Language (DML)


Commands that allow interaction with the data in a database.

Network Model
A database model that represents data as a network of interconnected records with pointers used to
establish relationships between data elements.

Schema
The conceptual organization of an entire database.

Subschema
The part of a database that applications interact with.

 SUMMARY

In this lesson, you learned that the second generation of data models included the hierarchical and
network models. These models offered more sophistication than basic file systems but also added
cumbersome complexity to the task of tracking and finding files.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Data Definition Language (DDL)


Commands that create and remove schema components in a database.

Data Manipulation Language (DML)


Commands that allow interaction with the data in a database.

Hierarchical Model
A database model that consists of a series of one-to-many relationships, with each parent item
having one or more child items.

Network Model
A database model that represents data as a network of interconnected records with pointers used
to establish relationships between data elements.

Schema
The conceptual organization of an entire database.

Segment
A structured set of data elements.

Subschema
The part of a database that applications interact with.

Relational Models
by Sophia

 WHAT'S COVERED

In this lesson, you will explore milestones in the development of the relational data models created in
the mid-1970s, in three parts. Specifically, this lesson will cover:
1. Historical Background
2. Model Features
3. Parts of a Relational Database

1. Historical Background
The relational database model was developed in the 1970s, with the goal of overcoming some of the limitations
of the hierarchical and network models. The simplicity of the relational model made it quite easy to use, which is
a big reason why it is still so commonly used today.

The relational model’s foundation was based on mathematical set theory and the idea of representing data as
independent relations. A relation, which you know as a table, is just a structure that has rows and columns.
Each row is called a tuple (a record), and each column represents an attribute (a field). This model was viewed
as a technical breakthrough for database users and designers.
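The relation/tuple/attribute vocabulary maps directly onto SQL. In this small sketch, which uses Python's built-in sqlite3 module and invented names for illustration, the table is the relation, each column is an attribute, and each inserted row is a tuple:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The relation (table): each column is an attribute (field).
cur.execute("CREATE TABLE student (id INTEGER, name TEXT, major TEXT)")

# Each inserted row is a tuple (record).
cur.execute("INSERT INTO student VALUES (1, 'Lin', 'CS')")

row = cur.execute("SELECT id, name, major FROM student").fetchone()
print(row)  # (1, 'Lin', 'CS')
conn.close()
```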

IN CONTEXT

When the model was created in the 1970s, both the hierarchical and network models could not keep
up with the data needs of companies and governments. The older models were in need of an
overhaul. All database types started to grow because computing power grew exponentially once the
personal computer (PC) hit the market. With new operating systems and new hardware capabilities,
many different sizes and kinds of companies were able to start using relational databases. By the
1990s, databases started being the cornerstone of business, commerce, and computer systems.

Here are the core events that led us to today, and to the PostgreSQL that you will be using later in this course to
practice working with a relational database.

1970: Edgar F. Codd publishes “A Relational Model of Data for Large Shared Data Banks.”
1974: IBM begins developing System R, the first relational database management system (RDBMS) prototype.
Mid-1970s: Ingres, an early open-source RDBMS, is developed at the University of California, Berkeley.
1979: Oracle Corporation releases Oracle V2, the first commercially available RDBMS.
1986: ANSI publishes the first SQL standard (SQL-86).
1987: Sybase releases Sybase SQL Server.
1989: Microsoft releases SQL Server, initially developed in partnership with Sybase.
1992: A major revision of the SQL standard, SQL-92, is released.
1995: MySQL, a popular open-source RDBMS, is released.
1996: PostgreSQL, a popular open-source RDBMS, is released as the successor to the Berkeley Postgres project.

 TERMS TO KNOW

Relational Database
Relational databases are databases that store data in tables. Each table is a collection of rows and
columns, and each row represents a single record.

Relation
Also known as a table, a row-and-column structure for holding data in a relational database.

Tuple
Also known as a record, one row in a relation (table) containing all the data for one item.

Attribute
A single piece of information within a record, also known as a field.

2. Model Features
Relational data models were implemented through a complex relational database management system
(RDBMS). The RDBMS helps hide the complexities of the relational model from the user, so that the user sees
only the relational database as a collection of tables where the data is stored. All of the underlying physical
features of the model are completely hidden.

The relationships between tables in a relational model can be set up as one-to-one or one-to-many. For
example, if one table holds orders and another holds order items, there's a one-to-many relationship because
one order can have many ordered items, but an ordered item can belong to only one order. An example of a
one-to-one relationship might be between an employee and their Social Security number.

Many-to-many relationships are broken down into multiple one-to-many relationships, with a bridge (also called junction) table connecting the two sides. The relational model provides complete data and structural independence: how the data is physically stored in the database does not matter to the developer or end user.
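These relationship patterns can be sketched in SQL. The table and column names below are hypothetical, and SQLite (through Python's sqlite3 module) stands in for a full RDBMS: orders and order_items form a one-to-many relationship, while enrollments is a bridge table that breaks a many-to-many relationship between students and courses into two one-to-many relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    order_date TEXT
);
CREATE TABLE order_items (                 -- the "many" side of one-to-many
    item_id  INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(order_id),
    product  TEXT
);
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE enrollments (                 -- bridge table for many-to-many
    student_id INTEGER REFERENCES students(student_id),
    course_id  INTEGER REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id)
);
""")
conn.execute("INSERT INTO orders VALUES (1, '2024-01-15')")
conn.execute("INSERT INTO order_items VALUES (1, 1, 'Widget')")
conn.execute("INSERT INTO order_items VALUES (2, 1, 'Gadget')")

# One order can have many items, but each item belongs to one order.
count = conn.execute(
    "SELECT COUNT(*) FROM order_items WHERE order_id = 1").fetchone()[0]
print(count)  # 2
```

Notice that the bridge table's primary key combines the two foreign keys, so the same student cannot be enrolled in the same course twice.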

The relational data model is where you start to see the use of Structured Query Language (SQL), with the data manipulation language (DML) and data definition language (DDL) statements that originated in the earlier network model. Recall that the network model was too complex, requiring programmers to write code to run even the simplest reports. SQL, in contrast, is intuitive and easy to use because it is based on simple English-language words.

Advantages of the Relational Database Model

It is simple and easy to understand.
It is flexible and can represent a wide variety of data relationships.
It is efficient for storing and retrieving data.
It is scalable and can be used to manage large amounts of data.

Disadvantages of the Relational Database Model

It can be difficult to implement complex data relationships.
It can be difficult to optimize queries for performance.
It can be difficult to manage data security.

 THINK ABOUT IT

When you execute a SELECT statement in SQL, you do not have to know how it is gathering the data and
processing it behind the scenes. As an end user, you need only specify what must be done, without having
to define how it is done. The use of SQL with the relational model made it a lot easier to retrieve data than
in any other database that preceded it.
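As a small illustration of that declarative style, the query below states only what is wanted (customers in a given city, in order by name) and leaves the how (scanning, index use, sort strategy) to the SQL engine. The table and data are made up, and SQLite (via Python's sqlite3 module) stands in for the RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("Ana", "Austin"), ("Ben", "Boston"), ("Cho", "Austin")])

# Declarative: we describe the result set, not the retrieval algorithm.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = 'Austin' ORDER BY name"
).fetchall()
print([r[0] for r in rows])  # ['Ana', 'Cho']
```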

 TERMS TO KNOW

RDBMS
Relational database management system.

Structured Query Language


A programming language that allows data to be retrieved and manipulated in a relational database.

3. Parts of a Relational Database


From a user perspective, a SQL-based relational database consists of three parts.

The end-user interface is the part where you can interact with the data through the SQL code. For the SQL
you will practice in this course, this is through the web interface. For others, this is through a program or a
command line. There are many different types of user interfaces that can be used to connect to the same
database.
The second part is the collection of tables that the end user interacts with. Each of those tables is
independent of one another, and the rows in the tables are related to each other through some common
values in common attributes. This is where all the data is stored in a database.
The last part is the SQL engine. This is the part that is mostly hidden away from the end user. The SQL
engine is part of the database management software that runs all of the SQL queries and data requests.

You should also know the core parts of the collection of tables and how they are used:

Tables
Tables (also called relations) are the basic unit of data storage in a relational database. They are organized into rows and columns; each row represents a single record. The columns in a table represent the attributes (also called fields) of the record, and the rows represent the values of the attributes.

Columns
Columns are the individual attributes of a table. They are usually of the same data type and can contain various data, such as numbers, text, and dates.

Rows
Rows are the individual records in a table. They contain the values of all the columns in the table.

Primary Key
A primary key is a column or a combination of columns uniquely identifying each row in a table. The primary key cannot contain duplicate values. In a one-to-many relationship, the primary key is on the “one” side.

Foreign Key
A foreign key is a column or a combination of columns that refers to the primary key of another table. The foreign key must contain the same values as the primary key of the other table. In a one-to-many relationship, the foreign key is on the “many” side.

Indexes
An index is a data structure that speeds up the retrieval of data from a database, much like the index at the back of a book makes it easier to find specific topics in the book. Indexes are created on columns that are frequently used in queries.

Constraints
A constraint is a rule that restricts the values that can be stored in a column or a table. Constraints are used to ensure the integrity of the data in a database.
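The components above can be seen together in one small schema. The names are hypothetical, and the sketch uses SQLite through Python; note that SQLite leaves foreign-key enforcement off by default, so it is enabled explicitly here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE genres (
    genre_id INTEGER PRIMARY KEY,          -- primary key: unique row identifier
    name     TEXT NOT NULL UNIQUE          -- constraints on a column
);
CREATE TABLE tracks (
    track_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    price    REAL CHECK (price >= 0),      -- constraint restricting values
    genre_id INTEGER REFERENCES genres(genre_id)  -- foreign key ("many" side)
);
CREATE INDEX idx_tracks_genre ON tracks(genre_id); -- index to speed up queries
""")
conn.execute("INSERT INTO genres VALUES (1, 'Jazz')")
conn.execute("INSERT INTO tracks VALUES (1, 'Blue Rondo', 0.99, 1)")

try:
    # Violates the foreign key: genre 99 does not exist.
    conn.execute("INSERT INTO tracks VALUES (2, 'Ghost Track', 0.99, 99)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)  # True
```

The rejected insert shows a constraint doing its job: the database itself refuses data that would break referential integrity, rather than relying on application code to catch the mistake.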

 TERMS TO KNOW

Columns
The individual attributes of a table, also called fields. They are usually of the same data type and can
contain various data, such as numbers, text, and dates.

Constraints
A rule that restricts the values that can be stored in a column or a table. Constraints are used to ensure
the integrity of the data in a database.

End-User Interface
Where ordinary database users interact with the data.

Foreign Key
A column or a combination of columns that refers to the primary key of another table.

Indexes
A data structure that speeds up the retrieval of data from a table. Indexes are created on columns that
are frequently used in queries.

Primary Key

A primary key is a column or a combination of columns uniquely identifying each row in a table. The
primary key cannot contain duplicate values.

Rows
The individual records in a table. They contain the values of all the columns in the table.

SQL Engine
The software that runs the database and is normally not accessible by end users.

Tables
The basic unit of data storage in a relational database. They are organized into rows and columns, each
representing a single record. The columns in a table represent the attributes of the record, and the
rows represent the values of the attributes.

 SUMMARY

Because the historical background of the hierarchical and network data models raised issues of
complexity and difficulty in searching out answers in the data, the relational data model was developed
in the 1970s. It had model features that made it easier to trace relationships between data, provided
consistent data structure, and allowed for easier queries into the data. These improvements are
possible because of the main parts of the relational database: an end-user interface, the collection of
data tables, and the SQL engine for interpreting queries.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Attribute
A single piece of information within a record, also known as a field.

Columns
The individual attributes of a table, also called fields. They are usually of the same data type and
can contain various data, such as numbers, text, and dates.

Constraints
A rule that restricts the values that can be stored in a column or a table. Constraints are used to
ensure the integrity of the data in a database.

End-User Interface
Where ordinary database users interact with the data.

Foreign Key

A column or a combination of columns that refers to the primary key of another table.

Indexes
An index is a data structure that speeds up the retrieval of data from a table. Indexes are created
on columns that are frequently used in queries.

Primary Key
A primary key is a column or a combination of columns uniquely identifying each row in a table. The
primary key cannot contain duplicate values.

RDBMS
Relational Database Management System.

Relation
Also known as a table, a row-and-column structure for holding data in a relational database.

Relational Database
Relational databases are databases that store data in tables. Each table is a collection of rows and
columns, and each row represents a single record.

Rows
The individual records in a table. They contain the values of all the columns in the table.

SQL Engine
The software that runs the database and is normally not accessible by end users.

Structured Query Language


A programming language that allows data to be retrieved and manipulated in a relational database.

Tables
The basic unit of data storage in a relational database. They are organized into rows and columns,
each representing a single record. The columns in a table represent the attributes of the record,
and the rows represent the values of the attributes.

Tuple
Also known as a record, one row in a relation (table) containing all the data for one item.

Object and Relational Models
by Sophia

 WHAT'S COVERED

This lesson explores the use of object-oriented data models created in the mid-1980s through modern
approaches to big data, in three parts. Specifically, this lesson will cover:
1. Object-Oriented Models
2. Extended Relational Models
3. Modern Big Data Challenges

1. Object-Oriented Models
As data became more complex, more frequent, and larger, the network and hierarchical database models
simply could not grow to support business and data needs. Starting in the mid-1980s, the object-oriented and
object/relational data models came into use. Unlike the record-based hierarchical and network models, the
object-oriented data model (OODM) stores both the data and the relationships in a single structure: the object.
The OODM is the basis of the object-oriented database management system (OODBMS).

An OODM database supports the principles of object-oriented programming, such as inheritance,
encapsulation, and polymorphism. Inheritance is used to organize and structure data models; for example,
objects inherit attributes and methods from their classes. Encapsulation groups data and the operations that
work on that data into a single entity. It hides an object's inner workings and details, exposing only the
interfaces and methods necessary to access and manipulate its data. Data integrity depends on this concept:
by restricting how an object's internal state can be modified, encapsulation prevents direct access and
potential inconsistency. Polymorphism enhances the OODM's code reusability, modularity, and flexibility.
Objects of different types can be accessed through a unified interface, simplifying the implementation of
complex systems. Polymorphism enables "programming to an interface," where objects are treated generically
based on shared behavior rather than specific type, and the resulting loose coupling between objects improves
maintainability and scalability.

Objects can be mapped directly to entities in the real world, allowing them to represent complex data
structures more naturally. OO databases offer features such as object identity, complex data types, and
navigational access to relationships. This object-oriented design makes them particularly appropriate for
applications with complex data models and interconnected data networks. Object-oriented databases share
some features with document-based NoSQL databases; for example, both define relationships within an
object's or document's metadata.

Objects contain facts, but they also contain metadata about themselves, which can include the relationships
between the facts. This gives the facts within the objects greater meaning.

The OODM closely models how we process data in the real world. In many ways, the object in an OODM is
equivalent to a fact, the data around that fact, and how that fact interacts with other facts. The attributes
describe the properties of the object that contains them (similar to the columns within a table). Objects that
have similar characteristics are grouped into classes, where they have a shared structure of attributes as well as
shared behaviors or methods. These methods define an available real-world action related to the object. In
many ways, they are similar to procedures or functions in other programming languages.

IN CONTEXT

An e-commerce system that stores information about products, customers, and orders is an example
of an object-oriented database. An object containing attributes such as name, price, description, and
quantity could be used to represent each product in such a system. A customer's data could be stored
as objects with attributes such as name, address, email, and phone number. A purchase order could
be represented as an object containing information about the purchased products, the customer
placing the order, the date on which the order was placed, and the amount paid. An object-oriented
database makes it easy to model and navigate relationships between objects, such as the association
between customer orders and billing information. Data structures that are interconnected and
complex can be represented in a natural and intuitive way using an object-oriented database.
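A minimal sketch of that e-commerce scenario, using ordinary Python classes rather than a real OODBMS; the class and attribute names are invented for illustration. It shows inheritance (DigitalProduct reuses Product's structure), encapsulation (the price is reached only through a method), and polymorphism (both product types answer the same display_price() call).

```python
class Product:
    def __init__(self, name, price):
        self.name = name
        self._price = price          # encapsulated: accessed through methods

    def display_price(self):
        return f"{self.name}: ${self._price:.2f}"

class DigitalProduct(Product):       # inheritance: reuses Product's attributes
    def display_price(self):         # polymorphism: same interface, new behavior
        return f"{self.name} (download): ${self._price:.2f}"

catalog = [Product("Mug", 8.0), DigitalProduct("E-book", 12.5)]

# A unified interface over different object types ("programming to an interface").
listing = [item.display_price() for item in catalog]
print(listing)  # ['Mug: $8.00', 'E-book (download): $12.50']
```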

 THINK ABOUT IT

Why might the hierarchical data model have become useful again for this type of data? How might older
approaches to organizing data become relevant again even in future models?

 TERMS TO KNOW

Encapsulation
Groups data and operations that operate on it into a single entity.

Inheritance
OODM uses inheritance to organize and structure data models. For example, objects inherit attributes
and methods from classes.

Metadata
Refers to descriptive and structural information providing context and data details. It describes various
data characteristics, such as its format, source, quality, content, location, and relationships with other
data.

Object-Oriented Data Model (OODM)

An object-oriented database model that organizes and represents data as objects, including both data
and behavior.

Polymorphism
The ability to access objects of different types through a unified interface, enhancing the OODM's code
reusability, modularity, and flexibility.

Object
An organizing unit in object-oriented databases that includes a fact, information about the fact, and the
relationships between facts.

Class
A grouping of similar objects with shared attributes and behaviors.

Method
An action that can be performed on an object to which it is assigned.

2. Extended Relational Models


Starting in the mid-1990s, new models were developed to handle complex data representations. These new
models borrowed features from both object-oriented and relational models and are considered
object/relational (O/R) models. For example, the extended relational data model (ERDM) emerged as a way to
add the object-oriented model features in a more straightforward relational database structure.

Extended relational database models enhance traditional relational models by adding additional features to
extend their capabilities. Advanced ERDM features include array and multimedia support, encapsulation,
inheritance, and additional integrity constraints based on assertions and triggers. The model supports complex
scenarios and relationships in real-world scenarios, allowing for more flexible and expressive data modeling. An
ERDM bridges the gap between relational databases and object-oriented databases, combining their benefits.
The model provides support for sophisticated data manipulation and querying operations as well as supporting
more diverse and dynamic data structures. Extending traditional relational databases allows for a richer and
more comprehensive platform to manage and access complex data. Both O/R and NoSQL databases can use
Extensible Markup Language (XML) to store and exchange data. Support for XML-based documents in O/R
databases has made working with these files more efficient.

 TERMS TO KNOW

Extended Relational Database Model (ERDM)


Extensions of traditional relational database models that add enhanced functionality. A relational model
based on ERDM introduces object-oriented concepts, such as inheritance, encapsulation, and methods.

Object/Relational (O/R) Database Model


A type of database model that bridges the gap between object-oriented and relational databases by
enabling objects to be directly stored and retrieved from the database.

3. Modern Big Data Challenges
The current generation of emerging data models from the early 2000s to the early 2020s focuses on NoSQL
and big data. As you learned earlier in this unit, these include the key-value store, wide-column store,
document-oriented, and graph stores.

In today's world, there are mountains of stored data, and it’s not always possible to fit some of the unstructured
or social media data into the conventional structure of rows and columns. Many of these modern databases
address the needs and flow of the data and are specific to solving a single problem. In that way, they are
unique, and there isn't a single solution to meet all of the data storage needs of all organizations. In modern
NoSQL databases, JavaScript Object Notation (JSON) has emerged as a replacement for XML as the primary
means of storing structured and unstructured data.

In database design, JSON (JavaScript Object Notation) is widely used to represent structured data flexibly,
without a fixed schema. Database designers often use JSON to store complex and dynamic data, allowing
nested and hierarchical structures that cannot be easily represented in traditional relational tables. JSON data
can easily be stored as text-based values within database fields, making it easy to store and retrieve various
data structures. JSON is a versatile and interoperable format for exchanging data between systems and
platforms, which allows developers to accommodate evolving data requirements and support rapid application
development. Despite JSON's flexibility, the data's quality, consistency, and indexing must be carefully managed
to ensure efficient querying and data integrity.
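A minimal sketch of the idea, assuming a hypothetical customers table: a nested, schema-free JSON document is stored as text in a database column and parsed back into a structure on retrieval (Python's json module with SQLite; many modern databases also offer native JSON column types and operators).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, profile TEXT)")

# A nested structure that would be awkward to flatten into fixed columns.
profile = {
    "name": "Ada",
    "addresses": [{"type": "home", "city": "Austin"},
                  {"type": "work", "city": "Dallas"}],
}
conn.execute("INSERT INTO customers VALUES (?, ?)", (1, json.dumps(profile)))

# Retrieve the text value and parse it back into a nested structure.
raw = conn.execute("SELECT profile FROM customers WHERE id = 1").fetchone()[0]
restored = json.loads(raw)
print(restored["addresses"][0]["city"])  # Austin
```

The trade-off mentioned above is visible here: the database cannot validate or efficiently index what is inside the text value unless the designer adds JSON-aware indexes or constraints.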

Many modern databases are cloud-hosted. This enables them to take advantage of cloud-based features such
as scaling automatically when the database is under load or when there is an increase in the speed of data
coming into the database.

Federal, state, and industry regulations also have a big impact on how databases secure and protect data. The
entire organization must be involved in data governance to make sure that data is protected, stored, shared,
and captured in such a way as to not break any laws or regulations.

The future of databases will be one of specialization around the data, with a holistic approach to design,
implementation, management, oversight, and insights.

 TERM TO KNOW

JavaScript Object Notation (JSON)


A lightweight, popular data interchange format commonly used in databases. Using it, one can
represent data using key-value pairs or nested structures in a readable, easy-to-understand format.

 SUMMARY

In this lesson, you learned that the object-relational and object-oriented databases reflected the shift to
object-oriented computer programming languages in the 1980s. More recently, the need to manage,
organize, and share large amounts of unstructured data has led to extended relational models such as the
object/relational (O/R) model, as well as NoSQL databases. XML was initially used as a data exchange
language, but JSON is now the preferred format for addressing modern big data challenges.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Class
A grouping of similar objects with shared attributes and behaviors.

Encapsulation
Groups data and operations that operate on it into a single entity.

Extended Relational Database Model (ERDM)


Extensions of traditional relational database models that add enhanced functionality. A relational
model based on ERDM introduces object-oriented concepts, such as inheritance, encapsulation,
and methods.

Inheritance
OODM uses inheritance to organize and structure data models. For example, objects inherit
attributes and methods from classes.

JavaScript Object Notation (JSON)


A lightweight, popular data interchange format commonly used in databases. Using it, one can
represent data using key-value pairs or nested structures in a readable, easy-to-understand format.

Metadata
Refers to descriptive and structural information providing context and data details. It describes
various data characteristics, such as its format, source, quality, content, location, and relationships
with other data.

Method
An action that can be performed on an object to which it is assigned.

Object
An organizing unit in object-oriented databases that includes a fact, information about the fact, and
the relationships between facts.

Object-Oriented Database Model (OODM)


An object-oriented database model that organizes and represents data as objects, including both
data and behavior.

Object/Relational (O/R) Database Model

A type of database model that bridges the gap between object-oriented and relational databases
by enabling objects to be directly stored and retrieved from the database.

Polymorphism
The ability to access objects of different types through a unified interface, enhancing the OODM's code
reusability, modularity, and flexibility.

Conceptual Design
by Sophia

 WHAT'S COVERED

This lesson explores the steps in designing a new relational database. This lesson explains the first
step. Each of the other steps will be covered in upcoming lessons. Specifically, this lesson will cover:
1. Conceptual Model Design
1a. Gathering Requirements
2. E-Commerce Example

1. Conceptual Model Design


Conceptual data models are high-level representations of a system's data requirements and structure,
independent of any specific database management system or technical implementation. A conceptual model is
typically created during the early stages of database design to capture the essential entities, attributes, and
relationships that define the application's information needs.

During the conceptual design process, the primary focus is on understanding a company's or application's
requirements and how data entities are related to each other. At this point in the development process, it is
unnecessary to consider how data will be accessed or stored. Rather, the conceptual model provides stakeholders, analysts, and
designers with a clear and abstract view of the data, enabling them to communicate and agree on the
fundamental data organization more effectively.

Conceptual data models are often represented using entity-relationship (ER) diagrams or other visual modeling
techniques, enabling stakeholders to visualize and validate data designs before moving on to logical and
physical designs. Conceptual design models guide the development of logical and physical database schemas,
ensure alignment with business needs and objectives, and serve as a foundation for refining and developing
database designs.

The basic steps performed during the conceptual design phase are:

1. Gather Requirements
Understand and document the business or application requirements the database should fulfill. This involves
consulting stakeholders, end users, and domain experts to identify the necessary data entities, attributes, and
relationships.

2. Identify Entities
Identify the main data entities or objects relevant to the application or domain. Entities represent real-world
objects or concepts that must be captured and stored in the database.

3. Define Attributes
Determine the attributes (properties) of each entity, describing the specific characteristics and data elements
that need to be stored for each entity.

4. Establish Relationships
Analyze relationships between entities. Identify associations, dependencies, and connections between entities
that describe how they relate to each other.

5. Create an Entity-Relationship Diagram (ERD)
Represent entities, attributes, and relationships using an entity-relationship diagram (ERD). This visual
representation helps stakeholders and designers better understand the data structure and the relationships
between entities.

6. Normalize the Model
Apply normalization techniques to organize the data efficiently, minimizing redundancy and anomalies.
Normalization helps achieve a more scalable and maintainable database design.

7. Review and Validate
Review the conceptual design model with stakeholders and subject matter experts to ensure that it accurately
reflects the requirements and business rules. Make necessary adjustments and refinements based on
feedback.

8. Document the Model
Document the conceptual design model, including descriptions of entities, attributes, relationships, and any
critical design decisions. This documentation will serve as a reference for future database development.

 TERMS TO KNOW

Conceptual Data Model


A high-level representation of a system's data requirements and structure that establishes the entities,
their attributes, and relationships between entities in a relational database.

Entity Relationship (ER) Model


A visual representation that illustrates the structure and relationships within a database or data model.

1a. Gathering Requirements


The first step in conceptual model design is to discover the necessary characteristics of the data elements. This
is the most consequential part of database design, as issues here will show up all the way through the design
process and can impact performance, data translation, and database usage.

We start this process by posing some questions to help frame the type of data we will be gathering:

 REFLECT

Where is the data going to come from?


Is the data structured or unstructured (SQL, NoSQL)?

How will the data be used? Will it be mostly read for reports, will it be updated frequently (for example,
using sensors to measure values every few minutes), or will it be a combination of the two?
Who will use the data, how often will they access the data, and how will they use the data?
What are the various end-user data views going to be?
Where is the information to be found, and how is the information meant to be extracted?

Using questions like this, we can then start framing what the database should look like, how it will be used, who
will use it, and where the data will come from. This will help us determine what data elements are needed to
produce the information, along with what the data attributes should be. We can also define the relationships
that exist in the data, how frequently the data will be used, and what data transformations may be needed to
generate some of that information.

 HINT

It can be useful to have the database designer and the end users create a description of what the end-user
data views and reports are going to be. Designers and users working together will help identify what the
main data elements are and how they will be used. It can also be useful to look at any existing manual
process or application, if there is one, and if the database is going to be used to automate those manual
processes. This is a big part of one of the later steps in the process: “Review and Validate.”
If there’s already an automated system in place, we can also take a look at what the current reports look like, as
well as determine what the designed reports would consist of for the new system. If an application is being
developed in conjunction with the database design, it would also be useful to review the application designs for
insight into the data elements.

Business rules can be of benefit when working out what the new system should do and where the data is
coming from. Data from manufacturing is different from data coming from research, for example, or there may
be other reasons why the system is being built. That knowledge is important for determining the entities,
attributes, relationships, connectivity, and constraints in the database. Business rules need to be simple and
precise and follow the business process from start to finish.

Later in the course you'll work with a Postgres database that has a number of tables about artists, music, and
tracks. Here are some business rules that might be appropriate for that data:

An artist can record more than one album.


An album can list only one artist.
An album can store multiple tracks.
A track can belong to only a single genre.
A track can be sold under only a single media type.
An employee can be the main support representative of many customers.
A customer can only have one main support representative.

Business rules help define what the entities are and what the key attributes and relationships will be. They also,
in many cases, start to define the cardinality between the entities—in other words, the one-to-one, one-to-many,
or many-to-many relationships. It is important that business rules are accurate and vetted by the business end
users and stakeholders. Having incorrect business rules can create underlying problems in the database that
will affect the applications that rely on it.

 TERM TO KNOW

Cardinality
The number of instances of one entity (or table) that can be associated with a single instance of another
entity (or table) through a relationship.

2. E-Commerce Example
Let's take a look at what a conceptual model may look like for a simple e-commerce site. The business rules are
as follows:

A customer can place many orders.
A customer uses a credit card to pay for the orders.
An order can only belong to a single customer.
An order can consist of many products.
A product can be purchased by many customers.
A product can belong to a single category.
A category can consist of many products.

A sample entity-relationship diagram would look like the following:

Based on these business rules, we see the following relationships:

Between a customer and an order is a one-to-many relationship.
Between an order and a product is a many-to-many relationship.
A category to a product is a one-to-many relationship.
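The rules above can also be sketched as a working schema. The following is a hypothetical illustration (SQLite via Python; every table and column name is an assumption for this example), in which the many-to-many relationship between orders and products is resolved with a bridge table, order_item, that creates two one-to-many relationships:

```python
import sqlite3

# Hypothetical e-commerce schema derived from the business rules above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        category_id INTEGER NOT NULL REFERENCES category(category_id)
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    -- Bridge table: resolves the many-to-many between orders and products.
    CREATE TABLE order_item (
        order_id   INTEGER NOT NULL REFERENCES customer_order(order_id),
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO category VALUES (1, 'Books');
    INSERT INTO product VALUES (1, 'SQL Primer', 1), (2, 'DB Design', 1);
    INSERT INTO customer_order VALUES (1, 1);
    INSERT INTO order_item VALUES (1, 1, 2), (1, 2, 1);
""")
# One order containing many products, reached through the bridge table.
items = conn.execute(
    "SELECT COUNT(*) FROM order_item WHERE order_id = 1"
).fetchone()[0]
print(items)  # 2
```

Note the composite primary key on order_item: it prevents the same product from appearing twice as separate lines on one order, a design choice you may or may not want depending on the business rules.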

 SUMMARY

In this lesson, you learned that conceptual models are high-level representations of a system's data
gathering requirements and structure, independent of its specific database management system or
technical implementation. The conceptual design document is typically created during the early stages

of database design to capture the essential entities, attributes, and relationships that define the
application's information needs. Finally, you explored an e-commerce example model design that
considered how business rules established specific relationships.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Cardinality
The number of instances of one entity that can be associated with a single instance of another entity through a relationship (one-to-one, one-to-many, or many-to-many).

Conceptual Data Model


A high-level representation of a system's data requirements and structure that establishes the
entities, their attributes, and relationships between entities in a relational database.

Entity Relationship (ER) Model


A visual representation that illustrates the structure and relationships within a database or data
model.

Entity Relationship Model
by Sophia

 WHAT'S COVERED

This lesson explores the entity-relationship diagram with Chen, crow’s foot, and UML notation, in four
parts. Specifically, this lesson will cover:
1. Three Notation Styles
2. Chen Notation
2a. Cardinality
2b. Using Chen Notation to Show Cardinality
3. Crow’s Foot Notation
4. UML Notation

1. Three Notation Styles


The entity-relationship diagram (ERD) represents the relationships between entities (objects or concepts) in a database or system. ERDs provide a clear, concise view of the data structure and show how data is organized and how it flows. We essentially draw out the entire database: all the tables, their keys, and the relationships between the tables, including how they work and what they contain. An ERD is sometimes referred to as an entity-relationship model (ERM).

As a simple example, let's explore how an ERD might express the relationships between entities in a university
information system. Students, courses, and instructors are the main entities in the system. Students have
attributes such as their StudentID, StudentName, Address, and EmailAddress. The Course entity has attributes
such as CourseID, Title, and CreditHours, while the Instructor entity has attributes such as the InstructorID,
InstructorName, and Office.

Relationships between entities are illustrated through lines. Using the Enrollment relationship, the Student entity
is related to the Course entity, representing that a student can enroll in multiple courses. Instructor and Course
entities also have a “Teaching” relationship, which indicates that an instructor can teach multiple courses, while
several instructors can teach a course. Relationships like this are portrayed as many-to-many.

Many-to-many relationships are handled using associative entities—in other words, additional entities that exist
only in order to associate data in one entity with data in another. In addition to capturing enrollment date and
grade, the Enrollment entity represents the relationship between a Student and a Course. Teaching
Assignments are associative entities between Instructors and Courses, storing details such as teaching
assignment IDs and semesters.

ERDs provide a clear visual representation of university information system entities, attributes, and
relationships. By providing stakeholders with a better understanding of the structure and relationships of the
data, an ERD assists in system design and database creation, and ensures that student information is managed
efficiently. As a result, instructors, students, and courses can be better managed.

There are three commonly used notation systems for representing relationships in an ERD:

Chen notation: Chen notation is used to help illustrate modeling concepts and is a great notation to use
when performing the initial design. However, it can get messy quickly, as more relationships, entities, and
attributes are added.
Crow’s foot: Crow’s foot notation is the most widely used standard for diagramming relationships.
UML class diagram notation: UML class diagram notation is used for object-oriented or object/relational
databases.

 TERMS TO KNOW

Associative Entity
An entity that exists only to associate data in one entity with data in another.

Entity-Relationship Diagram (ERD)


Used to visualize the relationships among entities (objects or concepts) in a system or database, as well
as their structure, attributes, and interactions.

2. Chen Notation
Chen notation is a graphical representation technique widely used in entity-relationship modeling that
emphasizes the clear and concise depiction of entities, relationships, and attributes. A rectangle represents an
entity, a relationship is a diamond, and a line connects entities and relationships. It emphasizes cardinality and
participation constraints as a means of conveying the nature and degree of relationships between entities. In
addition, Chen notation provides a comprehensive overview of the data structure by depicting entity attributes
within entity rectangles. Chen notation simplifies understanding complex data models through intuitive symbols
and emphasizes relationships.

 TERM TO KNOW

Chen Notation
A visual representation technique for entity-relationship modeling that uses rectangles to represent
entities, diamonds for relationships, and lines to represent cardinality and participation constraints.

2a. Cardinality
The concept of cardinality describes the numerical relationship between entities in a relationship. This term
refers to the relationship between occurrences or instances of one entity and occurrences or instances of
another entity.

Types of Cardinality:

1. One-to-One (1:1) Cardinality: It indicates that one instance of an entity is associated with exactly one
instance of another entity. For example, in a database of employees and their unique employee IDs, each
employee would have a unique ID, and the relationship between the employee and the ID would be one-to-
one.
2. One-to-Many (1:N) Cardinality: It signifies that one instance of an entity is associated with multiple instances
of another entity. For instance, in a database of customers and their orders, one customer can have multiple
orders, while each order is associated with only one customer.
3. Many-to-Many (M:N) Cardinality: It represents a situation where multiple instances of one entity are related
to multiple instances of another entity. For example, in a database of students and courses, multiple
students can enroll in multiple courses, forming a many-to-many relationship. To represent this relationship
in a database, an intermediate table or associative entity is typically introduced.
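The one-to-one case is worth making concrete, since it is the one the earlier examples have not shown. Below is a minimal sketch (SQLite via Python; the employee/badge tables are hypothetical) of enforcing 1:1 cardinality by making the foreign key UNIQUE, so no two badge rows can point at the same employee:

```python
import sqlite3

# Hypothetical 1:1 relationship: each employee has exactly one badge.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE badge (
        badge_id    INTEGER PRIMARY KEY,
        -- UNIQUE turns the usual one-to-many foreign key into one-to-one.
        employee_id INTEGER NOT NULL UNIQUE REFERENCES employee(employee_id)
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'Ada')")
conn.execute("INSERT INTO badge VALUES (100, 1)")  # first badge: accepted
```

A second badge for employee 1 would violate the UNIQUE constraint and be rejected, which is how the 1:1 cardinality is actually policed by the database rather than by application code.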

 TERM TO KNOW

Cardinality
The numerical relationship between entities in a relationship. This term refers to the relationship
between occurrences or instances of one entity and occurrences or instances of another entity.

2b. Using Chen Notation to Show Cardinality


Chen notation for a one-to-one relationship between table1 and table2 would look like the following:

Chen notation for a one-to-many relationship between table1 and table2 would look like the following:

Chen notation for a many-to-many relationship between table1 and table2 would look like the following:

3. Crow’s Foot Notation
Crow's foot notation is a commonly used graphical representation technique in entity-relationship modeling
where various symbols such as crow's feet, lines, and diamonds are used to represent cardinality, relationships,
and attributes between entities. A crow's foot symbol (three lines) indicates a “many” side of a relationship,
while a straight line indicates a “one” side. Diamonds represent relationships, and associations are represented
by lines connecting entities. The data model is also presented with attributes within entity rectangles. As a
method of conveying database structure and relationships, crow's foot notation is intuitive and widely adopted.

Crow’s foot notation (below) is slightly different from Chen notation in how relationships are identified. In Chen,
you have 1:1 or 1:N, whereas in crow's foot, you have lines like -||- for a one-to-one relationship. Notice that the
relationship name is placed above the relationship line instead of in a diamond in some cases, but you can use
diamonds or names in crow's foot notation. The rectangles contain the table name at the top, and the attributes
appear below the table name. The relationship line shows a short bisecting line segment if it is on the “one” side and a crow’s foot (three short prongs) if it is on the “many” side.

Crow’s foot notation for a one-to-one relationship between table1 and table2 would look like the following:

Crow’s foot notation for a one-to-many relationship between table1 and table2 would look like the following:

Crow’s foot notation for a many-to-many relationship between table1 and table2 would look like the following:

 TERM TO KNOW

Crow's Foot Notation


A graphical representation technique that uses symbols such as crow's feet, lines, and diamonds to represent cardinality, attributes, and relationships between entities.

4. UML Notation
Unified Modeling Language (UML) class diagram notation can also be used to visually represent the structure,
relationships, and constraints when designing a database system. A UML class diagram represents database
entities as classes, with attributes and methods representing the associated data elements and operations.
Lines and multiplicity notations indicate the cardinality of associations and relationships between entities. UML
notation can easily represent a database's tables, foreign keys, and other database-specific elements. The
database schema can be communicated and documented in a standard and intuitive manner, making it easy to
understand, develop, and maintain the database system.

A class diagram (below) uses UML notation, which is similar to Chen notation. Notice that the class/entity name
is still at the top of the rectangle, with attributes below it (and then methods/functionality below that). The

connectivity is defined along the relationship line with symbols on each side, like 1..1 for one and 1..* for many.

UML class diagram notation for a one-to-one relationship between table1 and table2 would look like the
following:

UML class diagram notation for a one-to-many relationship between table1 and table2 would look like the
following:

UML class diagram notation for a many-to-many relationship between table1 and table2 would look like the
following:

 TERM TO KNOW

UML Class Diagram Notation


A notation system that illustrates the structure and relationships among classes in an object-oriented
system, showing the attributes, methods, and associations between them.

 SUMMARY

In this lesson, you learned that the three notation styles of Chen notation, crow's foot notation, and UML notation can all be used to model and represent database structures, including the PostgreSQL databases used in this course. These are the most common notation styles for diagramming the relationships among tables, columns, and data.

Chen notation is a graphical representation technique in entity-relationship modeling. Entities are


represented by rectangles, relationships by diamonds, and participation constraints by lines. Using
Chen notation, we can examine the data model holistically and emphasize cardinality and relationships
between entities.

Another graphical representation technique commonly used in entity-relationship modeling is crow's


foot notation. The crow's foot (three lines) depicts cardinality and relationships between entities. The
crow's foot notation is intuitive and widely used, making it easy for users to understand and
communicate the database structure.

UML notation can also be used to model relationships in databases, particularly in the form of UML class diagrams. Entities are represented as classes with attributes and methods. An association represents a relationship between entities, and multiplicity notation indicates cardinality. UML notation is standardized and widely recognized, which facilitates clear communication and documentation of the database schema.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Associative Entity
An entity that exists only to associate data in one entity with data in another.

Cardinality
The numerical relationship between entities in a relationship. This term refers to the relationship
between occurrences or instances of one entity and occurrences or instances of another entity.

Chen Notation
A visual representation technique for entity-relationship modeling that uses rectangles to represent
entities, diamonds for relationships, and lines to represent cardinality and participation constraints.

Crow's Foot Notation


A graphical representation technique that uses symbols such as crow's feet, lines, and diamonds to represent cardinality, attributes, and relationships between entities.

Entity-Relationship Diagram (ERD)


Used to visualize the relationships among entities (objects or concepts) in a system or database, as
well as their structure, attributes, and interactions.

UML Class Diagram Notation


A notation system that illustrates the structure and relationships among classes in an object-
oriented system, showing the attributes, methods, and associations between them.

Logical Design
by Sophia

 WHAT'S COVERED

This lesson explores the next step in designing a new database. We will consider how to extend the
conceptual model into a logical model design, in two parts. Specifically, this lesson will cover:
1. Moving From Concept to a Logical Model
2. Steps to Create a Logical Model

1. Moving From Concept to a Logical Model


Once the conceptual model has been created, the logical model is used for database design. In a logical
model, the conceptual model is further refined by transforming it into a representation that is closer to the
actual implementation of the database management system (DBMS). For example, a logical model is normalized
and specifies primary and foreign keys for relationships.

 BIG IDEA

Logical models are blueprints for how database data will be arranged and related. Data modeling allows designers to understand the data requirements, entity relationships, and data manipulation operations needed to support an application. The logical model sits between the conceptual and physical models, adding structural detail without yet committing to a specific DBMS.
The logical model defines each entity's structure, including its tables, attributes, and relationships with other
entities. This ensures that data is stored and retrieved efficiently by accurately representing the application's
requirements. Data integrity constraints are incorporated into the logical model to ensure data consistency and
accuracy. In order to maintain data quality and prevent data anomalies, logical models define relationships
between entities and ensure referential integrity.

Logical models facilitate query optimization by considering the types of queries that are likely to be run on the
database. The logical model can be optimized by understanding the patterns of data access and query
requirements in order to enhance query performance and speed up data retrieval. The designer can also
eliminate redundant data and improve data efficiency by using a logical model to facilitate normalization
techniques. Data normalization reduces duplication and ensures consistency by replacing tables that contain
redundant or unrelated information with smaller, focused tables with relationships between them.

Logical models are flexible and extensible since they don't depend on any specific database management
system (DBMS). This decoupling allows the database to be migrated with minimal changes to different DBMS

platforms. Logical models facilitate communication and collaboration between stakeholders, designers, and
developers regarding database requirements by visually representing the database design. The data model
acts as a common reference point for discussions and ensures everyone is on the same page.

As logical models define entities, attributes, and relationships, designers can identify potential security
vulnerabilities and make appropriate access control plans to secure sensitive information. Integrity and
consistency of data are maintained by the logical model, which defines the rules and constraints for doing so.
Database errors and inconsistencies can be prevented by ensuring data is entered and managed correctly.

 HINT

The logical model bridges the gap between the conceptual and physical implementations in the database
design process. In addition to improving data integrity and consistency, it provides a detailed and structured
representation of data requirements, relationships, and constraints.
By the end of the logical model design process, all of the entities, attributes, and relationships are defined. The
primary key for each entity and the foreign key that links the tables together are specified. Normalization—
which you will learn more about in a future lesson—is also performed at this level. Finally, the characteristics
such as data location, path, and format should all be included.

 TERMS TO KNOW

Logical Model
A high-level structure and technical map for a database that specifies primary and foreign keys and has
been normalized.

Normalization
The process of applying design rules to a database to ensure that its table structures minimize
duplication and ensure efficiency and integrity.

2. Steps to Create a Logical Model


The first step in this process is to map the conceptual model to the logical model. This involves mapping out the
strong entities, supertypes/subtypes, weak entities, binary relationships, and higher-degree relationships.

Logical Model Elements

Strong Entities: An entity that is not dependent on any other entity in the schema. A strong entity will always have a primary key.

Weak Entities: An entity that doesn’t have sufficient attributes to require its own primary key. It depends on a strong entity. One way to normalize a many-to-many relationship between two strong entities is to create a weak entity that has a one-to-many relationship with each of the strong tables involved.

Supertypes/Subtypes: A supertype entity has smaller groups (subtypes) that connect to it but add specialized attributes common to the specific subtype, while the supertype has attributes that are common to all of the subtypes. For example, a supertype might be “People,” with subtypes for “Employees,” “Vendors,” and “Customers.” A subtype is an entity that derives some of its attributes from a more general (supertype) entity.

Binary Relationships: A relationship between two separate entities. A binary relationship can be one-to-one, one-to-many, many-to-one, or many-to-many.

Higher-Degree Relationships: A relationship that occurs among more than two entities. For example, a ternary relationship occurs among three entities. Relationships of even higher degree are possible but are not recommended.
The strong entities are ones that are on the “one” side of a binary relationship within the data model. Review
the conceptual model. Which of the following would you consider strong entities?

In this example, the strong entities are the customer and category entities. These are indicated by the use of
the two perpendicular crossed lines near the entity, which indicate a “one,” as opposed to the three prongs that
indicate a “many” relationship. This example uses crow's foot notation.

Once you have the strong entities mapped out, the second step is to move on to the supertype/subtype
relationships and the weak entities. In our example, we don’t have any supertype/subtype relationships.

IN CONTEXT

Imagine you have a Pet supertype entity. A subtype could be a Dog or a Cat entity with specific
attributes associated with them.

A weak entity is one whose existence depends on a strong entity. In the same example of a Dog or a
Cat entity, both of those would be weak entities, as they would inherit the primary key from the Pet
entity to use as a foreign key. They would also be dependent on the Pet entity to exist; that is, a record
in the Dog table could not exist without a related record in the Pet table.
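One common way to implement a supertype/subtype pair like Pet and Dog is to reuse the supertype's primary key in the subtype as both its primary key and a foreign key. A minimal sketch, assuming SQLite via Python and hypothetical table names:

```python
import sqlite3

# Hypothetical Pet supertype with a Dog subtype: the subtype's primary
# key doubles as a foreign key, so a dog row cannot exist without a
# matching pet row (the weak-entity dependence described above).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE pet (pet_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE dog (
        pet_id INTEGER PRIMARY KEY REFERENCES pet(pet_id),
        breed  TEXT  -- attribute specific to the Dog subtype
    )
""")
conn.execute("INSERT INTO pet VALUES (1, 'Rex')")
conn.execute("INSERT INTO dog VALUES (1, 'Beagle')")  # depends on pet row 1
```

Attempting to insert a dog row with a pet_id that has no pet row fails, which is precisely the existence dependence that makes the subtype a weak entity.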

Next, you would work through all of the binary relationships between two entities. For example, you
would map out the one-to-many relationships between the product and category tables, and the
customer and order tables. Once you have those relationships mapped, you would move on to define
the relationships between three or more entities until all of the relationships are defined. In our
example, we had a many-to-many relationship between the order and product tables that needed to
be resolved. We will get into that level of detail in a future lesson. For now, make note that any many-
to-many relationships in a conceptual model must be resolved by having a bridge in the form of an
intermediary table that creates two one-to-many relationships.

At the logical model level, we also define the primary keys and foreign keys as part of the relationships
between entities and validate the model through normalization. We typically also define the integrity constraints
of the data, if needed. For example, we may define that the quantity of a product in inventory must be an
integer and must be greater than zero.
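An integrity constraint like that can be expressed as a CHECK constraint on the column. A minimal sketch (SQLite via Python; the inventory table is hypothetical):

```python
import sqlite3

# Hypothetical constraint from the text: inventory quantity must be an
# integer greater than zero, enforced by the database itself.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inventory (
        product_id INTEGER PRIMARY KEY,
        quantity   INTEGER NOT NULL CHECK (quantity > 0)
    )
""")
conn.execute("INSERT INTO inventory VALUES (1, 5)")  # valid quantity
```

An insert with a quantity of zero or less is rejected with an integrity error, so bad data cannot enter the table regardless of which application wrote it.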

Typically, we won’t define the data types and sizes in the logical model, as this database design stage is not
meant to be database system specific. However, it is acceptable to generalize the data types in the logical
model—for example, identifying which attributes are a “number” data type versus a “text” data type.

Here is an example of the logical data model for the e-commerce company that we have been using:

 SUMMARY

In this lesson, you learned that the logical model expands on the conceptual model to include all
entities and relationships between entities. This model includes all attributes for each entity, defines the
primary key of each entity, and adds the foreign keys to identify the relationships between the entities.
You also learned about the steps to create a logical model. Normalization is also performed on the
data model during the logical design stage, and any many-to-many binary relationships are handled by
splitting them into new one-to-many binary relationships.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Logical Model
A high-level structure and technical map for a database that specifies primary and foreign keys and
has been normalized.

Normalization
The process of applying design rules to a database to ensure that its table structures minimize
duplication and ensure efficiency and integrity.

Physical Design
by Sophia

 WHAT'S COVERED

This lesson explores the last step in designing a database. We will explore physical model design, in
three parts. Specifically, this lesson will cover:
1. Rationale for Physical Design
2. Details of the Database
3. E-Commerce Example

1. Rationale for Physical Design


Creating a physical model is the final step in the database design process. This data model describes how the database will be implemented using a specific database management system. The database administrator and the database developers create this model. The conceptual model represents the data requirements and relationships; the physical model specifies the technical details of how they are implemented in the chosen database management system (DBMS).

The physical model converts the logical model into a format optimized for the target database management system. This is the build stage of the design process, and it relies on the conceptual and logical stages being complete. Any issues not addressed early will surface at this final physical stage.

The entities and attributes from the conceptual model are represented by physical tables, columns, and data types.

Indexes optimize data retrieval and query performance; part of physical design is determining which columns should be indexed.
Data integrity constraints, including primary keys, foreign keys, and check constraints, are implemented to
ensure data is accurate and consistent. Data is partitioned and distributed across multiple servers or partitions
in distributed databases to achieve scalability and load balancing. We implement access controls and security
measures to protect sensitive data and ensure data privacy.

The query optimizer builds execution plans based on the physical model's indexing and data distribution strategies.
Optimizing database parameters and configuring settings assist in achieving optimal performance under given
hardware and software conditions. Some of this might be physical hardware, networking, RAM, and other
underlying components, perhaps even including the access time of the storage devices on which the data
resides. Optimization takes many forms, including caching frequently requested information and other ways of
speeding up data presentation and retrieval. You can do many things to speed up a database's performance.
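As a small illustration of index-based optimization, the sketch below (SQLite via Python; all names are hypothetical) creates an index on a frequently filtered column and asks the query planner whether it will be used. The plan output format shown is specific to SQLite; other systems have their own EXPLAIN output.

```python
import sqlite3

# Hypothetical example: index a column we expect to filter on often,
# then inspect the query plan to confirm the index will be used.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_customer_email ON customer(email)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customer WHERE email = ?",
    ("someone@example.com",),
).fetchall()
print(plan)  # the plan text should mention idx_customer_email
```

Without the index, the same query would require a full table scan, which is exactly the kind of cost that grows with data volume and that physical design tries to head off.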

A well-designed physical model is essential in order to meet the application's performance and scalability
requirements. DBAs use this document to create and configure the actual database schema, tables, and
indexes within the selected database management system (DBMS). Databases can handle large amounts of
data by converting conceptual models into well-optimized, technically sound physical designs that support
multiple concurrent users and deliver fast and reliable query responses.

 BIG IDEA

It is important to note that the physical data model will be different depending on the relational database
management system (e.g., PostgreSQL) that has been selected. Each database will use its own specific data
types.

 TERM TO KNOW

Physical Model
A technical plan for implementing the details outlined in the conceptual and logical models using a
specific database management system (DBMS).

2. Details of the Database


The physical design focuses on data storage, security measures, and performance measures. Before defining
the data storage organization, we have to know the volume of the data and usage patterns. With the sheer
quantity of data that a company creates, it is very important to understand the volume, variety, and velocity of
data for your design to work well.

 KEY CONCEPT

The three V's of data are volume, velocity, and variety.


Large-scale data sets present unique challenges, and these characteristics describe what makes big data
different from traditional data management:

Volume refers to the enormous amount of data generated and collected. Data is being produced at an
unprecedented rate due to the proliferation of digital devices and systems. Big data storage and
processing solutions are required to handle massive volumes of data efficiently.
Velocity relates to the speed of data generation and the need to process and analyze data in real time.
Applications that use big data often deal with data streams that require continuous analysis so that timely
insights can be derived, and dynamic decisions can be made.
Variety signifies the diverse types and formats of data and big data. This includes structured data (e.g.,
traditional databases), semi-structured data (e.g., JSON, XML), and unstructured data (e.g., text, images,
videos). Various data sources and formats necessitate flexible data storage and processing approaches.

Managing, storing, processing, and analyzing high-volume data presents challenges. A combination of
advanced data management techniques, distributed computing, cloud technologies, and innovative data
analytics methods is required to address these aspects. As a result, big data's vast and diverse universe can

provide valuable insights and opportunities. This is a unique but growing subset of database systems. While we
allude to them throughout this course, understanding how the “3 V's” apply to most databases will help you
design a much more capable and functional database system.
The column data types and sizes are also defined here. Depending on usage patterns and performance requirements, we define the indexes each table needs beyond its primary key. If there are any anticipated views or reports (especially recurring reports or dashboards) to create for the database, it is useful to define them now. If you are revising an existing system, the existing reports, views, and dashboards are a good starting point for optimizing the new one.

With respect to security, we take the time to define the group roles and user roles that can access this database
and what level of privileges they should have for each object. This goes beyond the physical data model itself,
but it is a crucial step to take when planning for the physical database design. Some data is regulated under
government regulations such as HIPAA, FERPA, or COPPA. This is the optimal time to ensure that the data storage,
access, and processing meets all legal, regulatory, and industry-level requirements.

All of the constraints we have defined as part of the logical data model, such as requiring that certain attributes
contain unique or non-null values, would also be implemented as part of the physical data model. Once this
model has been created, it should be ready to be implemented in the database.
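As a concrete sketch of how such constraints and types land in actual DDL, the example below creates a hypothetical customer table through Python's bundled sqlite3 module. The table name, columns, and index are illustrative assumptions, not part of this course's database:

```python
import sqlite3

# Hypothetical physical model for a customer table: data types with sizes,
# NOT NULL/UNIQUE constraints, and a supporting index, expressed as DDL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       VARCHAR(254) NOT NULL UNIQUE,
        full_name   VARCHAR(100) NOT NULL,
        created_at  TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Index chosen for an anticipated "look up customers by name" query.
    CREATE INDEX idx_customer_name ON customer (full_name);
""")
conn.execute("INSERT INTO customer (email, full_name) VALUES (?, ?)",
             ("ada@example.com", "Ada Lovelace"))

# The uniqueness rule from the logical model is now enforced physically.
try:
    conn.execute("INSERT INTO customer (email, full_name) VALUES (?, ?)",
                 ("ada@example.com", "Imposter"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Once the DDL runs, the constraints defined on paper become rules the DBMS itself enforces on every insert and update.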

 TERMS TO KNOW

Variety
The diverse types and formats of data and big data. This includes structured data (e.g., traditional
databases), semi-structured data (e.g., JSON, XML), and unstructured data (e.g., text, images, videos).
The variety of data sources and formats necessitates flexible data storage and processing approaches.

Velocity
The speed of data generation and the need to process and analyze data in real time.

Volume
The enormous amount of data generated and collected. Big data storage and processing solutions are
required to handle massive volumes of data efficiently.

3. E-Commerce Example
Here is a complete physical data model for the e-commerce database. You may notice that many of the core elements are identical to the logical data model, with data types and sizes added. In other databases, additional constraints may also be added, depending on the business rules or other criteria being incorporated.

 SUMMARY

In this lesson, you learned about the physical design of a database. You learned that in database
design, a physical model refers to the actual schema implementation within a database management
system (DBMS). Translating the logical model, which represents the data requirements and
relationships, into an optimized DBMS format is the focus of the process. You also learned about
specific details related to databases. Physical models define the data types, column lengths, and
constraints for expressing the entities and attributes defined in the logical model. To ensure accurate
and consistent data, they implement integrity constraints, such as primary keys and foreign keys, to optimize data retrieval and query performance. The physical model also addresses data partitioning,
allocation, and distribution to achieve scalability and load balancing, particularly in distributed
databases. The physical model also requires performance tuning and optimization, such as parameter
configuration, query optimization, and access controls.

You learned in the e-commerce example that physical models provide the blueprint for creating real
databases within the selected database management system. To meet the application's performance,
scalability, and security requirements, the physical model guides database administrators and
developers through implementing the database schema, tables, indexes, and access controls. Its
successful implementation ensures that the database operates efficiently, delivering fast and reliable
query responses while handling large volumes of data.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Physical Model
A technical plan for implementing the details outlined in the conceptual and logical models using a
specific database management system (DBMS).

Variety
The diverse types and formats of data and big data. This includes structured data (e.g., traditional
databases), semi-structured data (e.g., JSON, XML), and unstructured data (e.g., text, images,
videos). The variety of data sources and formats necessitates flexible data storage and processing
approaches.

Velocity
The speed of data generation and the need to process and analyze data in real time.

Volume
The enormous amount of data generated and collected. Big data storage and processing solutions
are required to handle massive volumes of data efficiently.

Commercial Databases
by Sophia

 WHAT'S COVERED

This lesson explores some of the most common databases, in six parts. Specifically, this lesson will
cover:
1. Introduction
2. SQLite
3. SQL Server
4. Oracle
5. MySQL and MariaDB
6. PostgreSQL

1. Introduction
A database management system (DBMS) is software that helps organize and manage data in a structured way. It
provides tools for storing, retrieving, and manipulating data, making it easier to store, access, and update
information in a database. The DBMSs described in this lesson include SQLite, Oracle, SQL Server, PostgreSQL,
MySQL, and MariaDB. They represent a mixture of commercial and open-source products; some of them are
open source but have commercial (paid) options available that include additional support and tools.

2. SQLite
SQLite is a self-contained database that is file-based and completely open source. It is known for its portability,
reliability, and performance, even in low-memory environments. All the transactions are ACID-compliant
(atomicity, consistency, isolation, durability), even in cases where there’s a system crash or power outage.

It is unique in that it is a serverless database. Most other databases require a server process that programs
connect to when performing database requests. SQLite, however, accesses the database and reads from and
writes to the database disk file directly. Consequently, there is no need to configure a server or to configure
programs to connect to the database. If programs can access the disk that SQLite is on, they can access it
directly.
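That direct file access can be seen with Python's bundled sqlite3 module; the file name and table below are hypothetical:

```python
import os
import sqlite3
import tempfile

# A SQLite "server" is just a file: any process that can reach the path
# can open the database directly, with no server process to configure.
path = os.path.join(tempfile.mkdtemp(), "inventory.db")

conn = sqlite3.connect(path)          # creates the file on first use
conn.execute("CREATE TABLE part (name TEXT)")
conn.execute("INSERT INTO part VALUES ('widget')")
conn.commit()
conn.close()

# A second, independent connection reads the same disk file directly.
conn2 = sqlite3.connect(path)
names = [row[0] for row in conn2.execute("SELECT name FROM part")]
conn2.close()
```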

One of its limitations is its limited concurrency. Although multiple processes can read and query simultaneously,
only one process can make changes to the database at a time. There is also no way to specify permissions for
different users with SQLite. As long as the underlying operating system permits users to access the disk file, the
user can access the database.

 TERM TO KNOW

Concurrency
The ability for multiple users or processes to access and change a file simultaneously.

3. SQL Server
SQL Server is a robust Microsoft product with a variety of editions, from its smaller-scale Express to an
Enterprise edition that has high-end data center capabilities. SQL Server isn’t only a database engine; it has
many other business intelligence tools. SQL Server uses its own dialect of SQL, called Transact-SQL (T-SQL). It is highly supported, but it also comes at a high price. For example, an Enterprise edition license can cost upwards
of $14,000 per core. Unlike Microsoft Access (Microsoft's consumer-level database application), it can run on
Linux as well as in the cloud environment.

4. Oracle
Oracle has established itself as one of the top database systems used in enterprise solutions. It is owned by
Oracle Corporation. Similar to SQL Server, many different variants of Oracle are available, so there is one to
meet any organization's needs.

Oracle has no open-source version of its DBMS product. However, the XE or Express edition, which is the basic
edition, is free if used for training purposes. Oracle charges for it only for enterprise use. Oracle is one of the
most expensive DBMSs, and determining the end cost can be quite complex because of all the many optional
tools and features available.

Oracle also offers PL/SQL, a procedural language that provides developers with powerful tools for grouping
procedures and functions together into packages. Even with Oracle's high cost, many organizations prefer it
because it is fast, highly reliable, and easy to support.

5. MySQL and MariaDB


MySQL (now owned by Oracle) and MariaDB are some of the most popular open-source databases. MySQL, which is designed for speed and reliability, powers some of the largest services on the web, including Twitter, YouTube, Airbnb, Facebook, and Netflix. After Oracle took over MySQL, the original MySQL developers split off to create MariaDB as a fork of the MySQL code, worried that Oracle planned to kill the project. MySQL
now exists both in an open-source version and as a commercial offering, known as MySQL Enterprise Edition.

While MariaDB remains open source, MariaDB Corporation, the company behind MariaDB, offers MariaDB
Enterprise as a commercial service. Its subscribers are not paying for MariaDB itself, but for services, support,
and extra features.

Like most other DBMSs, MySQL must be installed on a server. Many third-party tools are available for use with
MySQL/MariaDB, including phpMyAdmin.

6. PostgreSQL
PostgreSQL is one of the most advanced open-source relational databases. It was created to be standard-
compliant and highly extensible. Standard compliance is a significant element of PostgreSQL that differentiates
it from other databases. It is an object/relational database, meaning that although it is primarily a relational
database, it offers functionality like table inheritance implemented in object databases. PostgreSQL has no
corporate owner; it is maintained by a global network of volunteers. However, some companies sell support
services and tools for PostgreSQL as a commercial product.

PostgreSQL isn’t as widely used as MySQL due to a lack of additional third-party tools. However, PostgreSQL is
compatible with many programming languages and platforms, making it much easier to migrate the database
from one operating system to another or integrate it within a given tool. You will work with PostgreSQL later in
this course.

 SUMMARY

In this lesson you learned in the introduction that popular database management systems available
today include SQLite, SQL Server, PostgreSQL, MySQL, MariaDB, and Oracle. Depending on the
requirements and characteristics of the application or project at hand, the most appropriate database
depends on features, performance, scalability, and cost. This section focused on the primary relational
databases you will find in companies or other organizations. There are many smaller databases that
can be found, but they are generally not used to run companies or large-scale internet services.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Concurrency
The ability for multiple users or processes to access and change a file simultaneously.

Using ANSI SQL
by Sophia

 WHAT'S COVERED

This lesson explores how popular relational database management systems implement and extend the
SQL standards set forth by the American National Standards Institute (ANSI). Specifically, this lesson will
cover:
1. SQL Standard
2. PostgreSQL and ANSI SQL Standards
3. Standards in the Real World

1. SQL Standard
The American National Standards Institute (ANSI) has created a standard that defines how SQL is used in
relational databases. In addition to defining a common syntax and semantics for querying, updating, and
managing relational databases, the ANSI SQL standard also defines a way to make SQL commands portable
and consistent across different database management systems (DBMS).

 KEY CONCEPT

The standard specifies a number of basic SQL commands, such as SELECT, INSERT, UPDATE, DELETE, and
others, along with the associated clauses and options. Data integrity, transaction management, data types,
and other database operations are also defined. The ANSI SQL standard reduces vendor lock-in and
promotes interoperability among different vendors by allowing developers and database administrators to
write SQL code that works across various database platforms.

 KEY CONCEPT

Vendor lock-in refers to a situation in which a customer or organization becomes heavily dependent on a
particular vendor's products, services, or technologies to the extent that it becomes difficult or costly to
switch to an alternative vendor. This dependency can result from various factors, including proprietary
formats, unique features, custom integrations, or the high cost of migrating to a different vendor's solution.
Although ANSI SQL defines a common subset of SQL, many database vendors also provide proprietary
extensions and features that go beyond the standard. For example, Microsoft SQL Server uses a TOP clause to limit the number of rows returned in a query, and Oracle uses a ROWNUM pseudocolumn to do the same thing.
Different database systems also have their own ways of auto-incrementing columns. MySQL uses AUTO_INCREMENT, and SQL Server uses IDENTITY. ANSI SQL doesn't include either of those, but it does have
its own way of doing the same thing, with the GENERATED ALWAYS AS IDENTITY syntax. So, even though ANSI
SQL ensures a certain degree of portability in terms of the basic command set, working with specific DBMS
implementations will often require some customization and adjustment.
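As a sketch of that adjustment in practice, the query below uses the widely portable LIMIT syntax (supported by SQLite, PostgreSQL, and MySQL), with the SQL Server and Oracle variants noted as comments. The product table is a hypothetical example, run here through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("a", 5.0), ("b", 3.0), ("c", 9.0), ("d", 1.0)])

# Portable row limiting with LIMIT. The vendor-specific equivalents:
#   SQL Server:   SELECT TOP 2 name FROM product ORDER BY price
#   older Oracle: filter on the ROWNUM pseudocolumn after ordering
cheapest = [row[0] for row in conn.execute(
    "SELECT name FROM product ORDER BY price LIMIT 2")]
```

The two cheapest products come back regardless of engine, but the syntax that requests them is what varies between vendors.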

 TERM TO KNOW

Vendor Lock-In
A situation in which a customer or organization becomes heavily dependent on a particular vendor's
products, services, or technologies to the extent that it becomes difficult or costly to switch to an
alternative vendor.

2. PostgreSQL and ANSI SQL Standards


PostgreSQL, an open-source relational database management system, aims to comply with ANSI SQL standards
to provide SQL compatibility and interoperability. While not fully compliant with all ANSI SQL standards,
PostgreSQL adheres to a significant portion of the standard. This makes it one of the most standards-compliant
open-source databases available.

Some ways PostgreSQL complies with ANSI SQL standards include:

Basic SQL Syntax: PostgreSQL follows ANSI SQL syntax for fundamental SQL commands like SELECT, INSERT, UPDATE, DELETE, and others, ensuring portability and familiarity for developers. PostgreSQL supports a wide range of ANSI SQL data types, such as numeric, character, date/time, and boolean. It also supports more advanced data types like arrays, JSON, and universally unique identifiers (UUIDs).

Transaction Management: PostgreSQL implements transaction management as per ANSI SQL standards, supporting ACID (atomicity, consistency, isolation, durability) properties to ensure data integrity. PostgreSQL allows the definition of primary keys, foreign keys, unique constraints, and check constraints, which are part of the ANSI SQL standard for data integrity.

Common Table Expressions (CTEs): PostgreSQL supports common table expressions (CTEs), a feature introduced in ANSI SQL that allows temporary result sets for complex queries. PostgreSQL also supports ANSI SQL window functions, enabling advanced analytical queries and calculations over data partitions. Along with joins and aggregates, PostgreSQL provides various join methods and aggregate functions in accordance with ANSI SQL standards.
While PostgreSQL is known for its excellent ANSI SQL support, it's worth mentioning that it also offers many
additional features and functionalities that go beyond the standard. This makes it a robust and powerful
database system for a wide range of applications. Developers can leverage standard-compliant features for
portability across different database systems, while also leveraging PostgreSQL's unique capabilities for specific
use cases.
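Because CTEs are part of ANSI SQL, the same WITH query runs unchanged on PostgreSQL and on SQLite. The sketch below uses SQLite through Python's sqlite3 module so it is self-contained; the orders table is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 60.0), ("alice", 75.0), ("bob", 40.0)])

# The CTE names a temporary result set (per-customer totals) that the
# outer query can then filter, using the ANSI SQL WITH syntax.
big_spenders = conn.execute("""
    WITH totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer, total FROM totals WHERE total > 100
""").fetchall()
```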

 TERMS TO KNOW

Universally Unique ID (UUID)
A 128-bit identifier that is guaranteed to be unique across both space and time. It is often used as a primary key or unique identifier for records in a database, particularly in distributed and decentralized systems where ensuring uniqueness is crucial.

Common Table Expressions (CTEs)
A feature introduced in ANSI SQL that allows temporary result sets for complex queries.

3. Standards in the Real World


Databases often do not support pure ANSI SQL, for several reasons. They mostly have to do with the
development of the database itself, the specific niche in which the database has been used, and the results of
marketing or customer surveys on what they need out of a database.

IN CONTEXT

SQL databases have a long history, dating back to the 1970s. Different database vendors developed
their own SQL versions before the ANSI SQL standardization efforts began. As a result, many
databases already had established proprietary features and extensions that users widely adopted.
Database vendors use proprietary extensions and features to differentiate their products in the
market. These extensions may provide advanced functionality, better performance, or unique
capabilities that are not part of the ANSI SQL standard. Vendors can attract customers and maintain a
competitive edge by offering exclusive features.

Many organizations have large, complex legacy databases built on specific vendor platforms and rely
heavily on proprietary SQL features. Migrating these systems to comply with the pure ANSI SQL
standard could be time consuming, costly, and may require significant changes to existing
applications.

Some databases optimize query processing and indexing using nonstandard SQL constructs. These
optimizations may provide superior performance for specific workloads, making them more attractive for certain
use cases. As technology advances, new features and innovations may not yet be part of the ANSI SQL standard. Database vendors may implement these features ahead of standardization to improve functionality, usability, and
efficiency.

Over time, users become accustomed to the specific syntax and features of the database they use. Abruptly
switching to a pure ANSI SQL standard could disrupt workflows and require retraining. Most features are
developed because enough customers need a particular function, process, or command support. Developers
lock themselves into specific technologies and go to great efforts to know all the nuances of those technologies
and products. Changing a database or even removing a feature requires the code to evolve beyond the normal scope of change and might break a business workflow. Changing or altering the function of a database is a
huge deal when it comes to business and manufacturing workflows.

While adhering to the ANSI SQL standard is beneficial for portability and interoperability, database vendors aim
to strike a balance regarding compliance with the standard. They also aim to offer unique features that meet
their user base's specific needs and preferences. As a result, most databases provide a mix of ANSI SQL-
compliant features and vendor-specific extensions to cater to a broad range of use cases and user
requirements.

 SUMMARY

In this lesson, you learned that to ensure portability and consistency across different database
platforms, PostgreSQL strives to adhere to ANSI SQL standards. The PostgreSQL database conforms
to a significant part of the ANSI SQL standard, but it also includes additional capabilities that go
beyond it. You learned about techniques for changing databases. SELECT, INSERT, UPDATE, and
DELETE commands are supported, as are ANSI SQL data types, ACID properties for transaction
management, and data integrity constraints. As part of the ANSI SQL standard, PostgreSQL offers
various join methods, CTEs, window functions, and aggregate functions. You also learned about
standards in the real world. The extensibility and openness of PostgreSQL make it a powerful and
versatile database system that enables users to create custom functions, data types, and procedural
languages, while maintaining ANSI SQL compatibility.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Common Table Expressions (CTEs)
A feature introduced in ANSI SQL that allows temporary result sets for complex queries.

Universally Unique ID (UUID)
A 128-bit identifier that is guaranteed to be unique across both space and time. It is often used as a primary key or unique identifier for records in a database, particularly in distributed and decentralized systems, where ensuring uniqueness is crucial.

Vendor Lock-In
A situation in which a customer or organization becomes heavily dependent on a particular
vendor's products, services, or technologies to the extent that it becomes difficult or costly to
switch to an alternative vendor.

Using SQLite
by Sophia

 WHAT'S COVERED

This lesson explores the unique features of SQLite, in five parts. Specifically, this lesson will cover:
1. Introduction
2. Data Types
3. Primary Keys
4. Table Management
5. Joins

1. Introduction
SQLite is a unique database with some approaches that distinguish it from other databases. With SQLite, you
can manage structured data simply and efficiently with a powerful, lightweight, and self-contained relational
database management system.

 HINT

There is no need to install a database server for SQLite to run, as it is a serverless database. It can be
directly integrated into applications without complex setup or administration. The SQLite engine is written in C, has a small footprint and minimal overhead, and is highly efficient. This makes it ideal for applications
that run on mobile devices, embedded systems, and resource-constrained desktops.
Although SQLite is small and straightforward, it has a rich feature set and supports a variety of SQL commands
and data types. The system is fully ACID-compliant, ensuring the integrity and reliability of data, and it supports
concurrent access with robust locking mechanisms. Among the many industries and applications that use
SQLite are mobile app development (Android and iOS), web browsers, Internet of Things devices, desktop
software, embedded systems, and embedded data storage. Developers seeking an easy-to-use, portable, and
versatile database solution that seamlessly integrates into their projects prefer its ease of use, portability, and
versatility.

SQLite has been ported to various platforms like Windows, macOS, Linux, iOS, Android, and many others. The
applications that use the SQLite database don’t have to be written in a specific language, as you would typically
see with other databases. It works correctly as long as there is some way to bind and work with the external
libraries. All of the source code for SQLite is public domain so that it can be reused in other programs with no
restrictions.

2. Data Types
SQLite has only a few data types:

REAL: A decimal number with a relatively low level of precision.
BLOB: A binary file such as an image file. BLOB is an acronym for Binary Large Object.
NULL: The absence of a value or the unknown.
INTEGER: A whole number.
TEXT: Character strings.

 DID YOU KNOW

SQLite is quite forgiving of the kind of data that you enter. If a column has the data type of an integer, and
you try to insert a text string that contains only numeric digits into that column, SQLite will try to convert the
text string into an integer. For example, if the user enters a text string 9876 into an integer column, that
value is converted to an integer of 9876 and stored in the column. If you try inserting a non-numeric string
like “abcd” into an integer column, most other databases will throw an error. However, SQLite will just store
the string value in the column.
In the database you will work with later in the course, many columns have a VARCHAR(40) that allows up to 40
characters. (VARCHAR is a data type for variable length values; you will learn about it later.) If you were to try to
insert more than 40 characters into such a column, many databases would either throw an error or truncate the
string to 40 characters. SQLite instead stores the entire string without the loss of information. This is viewed as
a feature of SQLite rather than a bug. However, it makes it very difficult for applications built using SQLite to be
ported to other databases, as such situations can create problems.
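Both behaviors can be demonstrated in a few lines with Python's sqlite3 module (the table below is a hypothetical example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER, s VARCHAR(40))")

# A numeric-looking string is coerced to match the column's INTEGER
# affinity; a non-numeric string is simply stored as text instead.
conn.execute("INSERT INTO t (n) VALUES ('9876')")
conn.execute("INSERT INTO t (n) VALUES ('abcd')")
stored = [row[0] for row in conn.execute("SELECT typeof(n) FROM t")]

# The VARCHAR(40) length is not enforced: longer strings survive intact.
conn.execute("INSERT INTO t (s) VALUES (?)", ("x" * 50,))
longest = conn.execute("SELECT max(length(s)) FROM t").fetchone()[0]
```

Most other relational databases would reject both the 'abcd' insert and the over-length string, which is exactly why porting such data away from SQLite requires care.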

Booleans are another instance where SQLite does not have a separate data type. Instead, false and true are represented by the integers 0 and 1, respectively.

Dates are also different in SQLite, where they are stored in a TEXT string. If a date is stored as an integer,
SQLite stores the number of seconds since midnight on January 1, 1970. There are built-in date and time
features to change between those values.
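A quick sketch of those built-in conversion functions, again via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite's date functions convert between text timestamps and integer
# seconds since midnight on January 1, 1970 (the Unix epoch).
as_text = conn.execute(
    "SELECT datetime(0, 'unixepoch')").fetchone()[0]
as_seconds = conn.execute(
    "SELECT strftime('%s', '1970-01-02 00:00:00')").fetchone()[0]
```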

Tables can be created without any data types at all, which can be confusing. For example, you could run a SQL
command like this:

CREATE TABLE myTable (a, b, c);


This would create a table named myTable with the column names a, b, and c that have no data types defined.
Anything of any size could be stored in those columns.
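That typeless table behaves as sketched below (via Python's sqlite3 module):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (a, b, c)")

# With no declared types, each column stores whatever value it is given,
# and typeof() reports the storage class actually used per value.
conn.execute("INSERT INTO myTable VALUES (?, ?, ?)", (1, "text", 3.5))
kinds = conn.execute(
    "SELECT typeof(a), typeof(b), typeof(c) FROM myTable").fetchone()
```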

3. Primary Keys

Each record in a table is uniquely identified by its primary key. Primary keys in SQLite are defined as columns or
column combinations.

 KEY CONCEPT

Each row's primary key value must be unique. In SQLite, the uniqueness constraint is enforced automatically, so duplicate values cannot be inserted into the primary key column(s).
The primary key column(s) of SQLite are automatically indexed. This significantly enhances query performance
when searching for records using primary key values. Column indexes can also be created to speed up data
retrieval, particularly for large tables.

The SQLite database also supports composite keys, which use multiple columns to create unique identifiers for
each row. A composite key is defined by listing the columns separated by commas within the PRIMARY KEY
constraint.

In SQLite, a primary key can be a null value. This was originally a bug in the program. However, when the bug
was identified, so many databases had already used that bug as a feature that the bug was kept. If the primary
key field is of an INTEGER type, however, null values are not allowed. Even though it is technically possible to
have a null value in a primary key field, it is not recommended because it defeats the purpose of the primary
key field containing a unique identifier for each row.
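A short sketch of a composite key and its automatic uniqueness enforcement, using Python's sqlite3 module with a hypothetical order_item table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_item (
        order_id   INTEGER,
        product_id INTEGER,
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    )
""")

# Each (order_id, product_id) pair must be unique. Repeating either
# column alone is fine; repeating the pair is rejected automatically.
conn.execute("INSERT INTO order_item VALUES (1, 10, 2)")
conn.execute("INSERT INTO order_item VALUES (1, 20, 1)")
try:
    conn.execute("INSERT INTO order_item VALUES (1, 10, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```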

4. Table Management
Creating, modifying, and maintaining tables within a SQLite database is all part of table management. Using the
CREATE TABLE SQL statement, developers specify the table name, column names, data types, and constraints
for a new table.

As you learned earlier in this lesson, SQLite stores different data types, including INTEGER, TEXT, REAL, and
BLOB. To ensure data integrity, the table can also be configured with constraints like PRIMARY KEY, UNIQUE,
NOT NULL, and FOREIGN KEY. Using the INSERT INTO statement, you can insert records (rows) and retrieve
data with the SELECT statement. The ALTER TABLE statement enables developers to modify existing tables by
adding or dropping columns, altering column data types, or applying constraints. Database structures can be
designed and managed efficiently using SQLite's intuitive and flexible table management system. You will learn
to use all of these commands later in the course.

SQLite enables developers to efficiently create and modify tables and manage table data. In SELECT queries,
developers can filter and sort the data to retrieve specific data, update existing records using UPDATE
statements, and delete records using DELETE statements. With SQLite, multiple operations can be grouped as a
single unit, ensuring data consistency and reliability.

Developers can define triggers, which perform actions in response to specific database events, such as
insertion, deletion, or updating of data. As a whole, SQLite's table management capabilities offer a
comprehensive solution to managing table structures, manipulating data, and maintaining data integrity.
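As a sketch of a trigger responding to an insert, the example below uses hypothetical orders and audit_log tables via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE audit_log (order_id INTEGER, note TEXT);

    -- Fires automatically after every insert into orders.
    CREATE TRIGGER log_order AFTER INSERT ON orders
    BEGIN
        INSERT INTO audit_log VALUES (NEW.id, 'order created');
    END;
""")
conn.execute("INSERT INTO orders (item) VALUES ('book')")
log = conn.execute("SELECT order_id, note FROM audit_log").fetchall()
```

The application only inserted the order; the audit row appeared because the database itself reacted to the event.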

5. Joins
Developers can use SQLite joins to combine data from multiple tables based on related columns. Among the
types of joins supported by SQLite are INNER JOIN, LEFT JOIN (or LEFT OUTER JOIN), RIGHT JOIN (or RIGHT
OUTER JOIN), and FULL JOIN (or FULL OUTER JOIN). You will learn how to use all these join commands later in
the course. Based on the condition specified, an INNER JOIN returns only matching rows from both tables. A
LEFT JOIN returns all rows from the left table along with matching rows from the right table, while a RIGHT JOIN
does the opposite, returning all rows from the right table along with matching rows from the left table. When a
FULL JOIN is performed, all rows from both tables are returned, including unmatched rows, with NULL values
for nonmatching rows. By joining data from multiple tables, developers can generate comprehensive results by
combining related information. The “ON” keyword defines column relationships between tables using join
conditions. The use of joins is beneficial when dealing with complex data models, normalized databases, or
situations involving multiple data tables.

A developer must carefully consider the performance implications when using joins in SQLite, as inefficient joins
can slow down the execution of queries. Join conditions can be improved significantly by indexing the columns
involved. Developers can use the keyword JOIN or shorthand notations like INNER JOIN, LEFT JOIN, and so on,
according to their preference. When SQLite joins are used effectively, developers can retrieve data from
multiple tables in a single query, which helps facilitate comprehensive data analysis. In developing sophisticated
database applications, combining and presenting data from different tables meaningfully requires a flexible and
efficient way of working with related data.
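The INNER JOIN versus LEFT JOIN difference can be seen in a small sketch (hypothetical customer and orders tables, via Python's sqlite3 module):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, item TEXT);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.execute("INSERT INTO orders VALUES (1, 'book')")

# INNER JOIN keeps only customers with a matching order...
inner = conn.execute("""
    SELECT c.name, o.item FROM customer c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()

# ...while LEFT JOIN keeps every customer, padding misses with NULL.
left = conn.execute("""
    SELECT c.name, o.item FROM customer c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()
```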

 SUMMARY

In this lesson, you learned in the introduction that an embedded SQLite database management system
is lightweight, self-contained, and easy to use. This is a serverless database, meaning it operates
directly within the application and does not require a separate server process. This compact and
efficient database engine is written in the C programming language, making it an ideal choice for
resource-constrained environments like mobile devices, embedded systems, and desktop applications.
You learned that each record in a table is uniquely identified by a primary key, and that SQLite has only a few
data types, though it offers reliable and ACID-compliant transaction support despite its small footprint.
You also learned that SQLite can help with table management, and developers can use SQLite joins,
such as INNER JOIN, LEFT JOIN (or LEFT OUTER JOIN), RIGHT JOIN (or RIGHT OUTER JOIN), and FULL
JOIN (or FULL OUTER JOIN), to combine data from multiple tables based on related columns. Several
applications use SQLite, including mobile apps (on Android and iOS), desktop software, web browsers,
IoT devices, and embedded systems, where lightweight and self-contained database capabilities are
required.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

Using MySQL and MariaDB
by Sophia

 WHAT'S COVERED

This lesson explores the unique features of MySQL/MariaDB, in three parts. Specifically, this lesson will
cover:
1. Introduction
2. MySQL and MariaDB History
3. Comparison to ANSI SQL

1. Introduction
MySQL and MariaDB are two of the most widely used open-source databases. MySQL’s popularity has much to
do with WordPress, a content management system that runs more than a third of all websites around the world.
WordPress uses MySQL as a database.

2. MySQL and MariaDB History


MySQL is a relational database management system in use worldwide, with a rich development and evolution
history. It was originally developed by Michael Widenius and David Axmark for MySQL AB in 1994. The "My" in
MySQL is a reference to co-founder Michael Widenius's older daughter, My, and the "Maria" in MariaDB is his
younger daughter's name.

IN CONTEXT

MySQL was first released in 1995. Due to its ease of use, high performance, and low cost, it quickly
became popular. MySQL was originally released under a dual-license model, offering both open-source and commercial licenses. Its rapid rise, in turn, fueled the popularity of LAMP (Linux, Apache, MySQL, PHP) stacks for running websites.

Sun Microsystems acquired MySQL AB in 2008, and Oracle Corporation then acquired Sun
Microsystems in 2010, making MySQL part of Oracle's portfolio. MySQL has remained an actively
developed and maintained open-source project under Oracle's ownership, despite some concerns
about its future. However, Oracle does offer a commercial version of MySQL called MySQL Enterprise
Edition that includes professional support services.

These concerns are what led to MariaDB, a fork of MySQL. It was feared that Oracle's ownership
would lead to changes in MySQL's open-source nature, development direction, and licensing terms. To
address these concerns, Michael “Monty” Widenius, one of the original developers of MySQL, created
MariaDB as a drop-in replacement. The MariaDB platform was designed to remain true to the open-
source philosophy, offering users a more transparent, community-driven, and inclusive development
process. By forking MySQL, MariaDB aimed to preserve the principles of collaborative development
and community-driven innovation while continuing the legacy of an open, free, and accessible
database system.

MariaDB also sought to improve on MySQL's capabilities, adding new features, optimizations, and
performance improvements. MariaDB releases updates more frequently and fixes bugs and security
issues more promptly, addressing user needs. Thanks to this agile approach, users benefit from the
newest enhancements as soon as they are released. Today's
MariaDB is a community-centric, open-source database solution that offers powerful, reliable, and
viable alternatives to MySQL. Although MariaDB and MySQL split, both remain closely related,
providing compatibility and familiarity for users switching between the two. Whereas MySQL is owned
by Oracle, MariaDB has no official owner. The MariaDB Foundation develops and maintains the
MariaDB database management system but does not own it. The MariaDB Foundation is an
independent organization that was established to ensure the open-source nature of the MariaDB
project and to provide a governance structure for its development.

Millions of developers and organizations around the world today use MySQL and MariaDB. As a result of their
popularity, they have become an integral part of many web applications, content management systems, e-
commerce platforms, and many other applications. Due to their performance, scalability, ease of use, and active
community support, MySQL and MariaDB have remained relevant and widely adopted over the years.

3. Comparison to ANSI SQL


The ANSI SQL standard defines a common syntax and semantics for querying, modifying, and managing
relational databases, ensuring portability and consistency across different database management systems.
While MySQL implements some ANSI SQL standards, extensions, and proprietary features, it is a specific
relational database management system.

The ANSI SQL standard defines a set of rules for how SQL queries and operations should be structured. Even
though MySQL strives to be ANSI SQL-compliant, its feature set is not identical. It contains extra features not
present in the standard, and it offers slight variations of certain SQL commands.

It is standard practice for a DBMS to offer features and extensions beyond ANSI SQL; every database has
its own set of features and extensions that may conflict with those of other databases. MySQL's extensions
are unique to it and may or may not be present in other database systems that follow ANSI SQL.

 KEY CONCEPT

There may be differences in how specific functions or operations are implemented in MySQL compared to
other ANSI SQL-compliant databases. Performance and query optimization can be affected by these
differences.

ANSI SQL defines standard data types, but individual database systems may support additional data types
or handle certain data types differently. Some data types are unique to MySQL, and some are not standard.
Different database systems, including MySQL, implement transactional behavior and concurrency control
mechanisms differently.
Despite these differences, developers can write SQL queries in MySQL that are ANSI SQL-compliant, which
makes the code more portable and compatible with other databases. However, if you use MySQL-specific
extensions or features, the code may not be easily transferable to other database platforms. MySQL's
dialect differs from ANSI SQL, so developers need to know these differences. The goal is to write SQL code
that is efficient, compatible, and maintainable.

Some of the key differences between ANSI and MySQL/MariaDB include:

Data types: MySQL/MariaDB supports all the same data types as ANSI SQL, but there are a few minor
naming differences. For example, ANSI SQL uses INTEGER, and MySQL/MariaDB uses INT.
Storage engines: MySQL/MariaDB supports multiple storage engines, each with its own features. The ability
to choose a storage engine is a MySQL-specific feature.
MySQL Workbench: MySQL provides a graphical user interface called MySQL Workbench for database
design, development, and administration. It can also be used with MariaDB.
User-defined variables: MySQL/MariaDB allows the use of user-defined variables, which are specific to
MySQL.
Full-text search: MySQL/MariaDB has robust full-text search capabilities that go beyond ANSI SQL.
LIMIT and OFFSET: MySQL/MariaDB allows the use of LIMIT and OFFSET clauses with the UPDATE and
DELETE statements; ANSI SQL does not.
SHOW and DESCRIBE: MySQL/MariaDB has specific extensions to the SHOW and DESCRIBE statements,
providing additional information about databases, tables, and indexes.
Non-standard date and time functions: MySQL/MariaDB includes some date and time functions that are not
part of the ANSI SQL standard.

IN CONTEXT

There are other small differences not covered in this lesson. For example, here's a seemingly minor
difference that can potentially create drastic issues. When a SET expression references a column of the
table being updated, ANSI SQL uses the column's original value, whereas MySQL/MariaDB uses the
column's current value, applying the assignments in the order they are listed. Consider the following
statement:

UPDATE employee
SET pay = pay + 1, new_pay = pay;

Assume the original value for pay was 0. In ANSI SQL, the pay value will be set to 1, as it was set as
pay = pay + 1. Since the original value for pay was 0, the pay is now 0 + 1, which is equal to 1. However,
the value of new_pay would be set to 0, as that was the original value of pay.

In MySQL and MariaDB, though, the value would be different. With the same statement, pay would
again be set to 1, as it was set as pay = pay + 1. Since the original value for pay was 0, the pay is now 0
+ 1, which is equal to 1. However, when new_pay is being set to pay, MySQL/MariaDB uses the
updated value. The new_pay is set to 1 instead of 0, unlike with ANSI SQL.

 SUMMARY

In this lesson, in the introduction, you learned that both MySQL and MariaDB are open-source
relational database management systems (RDBMS). You explored the history of MySQL and MariaDB,
learning that the MySQL database was originally developed in 1994 by Michael Widenius and David
Axmark at MySQL AB. Several qualities contributed to its popularity, including ease of use,
performance, and compatibility with different platforms. Eventually, Sun Microsystems and Oracle
Corporation acquired MySQL, raising concerns about the open-source project's future. In response to
this, Michael "Monty" Widenius, one of MySQL's original developers, created MariaDB, a fork of MySQL.
The goal of MariaDB was to serve as a drop-in replacement for MySQL while maintaining an open-
source philosophy and community development. A number of additional features, improved
performance, and open-source commitments are included in MariaDB. With its community-centric
focus, it quickly gained traction and became considered an attractive alternative.

You learned that the syntax of MySQL and MariaDB is similar, so applications designed for one
database will work seamlessly with the other. You also learned about the differences between
MySQL/MariaDB and ANSI SQL. Web applications, content management systems, and other projects
widely use both RDBMS. There are many advantages to using MySQL, including its robustness,
popularity, and Oracle Corporation's active support. MariaDB has seen rapid adoption, particularly
among developers who appreciate its community-driven approach, frequent updates, and
transparency. With competitive features, performance, and support for many use cases, MySQL and
MariaDB are evolving. There are advantages and disadvantages to both, depending on individual
preferences, needs, and whether one favors the backing of a large corporation (MySQL) or a vibrant
open-source community (MariaDB).

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

Considering Business Rules
by Sophia

 WHAT'S COVERED

This lesson explores the importance of considering database business rules, in two parts. Specifically,
this lesson will cover:
1. Business Rules
2. Implementing Business Rules in a Database's Design

1. Business Rules
In a generic sense, any sort of policy or guideline that a business has might be considered a "business rule."
However, in the context of database creation and management, the term has a different, more specialized
meaning. In a database management system, business rules are specific constraints and requirements that
govern how data should be stored, processed, and managed in a database. Implementing business rules in a
database's design ensures that the data remains accurate, consistent, and compliant with the organization's
policies and processes.

IN CONTEXT

Here are some examples of how business rules can be implemented in a database's design: Data
Validation:
Ensure that email addresses are in a valid format.
Enforce age limits for certain services or products.
Validate that product prices are within an acceptable range.

Data Integrity:
Enforce referential integrity constraints to maintain relationships between tables.
Prevent duplicate records or entries for unique data, such as employee IDs or customer account
numbers.
Use check constraints to limit the values that can be entered in a column, for example, ensuring a
"status" column only contains predefined values (e.g., "active" or "inactive").

Security and Access Control:


Implement role-based access control to restrict data access based on user roles or permissions.

Enforce password policies to ensure strong and secure user passwords.

Audit Trails:
Record changes made to critical data fields, such as who made the change and when.
Maintain logs of data access and modifications for compliance and security purposes.

Derived Data:
Calculate and store derived data, such as a total order amount or a customer's credit score, based
on specific rules and formulas.
Create views or computed columns to generate data summaries or transformations.

Data Transformation:
Convert data from one format to another to meet specific requirements (e.g., transforming dates
or currencies for international operations).
Apply data cleansing rules to remove or standardize inconsistent data entries.

Temporal Data:
Implement rules for managing effective dates (validity periods) for data records, ensuring historical
and future data is correctly handled.
Enforce retention policies to automatically archive or delete data after a specific timeframe.

Workflow Automation:
Use triggers or stored procedures to automate specific actions when certain conditions are met,
such as sending notifications, updating statuses, or generating reports.
Implement approval workflows to enforce business rules for authorizations and document
approvals.

Data Constraints:
Enforce constraints related to physical limits, such as maximum storage capacity, or constraints
related to time, like data archiving and retention policies.
Implement rules for data purging or data aging, ensuring data is removed according to legal and
business requirements.

Legal and Compliance Rules:


Store legal agreements and contracts and enforce rules for expiration and renewal.
Implement compliance rules related to data privacy, financial regulations, or industry-specific
requirements, like HIPAA for healthcare data.

These are just a few examples of how business rules can be integrated into a database's design.
Properly implementing business rules helps organizations maintain data quality, consistency, security,
and compliance with their specific requirements and objectives.
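As one small, hypothetical sketch of the data-integrity ideas above, a CHECK constraint can limit a status column to predefined values so the database itself rejects invalid entries. This example uses SQLite through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id     INTEGER PRIMARY KEY,
        status TEXT NOT NULL CHECK (status IN ('active', 'inactive'))
    )
""")

conn.execute("INSERT INTO account (status) VALUES ('active')")  # accepted

try:
    conn.execute("INSERT INTO account (status) VALUES ('pending')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the business rule is enforced by the database itself

print(rejected)  # True
```

Because the rule lives in the schema rather than in application code, every program that writes to the table is held to the same constraint.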

 TERM TO KNOW

Business Rule
Specific constraints and requirements that govern how data should be stored, processed, and managed
in a database.

2. Implementing Business Rules in a Database's Design
Implementing the right business rules in a database is crucial for maintaining data integrity, security, and
compliance. Here are some tips for successfully incorporating business rules into a database and following up
to make sure they remain relevant:

Understand and Document Business Requirements: Make sure you understand the business processes,
policies, and requirements of the organization. Document the business rules you will be implementing clearly,
ensuring they are specific, unambiguous, and well-defined.

Collaborate With Stakeholders: Involve all relevant stakeholders, including business users, IT staff, and
compliance or legal experts, in the rule definition and implementation process. Ensure that the business rules
meet the needs of different departments and align with overall business objectives. The frontline employees
who interact with the database on a daily basis often provide the most helpful insights in business rule
development.

Use Data Modeling and Standardization: Employ robust data modeling techniques to represent business rules
within the database structure. You can use the modeling method you learned in previous lessons in this course.
Standardize the representation and enforcement of rules across the database to maintain consistency.

Implement Data Validation and Constraints: Apply data validation rules at the database level to prevent
incorrect or inconsistent data entry. Enforce constraints such as uniqueness and referential integrity and check
constraints to ensure data accuracy and adherence to business rules.

Utilize Triggers and Stored Procedures: Implement triggers to automate actions or checks when specific events
occur, ensuring that rules are enforced in real time. Use stored procedures to encapsulate business logic,
allowing complex business rules to be implemented consistently.

Regularly Review and Update Rules: Conduct periodic reviews of business rules to ensure they remain relevant
and aligned with changing business needs, industry standards, and legal requirements. Update and adapt rules
as necessary, considering feedback from stakeholders and new business requirements.

Ensure Data Security and Access Control: Implement role-based access controls to restrict data access based
on user roles and responsibilities. Encrypt sensitive data and enforce security measures to safeguard against
unauthorized access or breaches.

Provide Training and Support: Offer training and support to database users to ensure they understand and
comply with the implemented business rules. Create documentation and guidelines that clearly explain the
rules and their implications.

Test and Validate Business Rules: Test the implemented business rules thoroughly before deployment to ensure
they function as intended. Validate rules in various scenarios to check for potential exceptions and ensure the
database behaves as expected.

Implementing and maintaining business rules in a database is an ongoing process that demands collaboration,
periodic evaluation, and a commitment to adapting to changing business needs. By integrating business rules
effectively, databases can function as reliable systems that align with company policies and meet stakeholders'
requirements.
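To make the trigger tip concrete, here is a minimal sketch (SQLite via Python's sqlite3 module, with made-up table and trigger names) of a trigger that maintains an audit trail automatically whenever an employee's pay changes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee  (id INTEGER PRIMARY KEY, pay REAL);
    CREATE TABLE pay_audit (employee_id INTEGER, old_pay REAL, new_pay REAL);

    -- Fire after every pay change and record the before/after values.
    CREATE TRIGGER log_pay_change
    AFTER UPDATE OF pay ON employee
    BEGIN
        INSERT INTO pay_audit VALUES (OLD.id, OLD.pay, NEW.pay);
    END;
""")

conn.execute("INSERT INTO employee VALUES (1, 50000)")
conn.execute("UPDATE employee SET pay = 52000 WHERE id = 1")

audit_rows = conn.execute("SELECT * FROM pay_audit").fetchall()
print(audit_rows)  # [(1, 50000.0, 52000.0)]
```

The audit row is created in real time by the database itself; no application code has to remember to log the change.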

 SUMMARY

In this lesson, you learned that business rules are specific constraints and requirements that govern
how data should be stored, processed, and managed in a database. Implementing business rules in a
database's design ensures that the data remains accurate, consistent, and compliant with the
organization's policies and processes.

First, you looked at some ways that business rules can be implemented in a database design to
achieve data validation, data integrity, security and access control, data transformation, workflow
automation, and regulatory compliance.

Then you learned some best practices for implementing business rules in a database, including the
importance of understanding and documenting rules, collaborating with stakeholders, implementing
data validation and constraints, maintaining audit trails that document any changes made, and
periodically testing and validating rules to ensure they still reflect company policies.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Business Rule
Specific constraints and requirements that govern how data should be stored, processed, and
managed in a database.

Translating Business Rules
by Sophia

 WHAT'S COVERED

This lesson explores the steps to translate business rules into databases, in two parts. Specifically, this
lesson will cover:
1. Translating Business Rules Into Database Features
2. Identifying Constraints

1. Translating Business Rules Into Database Features
Business rules can be translated into database relationships by defining the logic and constraints that govern
how data is related and how it should behave in various scenarios. For most companies with established
business rules and workflows, these are well-established database relationships. When starting a new project,
implementing a new system, or even a new workflow, understanding the entire business process from start to
stop, from supply chain to finished product in the store, is where the database designer and programmer will
have the most impact on the company. Here are a few examples:

Entity-Relationship Modeling: Business rules help identify entities (objects or concepts) in the business
domain and their relationships. For example, a business rule in a customer and order management system
may state that each order is associated with a single customer. This translates into a one-to-many
relationship between the "Customer" table and the "Order" table in the database. If it were a many-to-many
relationship, an associative (bridge) table would be required for the connection to achieve normalization.

Referential Integrity: Business rules often require referential integrity, which ensures data relationships
remain valid and consistent. For instance, a business rule might state that an order must have a valid
customer associated with it. The database enforces this through foreign key constraints that link the order
to a valid customer record.

Cardinality and Multiplicity: Business rules define the cardinality and multiplicity of relationships. For
example, a business rule might specify that a product can be associated with multiple categories. This
translates into a many-to-many relationship between the "Product" table and the "Category" table, often
implemented using an associative or junction table in the database.

Data Validation: Business rules determine data validity and constrain data entry. For instance, a business
rule may require that the unit price of a product cannot be negative. This constraint can be enforced in the
database using a check constraint or a validation rule. Data can also be validated by unique customer IDs,
ZIP codes, or even GPS coordinates of delivery vehicles at regular intervals.

Triggers and Stored Procedures: Complex business rules that involve multiple tables or require specific
actions can be implemented using database triggers and stored procedures. For example, a business rule
might specify that a discount should be applied when a customer's order amount exceeds a certain
threshold. This logic can be encapsulated in a database stored procedure that triggers when an order is
inserted or updated.

Data Derivation: Business rules often involve deriving new data from existing data. For instance, a
business rule might dictate that the total amount of an order should be calculated as the sum of the
individual line items' prices. This calculation can be performed in the database using queries or views.
Other kinds of calculations come in the form of decision support systems (DSS) that help define entire
product life cycles.
Organizations can ensure data integrity, enforce consistency, and implement complex business logic by
translating business rules into database relationships and constraints. This guides the application's behavior
and supports overall business objectives effectively. In many ways, this data and workflow knowledge is a
company's competitive advantage; losing it, or configuring it incorrectly, could be costly.
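As a minimal sketch of referential integrity in practice (using SQLite through Python's sqlite3 module, with made-up customer and order tables; note that SQLite only enforces foreign keys when the feature is switched on per connection):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "order" (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    );
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute('INSERT INTO "order" VALUES (100, 1)')  # valid customer: accepted

try:
    conn.execute('INSERT INTO "order" VALUES (101, 999)')  # no such customer
    enforced = False
except sqlite3.IntegrityError:
    enforced = True  # the foreign key constraint rejected the orphan order

print(enforced)  # True
```

The foreign key guarantees that every order row points at a real customer, exactly the "order must have a valid customer" rule described above.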

 TERMS TO KNOW

Referential Integrity
Rules ensuring that data relationships remain valid and consistent.

Decision Support System (DSS)


A computer-based tool that helps individuals and organizations make informed decisions by providing
access to data, analysis, and interactive tools to support the decision-making process.

2. Identifying Constraints
Constraints are rules that limit what data can be stored in a record. For example, each invoice might have a
constraint that counts the number of line items and ensures that the count is between 1 and 100.

Or perhaps there is an order system that tracks product shipping. There might be a constraint requiring that the
shipping date (if entered) must be equal to or later than the order date because it wouldn't make sense to have
an order ship before it was ordered.

EXAMPLE Here's another example. When a customer creates a new account on an online retailer's
website, they enter their email address as their username on the system, so each account must have a
unique value in the Email column. However, the email address is not the primary key field in the Customer
table; the primary key field is an automatically generated, unique number in the CustomerID column. The
database designer could set the Email column up with NOT NULL and UNIQUE constraints to ensure that
each record has a unique email address.
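A sketch of this design in SQLite (via Python's sqlite3 module; the table and column names follow the example above) might look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto-generated key
        Email      TEXT NOT NULL UNIQUE                -- business-rule constraint
    )
""")

conn.execute("INSERT INTO Customer (Email) VALUES ('pat@example.com')")

try:
    conn.execute("INSERT INTO Customer (Email) VALUES ('pat@example.com')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False  # UNIQUE constraint blocks the second signup

print(duplicate_allowed)  # False
```

The surrogate CustomerID remains the primary key, while the UNIQUE and NOT NULL constraints enforce the "one account per email address" business rule.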
Identifying and translating business rules into database structures and constraints is key to ensuring that the
database is designed to meet the needs of the underlying business processes. Remember that not all business
rules directly apply to the database, as some may apply to the application system where data is input or
modified.

 SUMMARY

In this lesson, you learned about translating business rules into database features, specifically that
creating relationships and identifying constraints based on business rules can help define and
optimize a database's structure and performance. Some of the ways business rules can be
implemented include entity-relationship modeling, referential integrity, cardinality, data validation,
triggered and stored processes, and data derivation.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Decision Support System (DSS)


A computer-based tool that helps individuals and organizations make informed decisions by
providing access to data, analysis, and interactive tools to support the decision-making process.

Referential Integrity
Rules ensuring that data relationships remain valid and consistent.

Storage Limitations
by Sophia

 WHAT'S COVERED

This lesson explores the storage limitations of various databases, in four parts. Specifically, this lesson
will cover:
1. Introduction
2. Overall Size
3. Data Type Size Limits
4. Table Size Limits

1. Introduction
When creating a new database, choosing the right DBMS is important. This decision affects not only the
database's performance but also user experience, cost, reliability, and storage options. This lesson looks at the
question from the perspective of that last item: storage.

The choices you make will affect your database's three V's: velocity, volume, and variety. You must consider
storage type, capacity, speed, failure rate, cost, and data input and output.

 BIG IDEA

Your storage capacity needs will feature heavily in your DBMS choice, as not all database systems have the
same storage offerings. Storage limitations may originate within the DBMS software or can be vendor-
imposed.
Most commercial databases have different storage limitations (in other words, volume), and different versions of
specific databases may also have different storage limitations. In PostgreSQL, MariaDB, and MySQL, the
database itself can be unlimited in size, so your storage space is limited only by your hardware. Other
databases, like Oracle, Microsoft SQL Server, and SQLite, have a maximum cap in terms of database size.

These caps can affect the database's security, compliance, cost, access methods, and integration with
infrastructure. They can also affect your options in terms of redundancy, scaling, and availability. Data size and
data management are multifaceted, and there are many factors to consider.

Some of these factors include the following:

Compliance and Security: Identify any specific compliance requirements (e.g., GDPR, HIPAA) that impact
data storage, encryption, and access controls. Implement security measures to protect sensitive data.
Cost: Evaluate the cost of various storage options (e.g., local storage, cloud storage) based on the database
size, performance requirements, and budget constraints.
Data Access Patterns: Understand data access and query usage patterns. Frequent or complex queries
may necessitate additional indexing or caching strategies to improve database performance.
Data Volume: Determine the expected amount of data the database will handle over time. You should
consider both the current volume of data and possible future growth to make sure the storage will be able
to handle the data without running out of space.
Integration With Infrastructure: Consider how the database storage integrates with the overall IT
infrastructure and the compatibility with the chosen database management system.
Redundancy and High Availability: Consider redundancy and high-availability solutions to prevent data loss
in case of hardware failures or disasters. Replication, clustering, or data mirroring can be used to enhance
availability.
Scalability: Plan for future growth and ensure the database storage can scale easily to accommodate
increasing data volumes and performance demands.

2. Overall Size
Two of the biggest database vendors include Microsoft SQL Server and Oracle.

SQL Server has an Express edition that limits the database size to 10 GB. The database itself is free and is
scalable and completely compatible with the paid editions, allowing seamless swapping. However, this
limitation does force organizations to make the switch to the paid version at the point where the database is
heavily used.

Oracle also has a free version, with a current limit of 12 GB for the database, which also is scalable and can be
swapped to a paid version. The enterprise edition of SQL Server, by contrast, has a maximum size of 524
petabytes (PB), while Oracle supports more than 2,047 PB. For perspective, 1 PB is equivalent to 1,000,000 GB.

3. Data Type Size Limits


There may be different size limitations even for data types within a database. This can create some challenges
with data migration, especially if you have data that goes beyond a limit. Take a variable character (VARCHAR)
column as an example. Different databases have different max sizes: In ANSI SQL, there is no limit defined, but
in MySQL there is a 65,535-character limit, and in SQL Server, there is a 2 GB limit. In Oracle, there is a 32,767-
character limit, and in PostgreSQL, there is a 1 GB limit.

Another example is the character large object (CLOB) data type. This data type might hold XML data, file data,
or a temp table. In MySQL, there is a 2^16-byte (64 KB) limit; in SQL Server, there is a 2^31 − 1 byte limit
(about 2 GB); in PostgreSQL, there is a 1 GB limit; and in Oracle, there is a 4 GB − 1 byte limit. As such, if we
need to transfer data between
databases, we must recognize that this could be an issue and carefully plan how data migration would work. It
is usually best to store CLOBs separately from the database due to their unstructured nature and their size
compared to conventional database entries. The database then stores a link that references the object's
storage location.

 KEY CONCEPT

The physical space available on a system can also be a concern. A number of issues can arise when the
database itself runs out of disk space.
Symptoms of being low on storage space include performance issues, a sharp increase in failed transactions,
application errors, and otherwise unpredictable behavior and data corruption. Ideally, there would be a monitor
on the system to alert when the data storage capacity is low.

4. Table Size Limits


Tables also have size limitations. Different databases have different criteria for these limits. Some of them are
based on the number of rows, while others are based on specific storage size. Some databases, like Teradata,
can be unlimited, while others, like MySQL, can have a max size of 256 TB. In SQL Server, a table can be as
large as the database itself, with 524,272 TB. Others, like SQLite, are based on the file size.
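For example, SQLite's theoretical maximum database size is its page size multiplied by its maximum page count, and both values can be inspected with PRAGMA statements. A quick sketch using Python's sqlite3 module (the exact numbers vary by SQLite version and configuration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

page_size = conn.execute("PRAGMA page_size").fetchone()[0]
max_pages = conn.execute("PRAGMA max_page_count").fetchone()[0]

# Theoretical ceiling for this database file, in bytes.
max_bytes = page_size * max_pages
print(page_size, max_pages, max_bytes)
```

In practice, of course, the file system and available disk space impose their own, usually much lower, limits.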

 SUMMARY

In this lesson, you learned in the introduction that in order for a database to store and manage data
effectively, it is critical to have sufficient overall size of the storage space. Having sufficient storage
space prevents the database from losing data and interrupting service due to growing data volume.
You learned about data type size limits and that the amount of space needed to handle transactional
data, log files, indexes, and other database components is necessary to prevent data corruption,
transaction failures, and slow performance. You also learned about table size limits. Planning and
monitoring storage space correctly helps ensure data integrity, efficient query processing, and overall
database reliability and availability.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

Query Processing Bottlenecks
by Sophia

 WHAT'S COVERED

This lesson explores some of the query processing bottlenecks, in two parts. Specifically, this lesson
will cover:
1. Introduction
2. Types of Bottlenecks
2a. Memory Bottlenecks
2b. Input/Output Bottlenecks
2c. Processing Bottlenecks
2d. Indexing Bottlenecks

1. Introduction
A bottleneck is a degradation in performance that occurs when demand for services or resources exceeds what
is available. There are a variety of reasons why bottlenecks occur. These include hardware limitations,
inefficient query processing, inadequate indexing, locking and concurrency issues, disk I/O constraints, and
network latency. A bottleneck can have a negative impact on the overall responsiveness, speed, and scalability
of a database, resulting in slow queries, increased response times, and poor system performance. The ability of
the database system to handle growing data volumes, user loads, and transactional demands without
compromising its efficiency requires identifying and resolving database bottlenecks.

Addressing database bottlenecks can involve performance tuning, query optimization, hardware upgrades, and
proper database resource configuration. Database administrators and developers must regularly monitor and
profile their systems to identify potential bottlenecks and implement suitable solutions. There are several ways
to alleviate bottlenecks and enhance database performance, including indexing, caching, partitioning, and load
balancing. Management and mitigation of database bottlenecks can help organizations maintain an efficient
database system that is reliable and responsive to the demands of applications and users.

2. Types of Bottlenecks
There are many types of bottlenecks that affect the performance of a database. These include memory,
input/output, processing, and indexing bottlenecks. We’ll first learn about memory bottlenecks.

2a. Memory Bottlenecks
Memory bottlenecks occur when the database system cannot access or manage memory resources effectively.
It is common to encounter a memory bottleneck when insufficient memory is allocated to the database. This
results in frequent disk writes and reads, which causes query processing to be significantly slowed down. Data
paging in and out of disk storage is a means of simulating additional memory using hard disk space, at the
expense of performance, when the current working set (data actively used by queries and operations) exceeds
the available memory. The inefficient use of memory can also cause memory bottlenecks, causing certain
database components to consume excessive amounts of memory and starving other components. A database's
overall responsiveness may be negatively impacted by memory leaks, suboptimal caching strategies, or
excessive temporary table usage.

Managing and allocating memory resources well is essential to solving database memory bottlenecks.
Configure the server's memory settings to optimize query performance, and make sure the database has
enough memory to accommodate the working set and anticipated growth. Actively monitor memory usage to
identify memory-intensive queries or processes, and implement caching and buffering techniques to minimize
disk I/O and improve memory utilization. By profiling and tuning memory use, a database system can utilize
memory efficiently and provide a high-performance, responsive environment for critical data operations.

It is usually a good indication of an issue when we see longer query execution times, excessive input/output on
the system, or even out-of-memory messages in the logs. A query optimizer is like a smart assistant inside a
DBMS that determines the most efficient way to execute a query to retrieve or modify data, reducing the
amount of memory and processing required. Query optimizers are commonly used to improve query efficiency
and avoid memory bottlenecks. It is also possible to resolve a bottleneck by adding more physical memory. If
the system does not have enough memory (RAM), all processes running on it would compete for this resource.
This can result in performance degradation and could cause the system to become unstable. Adding more
physical memory can help to fix the bottleneck and improve system performance.
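To make the query optimizer less abstract, the sketch below asks SQLite to report its chosen plan with EXPLAIN QUERY PLAN. The table and data are invented for the demonstration; other database systems expose similar EXPLAIN facilities with different output.

```python
import sqlite3

# Illustration: asking SQLite's optimizer how it plans to run a query.
# The users table is invented for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")

# EXPLAIN QUERY PLAN reports the chosen access strategy in its last
# column; a primary-key lookup appears as a SEARCH rather than a
# full-table SCAN.
for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM users WHERE id = 1"
):
    print(row[-1])
```

Reading these plans is usually the first step in diagnosing whether a slow query is memory-bound or simply planned badly.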

 TERMS TO KNOW

Memory Bottleneck
Occurs when the database system either does not have enough physical memory allocated for its use
or is unable to access or manage memory resources effectively.

Memory Leak
A situation where a program allocates memory but then fails to release it when the memory is no longer
needed.

Query Optimizer
Software that determines the most efficient way to execute a query.

2b. Input/Output Bottlenecks


Input/output (I/O) bottlenecks occur due to a bandwidth limitation in reading from or writing to the storage
device. I/O bottlenecks are generally associated with physical hardware limitations or latency issues. Latency is
a delay between when a request is issued and its fulfillment. For example, when an application requests a file
from a hard disk, the time it must wait for that file to be read and delivered is latency.

 KEY CONCEPT

I/O bottlenecks can be caused by slow disk drives or outdated storage technology. You can significantly
improve I/O performance by upgrading to faster and more modern storage solutions, such as solid-state
drives (SSDs).
When multiple database operations vie for the same disk resources, the throughput rate available may be
insufficient to handle all requests. One way to increase throughput is to implement a RAID. RAID stands for
Redundant Array of Independent Disks. It involves combining multiple physical disks into a single logical unit.
Some types of RAID are designed to increase I/O performance by spreading out the data across multiple disks.
For example, if a file is striped across four disks, with each disk storing one quarter of the file, the amount of
data written per disk is 1/4 the original amount, so in the ideal case the file takes only about 1/4 as long to write or read.

Fragmentation can slow down data retrieval on a traditional magnetic hard disk drive. Fragmentation refers to a
condition in which a file is not stored contiguously on the disk, so the read/write head has to move around,
picking up all the pieces of the file when it is requested. You can reduce fragmentation by running a disk
defragmenter or optimization utility. Such a utility rearranges the content on the disk so that all the parts of each
file are stored in adjacent sectors. This reduces the time it takes for the disk's read/write head to move to the
right spots to read all the parts of the file. Defragmentation is not necessary on a solid-state drive (SSD).

Query pagination can also help. It retrieves and displays a large set of data in smaller, manageable chunks or
pages instead of fetching and displaying it all at once. You can employ pagination by adding a page size
parameter to the query.
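A minimal sketch of query pagination follows, using keyset (seek) pagination in SQLite; the items table and page size are illustrative.

```python
import sqlite3

# Keyset (seek) pagination sketch: fetch results one small page at a time.
# The items table and page size are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [(f"item-{n}",) for n in range(1, 11)])

def fetch_page(conn, after_id=0, page_size=4):
    # WHERE id > ? keeps each page cheap even deep into the table,
    # unlike a large OFFSET, which forces the database to skip rows.
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()

page1 = fetch_page(conn)                         # rows with ids 1-4
page2 = fetch_page(conn, after_id=page1[-1][0])  # rows with ids 5-8
print([r[0] for r in page1], [r[0] for r in page2])  # [1, 2, 3, 4] [5, 6, 7, 8]
```

Seeking on the last-seen id, rather than using a growing OFFSET, is the design choice that keeps deep pages as cheap as the first one.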

Multiple queries accessing the same data simultaneously can strain disk I/O with a high volume of concurrent
transactions. You can manage concurrency and I/O contention using transaction isolation levels and database
locking. Isolation levels define the degree to which one transaction must be isolated from the effects of other
concurrent transactions. Lower isolation levels offer better performance but may allow for anomalies between
concurrent transactions.

Database locking is a mechanism for controlling access to a database in a multiuser environment. When a
record is locked by one user, no other user may edit it.

Database disk caching is a way to improve performance by reducing the need to access physical storage for
frequently accessed data. It involves storing a portion of the database in the computer's memory to speed up
read operations. This works because accessing data from memory is significantly faster than reading it from the
disk. Using in-memory databases (that is, databases that rely primarily on memory for data storage and
retrieval) can increase performance, as can increasing the size of a disk's cache.

 HINT

Distributed databases are prone to I/O bottlenecks due to network latency between nodes. You can
mitigate this issue by reducing network latency or using distributed caching.

Database administrators can monitor and profile the system, looking for ways to improve I/O speed. Some of
the methods they can use include upgrading storage hardware, optimizing queries, creating indexes, and
adjusting disk caching settings. Administrators should also regularly tune and optimize the database
configuration, run maintenance utilities such as a disk defragmenter (if appropriate for the storage type), and
monitor disk usage to ensure that I/O is not a bottleneck to optimal performance.

 TERMS TO KNOW

Input/Output (I/O) Bottleneck
Occurs when there is a bandwidth limitation in reading from or writing to the storage device.

Latency
A delay between when a request is issued and its fulfillment.

Fragmentation
A condition in which a file is not stored contiguously on the disk, so the read/write head has to move
around, picking up the pieces of the file when it is requested.

Isolation Level
The degree to which one transaction must be isolated from the effects of other concurrent transactions.

Database Locking
A mechanism for controlling access to a database in a multiuser environment.

Disk Caching
A way to improve performance by reducing the need to access physical storage for frequently
accessed data.

In-Memory Databases
Databases that rely primarily on memory for data storage and retrieval.

2c. Processing Bottlenecks


Many factors can affect query execution and overall system performance, leading to processing bottlenecks—in
other words, bottlenecks due to the CPU not being able to keep up with the demands on it. It is possible that
some of these problems have more than one cause and solution.

 KEY CONCEPT

One way to resolve a processing bottleneck is to make queries more efficient so they do not require as
much processing time. For example, a query can be optimized and rewritten to use more efficient join
strategies.

Indexing problems can also lead to database issues. A lack of indexes for frequently queried columns can
lead to full table scans, which can cause processing bottlenecks. Creating indexes on key columns can
significantly speed up data retrieval.
Suboptimal database design can also cause processing bottlenecks. Evaluating database design and
considering denormalization, data partitioning, and proper table structures can improve data access and
processing.

A high level of concurrency can lead to high levels of contention and performance bottlenecks as a result of
locking. You can mitigate concurrency issues by implementing appropriate transaction isolation levels,
optimizing locking strategies, and employing techniques such as row-level locking.

On the hardware side of things, insufficient CPU, memory, and storage resources can limit a database's
processing capacity. You can improve processing capabilities by upgrading hardware components or sharding
databases to distribute workloads across multiple servers. Sharding is a database architecture strategy that
breaks up a large database into smaller, more manageable parts called shards. Each shard is an independent
database that stores a subset of the overall data.
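The routing side of sharding can be sketched in a few lines of hash-based shard selection; the shard count and key format below are assumptions for the example, not taken from any particular product.

```python
import hashlib

# Hash-based shard routing sketch. The shard count and key format are
# assumptions for the example.
NUM_SHARDS = 4

def shard_for(key: str) -> int:
    # Use a stable hash so a key maps to the same shard on every run;
    # Python's built-in hash() is salted per process, so avoid it here.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

# Every lookup for the same key routes to the same shard.
print(shard_for("customer:1001") == shard_for("customer:1001"))  # True
```

Real sharding schemes add rebalancing (e.g., consistent hashing) so that changing NUM_SHARDS does not remap every key.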

The absence of query caching can result in significant processing overhead for frequently executed queries.
Implementing query caching mechanisms can improve response times for repetitive queries.

A database that supports multiple concurrent connections can experience a processing bottleneck when many
users access it at once. You can optimize resource utilization via connection pooling and connection
management techniques. Connection pooling is a technique for managing and reusing database connections.
Establishing and closing database connections can be resource intensive; connection pooling aims to improve
performance by reusing existing connections rather than creating new ones for each interaction with the
database.
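The idea behind connection pooling can be sketched in a toy class; this is an illustration, not a production implementation, and SQLite stands in for whatever database the application actually uses.

```python
import queue
import sqlite3

# Toy connection pool: reuse a fixed set of connections instead of
# opening a new one per request. A sketch, not production code.
class ConnectionPool:
    def __init__(self, db_path, size=3):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(db_path, check_same_thread=False))

    def acquire(self):
        # Blocks when every connection is already checked out,
        # capping concurrent load on the database.
        return self._pool.get()

    def release(self, conn):
        # Return the connection so another caller can reuse it.
        self._pool.put(conn)

pool = ConnectionPool(":memory:")
conn = pool.acquire()
print(conn.execute("SELECT 1").fetchone())  # (1,)
pool.release(conn)
```

The bounded queue is the key design point: it both recycles expensive connections and limits how many clients can hit the database at once.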

 DID YOU KNOW

Process bottlenecks can also be caused by inadequate or irregular database maintenance. Database
performance can be enhanced by regularly rebuilding indexes, updating statistics, and cleaning up data.

 TERMS TO KNOW

Sharding
A database architecture strategy that breaks up a large database into smaller, more manageable parts
called shards.

Connection Pooling
A technique for managing and reusing database connections.

2d. Indexing Bottlenecks


As you learned earlier in the course, indexing involves making a list of all the values in a certain column to help
speed up searches. If there is poor or no indexing for a frequently accessed column, performance can suffer,
especially in a table with many records. Indexing columns that are frequently used to filter query records can
help.

An indexing bottleneck occurs when the process of maintaining and updating indexes becomes a performance
bottleneck. There are many reasons why indexing bottlenecks can occur, impacting the performance of
database queries. Examples include:

There are too many indexes on a table: This can cause data modification operations to run slowly, because the
database maintains each index during INSERT, UPDATE, and DELETE operations. Identify any redundant or
unused indexes and remove them to improve the speed of data modification.

Query execution may be slowed: This is caused by missing indexes on frequently queried columns, since the
database may need to perform full table scans. Create indexes on columns frequently used in WHERE, JOIN,
and ORDER BY clauses to speed up data retrieval.

Query execution performance degrades over time: As data changes over time, indexes can become
fragmented, resulting in decreased performance. Maintain index efficiency by regularly defragmenting or
rebuilding them.

A wrong index type is chosen: The right type depends on query patterns and data characteristics (e.g.,
clustered, non-clustered, covering). Assess how queries access the data and how the data is distributed to
determine the right index type for each query.

Composite indexes have an inefficient design: Composite indexes can be useful, but if they are not designed
properly, they may cause bottlenecks. Make sure the most selective columns are at the beginning of composite
indexes to match query predicates.

There are issues with index cardinality: The cardinality of an index is a measure of how many unique values
there are in the indexed columns compared to the total number of rows in the table. It is often expressed as a
ratio or percentage. When cardinality is low (few unique values), the index does little to narrow down search
results, and the database may ignore it; high-cardinality columns make the most effective indexes. Align index
choices with data distribution and query patterns. Out-of-date statistics can also lead to suboptimal query
execution plans, so update statistics regularly so that the optimizer can choose the most efficient plan.

There are issues with index overhead: Maintaining indexes can have a significant impact on database
performance, and indexes consume storage space. It is important to balance query optimization with storage
overhead carefully.
Index maintenance, query performance monitoring, and query analysis are key to solving indexing bottlenecks.
You should periodically review index usage, identify redundant or missing indexes, and optimize the indexes
you want to keep. In order to ensure optimal query performance, you must strike a balance between the
number and types of indexes. A well-performing database system can be maintained by regularly evaluating
and tuning indexes.

 TERM TO KNOW

Indexing Bottleneck
Occurs when the process of maintaining and updating indexes becomes a performance bottleneck.

 SUMMARY

In this lesson, you learned about different types of bottlenecks. A bottleneck is a degradation in
performance that occurs when demand for services or resources exceeds what is available. You
learned that database bottlenecks affect databases' smooth and efficient operation. These include
memory bottlenecks, input/output bottlenecks, processing bottlenecks, and indexing bottlenecks.
Various factors may contribute to these bottlenecks, such as hardware limitations, inefficient query
processing, inadequate indexing, locking or concurrency issues, and network latency. Bottlenecks
negatively impact the database's responsiveness, speed, and scalability, resulting in slow query
execution, increased response times, and degraded system performance. A database system's ability
to handle growing data volumes, user loads, and transaction demands depends on identifying and
resolving database bottlenecks.

Tuning database performance, optimizing queries, upgrading hardware, and properly configuring
database resources are key to addressing database bottlenecks. To identify potential bottlenecks and
implement appropriate solutions, database administrators and developers must monitor and profile the
database system regularly. By employing indexing, caching, partitioning, and load balancing,
bottlenecks can be alleviated, and database performance can be improved. Proactively managing and
mitigating database bottlenecks can ensure that organizations maintain a responsive and reliable
database system that meets the demands of their applications and users.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND Faithe Wempen (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Connection Pooling
A technique for managing and reusing database connections.

Database Locking
A mechanism for controlling access to a database in a multiuser environment.

Disk Caching
A way to improve performance by reducing the need to access physical storage for frequently
accessed data.

Fragmentation
A condition in which a file is not stored contiguously on the disk, so the read/write head has to
move around, picking up the pieces of the file when it is requested.

In-Memory Databases
Databases that rely primarily on memory for data storage and retrieval.

Indexing Bottleneck
Occurs when the process of maintaining and updating indexes becomes a performance bottleneck.

Input/Output (I/O) Bottleneck


Occurs when there is a bandwidth limitation in reading from or writing to the storage device.

Isolation Level
The degree to which one transaction must be isolated from the effects of other concurrent
transactions.

Latency
A delay between when a request is issued and its fulfillment.

Memory Bottleneck
Occurs when the database system either does not have enough physical memory allocated for its
use or is unable to access or manage memory resources effectively.

Memory Leak
A situation where a program allocates memory but then fails to release it when the memory is no
longer needed.

Query Optimizer
Software that determines the most efficient way to execute a query.

Sharding
A database architecture strategy that breaks up a large database into smaller, more manageable
parts called shards.

Database Migration
by Sophia

 WHAT'S COVERED

In this lesson, you will learn about data migration planning. You will also learn why it is important to
back up data to prevent loss and corruption during the migration process.
Specifically, this lesson will cover:
1. Make a Data Migration Plan
2. Backup to Avoid Data Loss and Corruption
2a. Business Continuity
2b. Data Security

1. Make a Data Migration Plan


Data migration involves transferring data from one database to another while ensuring the data's integrity,
accuracy, and consistency. There are typically a number of steps involved in a migration, including:

1. Assessing the source and target databases.
2. Planning the migration strategy.
3. Profiling and cleansing data.
4. Mapping and transforming data between the two systems.
5. Backing up the source database.
6. Conducting test migrations.
7. Executing the final migration.
8. Validating the data.

A proper migration plan and documentation are essential for ensuring a smooth and successful transition
between two databases in the event of any unexpected issues.
Migrating data to a new system successfully requires a well-executed plan to ensure data integrity, minimize
disruption, and set the foundation for success. A data migration plan outlines the process of transferring data
between two systems, ensuring that the transition will be seamless and successful. Although the steps in a data
migration plan may vary according to the project's complexity and migration requirements, the following are
some common ones:

 STEP BY STEP

1. Assess and plan the source and target systems, data structure, data quality, and compatibility.
2. Identify the migration objectives, scope, timeline, and resources required. Identify the challenges and
risks that may arise.
3. Profile the data to determine its structure, relationships, and quality. Then clean the data, if necessary,
to eliminate any inconsistencies or anomalies that might affect the migration. For example, you might
correct typos to avoid having multiple versions of what should be the same entry between records. By
identifying and resolving data inconsistencies, missing values, and other issues that may affect the
accuracy and reliability of the migrated data, you help ensure the data is accurate and reliable.
4. Develop a mapping document containing mapping rules that describe how data will be transformed
and mapped to the target system from the source. A mapping rule describes how the source's data will
be transformed and mapped to the target database's structure and format. Make sure that the two
systems are on the same page when it comes to data formats, schemas, and data types.
5. Make a full backup of the source data, and plan for contingency scenarios and rollback procedures in
case of unforeseen migration problems. You will learn how to do this in the next section of this lesson.
6. To validate mapping and transformation rules, perform a test migration with a subset of data. It is
important to verify that the target system functions properly and that the data is accurately transferred.
When migrating data, validation rules are used to verify the integrity and quality of the data.
7. Perform the actual data migration once the testing has been successful. Maintaining data consistency
during migration may require downtime or data freezing on the source system.
8. In the target system, verify that the data is complete and accurate after migration. By validating the
data, you can ensure its accuracy and consistency. In order to improve the quality of your data, you can
perform additional data profiling and cleansing as well as use checksums or hash checks on your data
to detect any alterations. Backups and recovery procedures are essential to safeguarding data and
quickly resolving integrity and completeness issues.
9. Perform post-migration testing to ensure that the target system is functioning correctly and meeting
performance requirements. If any problems arise after migration, have a support team ready to assist.
10. The end users should be trained on the new system, and feedback should be gathered to address any
usability concerns they may have.
11. Once the migration is complete, archive or purge the old data from the source system, freeing up
space and reducing redundant data.
12. For future reference, it is important to document the entire migration process, including procedures,
test results, and any lessons learned.
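Steps 5 through 8 above can be sketched in miniature: copy rows from a source database to a target, then validate the result with a row count and a checksum. SQLite and the customers table here are stand-ins for the real systems.

```python
import hashlib
import sqlite3

# Miniature migration with validation. SQLite and the customers table
# are stand-ins for the real source and target systems.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
src.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Ada"), (2, "Grace")])
dst.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# Transfer the data (a real migration would batch and transform here).
rows = src.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
dst.executemany("INSERT INTO customers VALUES (?, ?)", rows)

def table_checksum(conn):
    # Hash every row in a fixed order so equal contents give equal digests.
    h = hashlib.sha256()
    for row in conn.execute("SELECT id, name FROM customers ORDER BY id"):
        h.update(repr(row).encode())
    return h.hexdigest()

print(table_checksum(src) == table_checksum(dst))  # True: contents match
```

Comparing checksums computed the same way on both sides is one simple form of the hash checks mentioned in step 8.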

 TERMS TO KNOW

Data Migration
The process of transferring data from one database to another while ensuring the data's integrity,
accuracy, and consistency.

Validation Rule
A constraint that prevents invalid, inaccurate, or inconsistent data from being populated into database
entities.

2. Backup to Avoid Data Loss and Corruption


A database migration involves inherent risks and complexities. Data migration involves moving large volumes of
critical data between systems, which leaves room for data loss, corruption, and other problems. Backing up the
original database creates a reliable and complete copy before migration begins. Backups ensure administrators
can quickly restore the database to its original state in case of data loss or corruption during migration. This
prevents disruption to business operations and financial or reputational damage.

 HINT

Backups also offer a rollback capability, allowing organizations to revert to the pre-migration state if the
migration encounters major problems or fails to meet expectations. With this flexibility, database
administrators can proceed with their migration plans without worrying about losing valuable data or
compromising production environments' stability.
Furthermore, backups are essential to ensuring business continuity. During the migration process, there may be
temporary downtime or a transitional database state. Backups enable organizations to quickly recover and
restore services in case of unexpected delays or complications that prolong the migration timeline, ensuring
uninterrupted access to crucial data and smooth operations. Backups generally provide a vital safety measure
that mitigates risks and enhances confidence in the migration process, safeguarding valuable data and
ensuring business continuity.
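As one concrete way to take such a backup, SQLite provides an online backup API; the sketch below copies a live database into a second connection. In practice, the target would be a file on separate storage rather than another in-memory database.

```python
import sqlite3

# SQLite's online backup API as one way to snapshot a live database
# before migration. The table and values are invented for the demo.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE t (x INTEGER)")
source.execute("INSERT INTO t VALUES (42)")

backup = sqlite3.connect(":memory:")  # in practice, a file on separate storage
with backup:
    source.backup(backup)  # page-by-page copy while the source stays usable

print(backup.execute("SELECT x FROM t").fetchone())  # (42,)
```

An online backup like this avoids taking the source database down, which matters for the business-continuity goals discussed next.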

2a. Business Continuity


Business continuity refers to the ability of a business to maintain its normal operations and productivity levels
regardless of what special situations are occurring, including not only planned events like data migration but
also unexpected natural and human-created disasters. The following practices can help ensure business
continuity throughout a database migration:

Conduct a comprehensive risk assessment to identify potential challenges and issues during the migration.
Engage key stakeholders, including business owners, IT teams, and end users, to understand their
requirements and concerns.
Plan out the scope, goals, timeline, and allocation of resources in a comprehensive migration plan.
Ensure your plan addresses unforeseen circumstances by including contingency measures.

Migrating smoothly depends on a well-considered plan that minimizes disruptions and protects all aspects of
the process.

 KEY CONCEPT

You should create and verify a complete backup of the existing database before beginning the migration.
The backup serves as a safety net in case of data loss or corruption during migration. Make sure backups
are validated regularly to ensure data integrity and rapid recovery. You should also develop a rollback
strategy for reverting to the original database state if the migration encounters major problems. In order to
maintain business continuity and data availability, it is imperative to have a robust backup and rollback
strategy.
Consider incremental or parallel migration approaches, where data is migrated in manageable batches. With
this approach, business operations are less affected by migration, as it allows for continuous monitoring and
validation. You might alternatively consider running the old and new database systems simultaneously during
the transition phase with data synchronized between them. During the migration, users have access to up-to-
date information because of the real-time synchronization of data between the two databases. Syncing and
verifying all data is the key to minimizing downtime and ensuring a seamless switch to the new database.

It is important for organizations to follow the steps outlined in this section to mitigate risks, maintain data
integrity, and ensure the continuity of their business operations during database migrations. Planned testing,
thorough backups, and robust rollback strategies facilitate a seamless transition to the new database
environment, allowing businesses to continue operating uninterrupted.

2b. Data Security
Maintaining data security during a database migration is crucial for keeping customer trust. This can be done by
encrypting data, using secure servers, and limiting access to sensitive data. Additionally, organizations should
keep customers informed of the migration's progress and of the steps being taken to protect their data.

Protect data during migration by encrypting it to prevent unauthorized access or interception. Secure the
communication between the source and target databases with in-transit encryption protocols, such as TLS/SSL,
which protect data while in transit so that it cannot be exposed or tampered with. You might also encrypt data
at rest on storage devices using file- and folder-level encryption like NTFS's Encrypting File System (EFS) and/or
whole-disk encryption such as Windows' BitLocker.

Ensure strict access control mechanisms are implemented to restrict the number of individuals with migration
privileges. Follow the principle of least privilege when assigning roles and permissions to users involved in
migration processes. To prevent unauthorized access to the database systems, use strong authentication
methods such as multifactor authentication (MFA). Detect and respond quickly to any suspicious behavior
detected during the migration by auditing and monitoring user activity.

Protect sensitive or personally identifiable information (PII) before migration by using data masking or other
anonymization techniques. With data masking, a realistic but fictional representation of the original data is used
to replace sensitive data during migration, ensuring that the original data is not exposed. Masking data can
prevent data breaches and comply with privacy regulations by enabling developers and administrators to work
with realistic data.
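A minimal data-masking sketch: replace each real email address with a consistent, fictional one, so relationships between rows survive but no PII is exposed. The salt and naming scheme here are illustrative:

```python
import hashlib

def mask_email(email: str, salt: str = "demo-salt") -> str:
    """Map a real address to a consistent, fictional replacement.
    The same input always yields the same masked value, so joins
    and foreign-key relationships still line up after masking."""
    digest = hashlib.sha256((salt + email.lower()).encode()).hexdigest()[:10]
    return f"user_{digest}@example.com"
```

Because the mapping is deterministic, two tables masked with the same salt can still be joined on the masked column; keeping the salt secret prevents anyone from recomputing which masked value belongs to which real address.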

Identify and address any security weaknesses by testing and assessing both the source and target databases
before, during, and after the migration. Maintain a high level of data security throughout the migration process
by engaging with cybersecurity experts and following industry best practices. Organizations can mitigate data
security risks and ensure a secure and successful database migration by implementing robust security
measures and monitoring the migration closely.

 TERMS TO KNOW

Principle of Least Privilege


A security strategy that gives users only the minimum privileges they need to do their work.

Multifactor Authentication (MFA)


The practice of requiring two different forms of authentication for access to a system, such as a
username/password and a PIN or access code.

Data Masking
A method of anonymization that replaces sensitive data with consistent, nonsensitive values.

 SUMMARY

In this lesson, you learned that the process of migrating data between two databases involves a series
of steps in a data migration plan to ensure smooth and accurate data transfer. In most cases, the
process begins with assessing the source and target databases to determine their structures, data
types, and relationships. A plan is then developed, detailing the migration's scope, objectives, timeline,
and resource requirements. Following the data profiling and cleansing process, the data will be
transformed or corrected as necessary based on its quality and consistency. After that, the data
mapping and transformation rules are defined, which describe how data is mapped from the source
database to the target database. This step handles any differences between the two databases
regarding data formats or schemas.
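The mapping-and-transformation step described above can be sketched as a small table of rules: each source column maps to a target column plus a transform that reconciles format differences. The column names and transforms here are hypothetical:

```python
# Hypothetical source-to-target column mapping with per-column transforms.
MAPPING = {
    "cust_nm": ("customer_name", str.strip),    # trim stray whitespace
    "cust_eml": ("customer_email", str.lower),  # normalize case
}

def map_row(source_row: dict) -> dict:
    """Apply the mapping rules to one source row."""
    return {
        target_col: transform(source_row[source_col])
        for source_col, (target_col, transform) in MAPPING.items()
    }
```

Keeping the rules in one data structure, rather than scattered through the migration code, makes them easy to review, test on a sample of rows, and update when the target schema changes.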

You also learned that it is important to validate the migration process and ensure data accuracy; a
subset of data is migrated first to verify that the mapping rules work as intended. During this phase, any
discrepancies or issues are addressed. Upon successful testing, the actual migration is carried out. A
data validation and integrity check and a backup to avoid data loss and corruption are conducted before
and after migration to ensure that the source and target databases are identical and quality standards
are met. The new database is implemented after a successful migration, after user training has been
provided, and after the documentation has been updated. The performance of the database may also
need to be monitored and tuned periodically after the migration to ensure optimal performance. Data
migration between databases can be efficient and secure if organizations follow these steps and best
practices, minimizing disruptions and maintaining business continuity, data security, and data integrity
at every stage.

Source: This tutorial was authored by Dr. Vincent Tran, PhD (2020) and Faithe Wempen (2024)
for Sophia Learning. Please see our Terms of Use.

 TERMS TO KNOW

Data Masking
A method of anonymization that replaces sensitive data with consistent, nonsensitive values.

Data Migration
The process of transferring data from one database to another while ensuring the data's integrity,
accuracy, and consistency.

Multifactor Authentication (MFA)


The practice of requiring two different forms of authentication for access to a system, such as a
username/password and a PIN or access code.

Principle of Least Privilege


A security strategy that gives users only the minimum privileges they need to do their work.

Validation Rule
A constraint that prevents invalid, inaccurate, or inconsistent data from being populated into
database entities.

Terms to Know
ACID
An acronym that stands for atomicity, consistency, isolation, and durability of data.

Analytical Databases
A specialized type of database designed for storing, retrieving, and analyzing large volumes
of data.

Application Programming Interface (API)


A set of rules, protocols, and tools that allow different software applications to communicate
with each other.

Associative Entity
An entity that exists only to associate data in one entity with data in another.

Attribute
A single piece of information within a record, also known as a field.

Back End
The part of the database system responsible for managing and storing data, including the
DBMS and the physical storage devices. The back end is not directly accessible to end users.

Business Analytics
The practice of using data analysis and statistical techniques to drive informed business
decision making.

Business Rule
Specific constraints and requirements that govern how data should be stored, processed,
and managed in a database.

Cardinality
The number of instances of one entity that can be associated with instances of another:
one-to-one, one-to-many, or many-to-many.

Chen Notation
A visual representation technique for entity-relationship modeling that uses rectangles to
represent entities, diamonds for relationships, and lines to represent cardinality and
participation constraints.

Class
A grouping of similar objects with shared attributes and behaviors.

Columnar Data Storage


Storing data in columns rather than in rows.

Columns
The individual attributes of a table, also called fields. They are usually of the same data type
and can contain various data, such as numbers, text, and dates.

Common Business-Oriented Language (COBOL)


A programming language popular for business data processing; COBOL systems were designed
to handle batch processing rather than real-time transactions.

Common Table Expressions (CTEs)


A feature introduced in ANSI SQL that allows temporary result sets for complex queries.

Conceptual Data Model


A high-level representation of a system's data requirements and structure that establishes
the entities, their attributes, and relationships between entities in a relational database.

Concurrency
The ability for multiple users or processes to access and change a file simultaneously.

Connection Pooling
A technique for managing and reusing database connections.

Constraints
A rule that restricts the values that can be stored in a column or a table. Constraints are used
to ensure the integrity of the data in a database.

Crow's Foot Notation
A graphical representation technique that uses symbols such as crow's feet, lines, and
diamonds to represent entities, attributes, relationships, and cardinality.

Data
Known as “raw facts,” the data we collect, like phone numbers or addresses.

Data Definition Language (DDL)


Commands that create and remove schema components in a database.

Data Manipulation Language (DML)


Commands that allow interaction with the data in a database.

Data Masking
A method of anonymization that replaces sensitive data with consistent, nonsensitive values.

Data Migration
The process of transferring data from one database to another while ensuring the data's
integrity, accuracy, and consistency.

Data Warehouse
A backend enterprise-level system used for storing data that is used for analysis and
reporting.

Database
A structured and organized collection of data that is stored electronically.

Database Administrator (DBA)


A user of the database management system who ensures that the database is running
correctly.

Database Designer
A database architect of the database who ensures that it fits the business needs and
functions optimally.

Database Locking
A mechanism for controlling access to a database in a multiuser environment.

Decision Support System (DSS)
A computer-based tool that helps individuals and organizations make informed decisions by
providing access to data, analysis, and interactive tools to support the decision-making
process.

Delimiter
A sequence of one or more characters for specifying the boundary between separate,
independent regions in plain text, mathematical expressions, or other data streams.

Disk Caching
A way to improve performance by reducing the need to access physical storage for
frequently accessed data.

Document-Oriented Database
A non-relational database that pairs a key with a complex data structure.

Encapsulation
The bundling of data and the operations that act on that data into a single unit (an object).

End User
The user of the applications to run the day-to-day operations of the organization.

End-User Interface
Where ordinary database users interact with the data.

Entity Relationship (ER) Model


A visual representation that illustrates the structure and relationships within a database or
data model.

Entity-Relationship Diagram (ERD)


Used to visualize the relationships among entities (objects or concepts) in a system or
database, as well as their structure, attributes, and interactions.

Extended Relational Database Model (ERDM)


Extensions of traditional relational database models that add enhanced functionality. A
relational model based on ERDM introduces object-oriented concepts, such as inheritance,
encapsulation, and methods.

Flat File
A data file that is not related to or does not contain any linkages to another file.

Foreign Key
A column or a combination of columns that refers to the primary key of another table.

Fragmentation
A condition in which a file is not stored contiguously on the disk, so the read/write head has
to move around, picking up the pieces of the file when it is requested.

Front End
A user interface or application that enables users to interact with a database without directly
interacting with the underlying DBMS.

Graph Store
A non-relational database that links data through edges, nodes, and properties.

Graphical User Interface (GUI)


A type of user interface through which users interact with electronic devices via visual
indicator representations.

Hardware
CPU, RAM, and disk drive speed all have an influence on how well a database runs on a
computer system.

Hierarchical Model
A database model that consists of a series of one-to-many relationships, with each parent
item having one or more child items.

Hierarchical Models
Databases that store data as records and organize them into a tree-like structure where one
parent node has many child nodes connected to it through links.

In-Memory Databases
Databases that rely primarily on memory for data storage and retrieval.

Indexes
An index is a data structure that speeds up the retrieval of data from a table. Indexes are
created on columns that are frequently used in queries.

Indexing Bottleneck
Occurs when the process of maintaining and updating indexes becomes a performance
bottleneck.

Information
Data in a context that gives it meaning.

Inheritance
A mechanism by which objects inherit attributes and methods from their classes; OODM uses
inheritance to organize and structure data models.

Input/Output (I/O) Bottleneck


Occurs when there is a bandwidth limitation in reading from or writing to the storage device.

Isolation Level
The degree to which one transaction must be isolated from the effects of other concurrent
transactions.

JavaScript Object Notation (JSON)


A lightweight, popular data interchange format commonly used in databases. Using it, one
can represent data using key-value pairs or nested structures in a readable, easy-to-
understand format.

Key-Value Store
A simple non-relational database that pairs each key with a single value.

Knowledge
Information or data that has been organized, stored, and made accessible in a structured
manner.

Latency
A delay between when a request is issued and its fulfillment.

Logical Model
A high-level structure and technical map for a database that specifies primary and foreign
keys and has been normalized.

Memory Bottleneck
Occurs when the database system either does not have enough physical memory allocated
for its use or is unable to access or manage memory resources effectively.

Memory Leak
A situation where a program allocates memory but then fails to release it when the memory is
no longer needed.

Metadata
Data about data, which describes characteristics or relationships of the data.

Method
An action that can be performed on an object to which it is assigned.

MongoDB
A popular example of a NoSQL flat file database that stores data in an object-oriented way.

Multifactor Authentication (MFA)


The practice of requiring two different forms of authentication for access to a system, such as
a username/password and a PIN or access code.

Network Model
A database model that represents objects and their relationships in a flexible manner. Unlike
the hierarchical model's tree structure, its schema is viewed as a graph, with nodes and arcs
representing objects and relationships, so a record can have more than one parent.

NoSQL
Not only SQL; a type of data system designed around unstructured data that needs to be
processed using a high availability and scalable process.

NoSQL Database
A database that can work with semi-structured or unstructured data by enabling the storing
and querying of data outside the traditional structures found in relational databases.

Non-Relational Databases
Databases that do not use the traditional tabular relational model that is used in relational
database management systems.

Normalization
The process of applying design rules to a database to ensure that its table structures
minimize duplication and ensure efficiency and integrity.

Object
An organizing unit in object-oriented databases that includes a fact, information about the
fact, and the relationships between facts.

Object-Oriented Database Model (OODM)


An object-oriented database model that organizes and represents data as objects, including
both data and behavior.

Object-Oriented Databases (OODBMS)


A database model that directly integrates object-oriented programming principles into
database management, storing data as objects.

Object-Oriented Programming (OOP)


A way of organizing computer programs around objects, which are self-contained units that
represent real-world entities or concepts.

Object-Relational Database (ORDBMS)


A type of database similar to a relational database but with an object-oriented database
model; objects, classes and inheritance are directly supported in database schemas and in
the query language.

Object/Relational (O/R) Database Model


A type of database model that bridges the gap between object-oriented and relational
databases by enabling objects to be directly stored and retrieved from the database.

Online Analytical Processing (OLAP)


Software technology that is used to analyze business data from different points of view.

Online Transaction Processing (OLTP) Databases


Operational databases, which are used for handling real-time transactions daily.

Parallel Processing
The simultaneous execution of multiple operations across multiple CPU cores in a distributed
computing environment to improve performance.

People
Database administrators, database designers and users are the people who manage, design,
and use databases. Each type of person has specific needs and responsibilities when
working with a database.

Physical Model
A technical plan for implementing the details outlined in the conceptual and logical models
using a specific database management system (DBMS).

Polymorphism
The ability of objects of different classes to respond to the same message in their own way; it
enhances OODM's code reusability, modularity, and flexibility.

Primary Key
A primary key is a column or a combination of columns uniquely identifying each row in a
table. The primary key cannot contain duplicate values.

Principle of Least Privilege


A security strategy that gives users only the minimum privileges they need to do their work.

Procedures
Rules on how a database is used or designed. Think of this as similar to how you would
design a business process, but here, how and what business data is stored, how it is used,
who accesses it, and how it can be accessed.

Query Optimizer
A software that determines the most efficient way to execute a query.

RDBMS
Relational Database Management System.

Referential Integrity
Rules ensuring that data relationships remain valid and consistent.

Relation
Also known as a table, a row-and-column structure for holding data in a relational database.

Relational Database
A database that has many tables, with shared common columns between tables.

Rows
The individual records in a table. They contain the values of all the columns in the table.

SQL Engine
The software that runs the database and is normally not accessible by end users.

Schema
The conceptual organization of an entire database.

Segment
A structured set of data elements.

Sharding
A database architecture strategy that breaks up a large database into smaller, more
manageable parts called shards.

Software
The operating system and RDBMS (Relational Database Management System) that influence
performance and management/administration of a database.

Spreadsheet
A row-and-column grid in which structured data tables can be created.

Structured Query Language


A programming language that allows data to be retrieved and manipulated in a relational
database.

Structured Query Language (SQL)


A specialized programming language used for managing and manipulating relational
databases.

Subschema
The part of a database that applications interact with.

System Administrator
A person who oversees the entire database/information system to ensure that everything is
operating optimally.

Tables
The basic unit of data storage in a relational database. They are organized into rows and
columns, each representing a single record. The columns in a table represent the attributes
of the record, and the rows represent the values of the attributes.

Transactional Database
A database system designed for managing the day-to-day operations of an organization.

Tuple
Also known as a record, one row in a relation (table) containing all the data for one item.

UML Class Diagram Notation


A notation system that illustrates the structure and relationships among classes in an object-
oriented system, showing the attributes, methods, and associations between them.

Universally Unique ID (UUID)


A 128-bit identifier that is guaranteed to be unique across both space and time. It is often
used as a primary key or unique identifier for records in a database, particularly in distributed
and decentralized systems, where ensuring uniqueness is crucial.

Validation Rule
A constraint that prevents invalid, inaccurate, or inconsistent data from being populated into
database entities.

Variety
The diverse types and formats of data in big data. This includes structured data (e.g.,
traditional databases), semi-structured data (e.g., JSON, XML), and unstructured data (e.g.,
text, images, videos). The variety of data sources and formats necessitates flexible data
storage and processing approaches.

Velocity
The speed of data generation and the need to process and analyze data in real time.

Vendor Lock-In
A situation in which a customer or organization becomes heavily dependent on a particular
vendor's products, services, or technologies to the extent that it becomes difficult or costly to
switch to an alternative vendor.

Volume
The enormous amount of data generated and collected. Big data storage and processing
solutions are required to handle massive volumes of data efficiently.

Wide-Column Store
A non-relational database that allows the names and format of columns of its tables to vary
from row to row.

XML Hybrid Database


A database type that bridges the structured formatting of XML and the relational model of
modern databases.
