Introduction to Database Management
Data Models
• Databases must represent:
• the data itself (typically structured in some way)
• associations between different data values
• optionally, constraints on data values
• What kind of data can be modeled?
• What kinds of associations can be represented?
• The data model specifies:
• what data can be stored (and sometimes how it is stored)
• associations between different data values
• what constraints can be enforced
• how to access and manipulate the data
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Relations
• Relations are basically tables of data acct_id branch_name balance
• Each row represents a record in the relation A-301 New York 350
A-307 Seattle 275
• A relational database is a set of relations A-318 Los Angeles 550
• Each relation has a unique name in the database ... ... ...
• Each row in the table specifies a relationship The account relation
between the values in that row
• The account ID “A-307”, branch name “Seattle”, and
balance “275” are all related to each other
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Relations and Attributes
• Each relation has some number of attributes acct_id branch_name balance
• Sometimes called “columns” A-301 New York 350
A-307 Seattle 275
• Each attribute has a domain A-318 Los Angeles 550
• Specifies the set of valid values for the attribute ... ... ...
• The account relation: account
• 3 attributes
• Domain of balance is the set of nonnegative integers
• Domain of branch_name is the set of all valid branch names in the bank
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Tuples and Attributes
• Each row is called a tuple acct_id branch_name balance
• A fixed-size, ordered set of name-value pairs A-301 New York 350
• A tuple variable can refer to any valid tuple in a A-307 Seattle 275
relation A-318 Los Angeles 550
• Each attribute in the tuple has a unique name ... ... ...
• Can also refer to attributes by index account
• Attribute 1 is the first attribute, etc.
• Example:
• Let tuple variable t refer to first tuple in account relation
• t[balance] = 350
• t[2] = “New York”
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Tuples and Relations
• A relation is a set of tuples
• Each tuple appears exactly once
• Note: SQL tables are multisets! (Sometimes called bags.)
• If two tuples t1 and t2 have the same values for all attributes, then t1 and t2 are
the same tuple (i.e. t1 = t2)
• The order of tuples in a relation is not relevant
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Example of a Relation
attributes
(or columns)
tuples
(or rows)
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Attribute Types
• The set of allowed values for each attribute is called the domain of the
attribute
• Attribute values are (normally) required to be atomic; that is,
indivisible
• The special value null is a member of every domain
• Indicated that the value is “unknown”
• The null value causes complications in the definition of many
operations
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Relation Schemas
• Every relation has a schema
• Specifies the type information for relations
• Multiple relations can have the same schema
• A relation schema includes:
• an ordered set of attributes
• the domain of each attribute
• Naming conventions:
• Relation names are written as all lowercase
• Relation schema’s name is capitalized
• For a relation r and relation schema R:
• Write r(R) to indicate that the schema of r is R
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Relation Schema and Instance
• A1, A2, …, An are attributes
• R = (A1, A2, …, An ) is a relation schema
Example:
instructor = (ID, name, dept_name, salary)
• Formally, given sets D1, D2, …. Dn a relation r is a subset of
D1 x D2 x … x Dn
Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai Di
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Relation Schema and Instance (cont.)
• The current values (relation instance) of a relation are specified by a
table
• An element t of r is a tuple, represented by a row in a table
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Schema of account Relation
• The relation schema of account is: acct_id branch_name balance
Account_schema = (acct_id, branch_name, balance) A-301 New York 350
A-307 Seattle 275
• To indicate that account has schema A-318 Los Angeles 550
Account_schema:
... ... ...
account(Account_schema)
account
• Important note:
• Domains are not stated explicitly in this notation!
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Relation Schemas
• Relation schemas are ordered sets of attributes
• Can use set operations on them
• Examples:
Relations r(R) and s(S)
• Relation r has schema R
• Relation s has schema S
R∩S
• The set of attributes that R and S have in common
R–S
• The set of attributes in R that are not also in S
• (and, the attributes are in the same order as R)
K⊆R
• K is some subset of the attributes in relation schema R
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Attribute Domains
• The relational model constrains attribute domains to be atomic
• Values are indivisible units
• Mainly a simplification
• Virtually all relational database systems provide non-atomic data types
• Attribute domains may also include the null value
• null = the value is unknown or unspecified
• null can often complicate things. Generally considered good practice to avoid
wherever reasonable to do so.
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Relations and Relation Variables
More formally: acct_id branch_name balance
• account is a relation variable A-301 New York 350
A-307 Seattle 275
• A name associated with a specific schema, and a set of
tuples that satisfies that schema A-318 Los Angeles 550
• (sometimes abbreviated “relvar”) ... ... ...
• A specific set of tuples with the same schema is called account
a relation value (sometimes abbreviated “relval”)
• (Formally, this can also be called a relation)
• Can be associated with a relation variable
• Or, can be generated by applying relational operations to
one or more relation variables
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Relations and Relation Variables (2)
• Problem: acct_id branch_name balance
• The term “relation” is often used in slightly different A-301 New York 350
ways A-307 Seattle 275
A-318 Los Angeles 550
• “Relation” usually means the collection of tuples ... ... ...
• i.e. “relation” usually means “relation value”
account
• It is often used less formally to refer to a relation
variable and its associated relation value
• e.g. “the account relation” is really a relation variable
that holds a specific relation value
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Relations are Unordered
• Order of tuples is irrelevant (tuples
may be stored in an arbitrary order)
• Example: instructor relation with
unordered tuples
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Distinguishing Tuples
• Relations are sets of tuples…
• No two tuples can have the same values for all attributes…
• But, some tuples might have the same values for some attributes
• Example: acct_id branch_name balance
• Some accounts have the same balance A-301 New York 350
• Some accounts are at the same branch A-307 Seattle 275
A-318 Los Angeles 550
A-319 New York 80
A-322 Los Angeles 275
account
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Keys
• Keys are used to distinguish individual tuples
• A superkey is a set of attributes that uniquely
identifies tuples in a relation
• Example: acct_id branch_name balance
{acct_id} is a superkey
A-301 New York 350
• Is {acct_id, balance} a superkey? A-307 Seattle 275
• Yes! Every tuple will have a unique set of values for A-318 Los Angeles 550
this combination of attributes.
A-319 New York 80
• Is {branch_name} a superkey? A-322 Los Angeles 275
• No. Each branch can have multiple accounts account
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Superkeys and Candidate Keys
• A superkey is a set of attributes that uniquely identifies tuples in a
relation
• Adding attributes to a superkey produces another superkey
• If {acct_id} is a superkey, so is {acct_id, balance}
• If a set of attributes K ⊆ R is a superkey, so is any superset of K
• Not all superkeys are equally useful…
• A minimal superkey is called a candidate key
• A superkey for which no proper subset is a superkey
• For account, only {acct_id} is a candidate key
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Primary Keys
• A relation might have several candidate keys
• In these cases, one candidate key is chosen as the primary means of
uniquely identifying tuples
• Called a primary key
cust_id branch_name balance
• Example: customer relation 23-652 Joe Smith 330-25-8822
• Candidate keys could be: 15-202 Ellen Jones 221-30-6551
{cust_id} 23-521 Dave Johnson 005-81-2568
{cust_ssn}
... ... ...
• Choose primary key:
customer
{cust_id}
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Primary Keys (2)
• Keys are a property of the relation schema, not individual tuples
• Applies to all tuples in the relation
• Primary key attributes are listed first in relation schema, and are
underlined
• Examples:
Account_schema = (acct_id, branch_name, balance)
Customer_schema = (cust_id, cust_name, cust_ssn)
• Only indicate primary keys in this notation
• Other candidate keys are not specified
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Primary Keys (3)
• Multiple records cannot have the same cust_id branch_name balance
values for a primary key! 23-652 Joe Smith 330-25-8822
15-202 Ellen Jones 221-30-6551
• …or any candidate key, for that matter…
23-521 Dave Johnson 005-81-2568
• Example: 15-202 Albert Stevens 450-22-5869
customer(cust_id, cust_name, cust_ssn) ... ... . . . relation
The customer
• Two customers cannot have the same ID. customer
• This is an example of an invalid relation
• The set of tuples doesn’t satisfy the required
constraints
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Keys Constrain Relations
• Primary keys constrain the set of tuples that can appear in a relation
• Same is true for all superkeys
• For a relation r with schema R
• If K ⊆ R is a superkey then
⟨ ∀t1, t2 ∈ r(R) : t1[K] = t2[K] : t1[R] = t2[R] ⟩
• i.e., if two tuple-variables have the same values for the superkey attributes,
then they refer to the same tuple
• t1[R] = t2[R] is equivalent to saying t1 = t2
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Choosing Candidate Keys
• Since candidate keys constrain the tuples that can be stored in a
relation…
• Attributes that would make good (or bad) candidate keys depend on what is
being modeled
• Example: customer name as candidate key?
• Very likely that multiple people will have same name
• Thus, not a good idea to use it as a candidate key
• These constraints motivated by external requirements
• Need to understand what we are modeling in the database
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Super Key vs. Candidate Key vs. Primary Key
Super Key
• Definition: A super key is an attribute or a set of attributes that can uniquely
identify a record in a database table. A super key may contain additional
attributes that are not necessary for unique identification.
• Uniqueness: Super keys guarantee that no two rows have the same set of
values in the super key columns. However, they are not minimal, meaning
a super key can have extra attributes that aren’t needed to maintain
uniqueness.
• Multiplicity: A table can have multiple super keys, varying in the number of
attributes they include, as long as the combination ensures uniqueness.
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Super Key vs. Candidate Key vs. Primary Key
Candidate Key
• Definition: A candidate key is a minimal super key, meaning it is a super
key with no possible reduction in the number of columns that can still
uniquely identify a record. Essentially, it is the minimal subset of a super
key.
• Uniqueness and Non-nullability: Each candidate key must contain unique
values and cannot contain null values, ensuring that every record within
the table can be uniquely identified by this key.
• Multiplicity: A table can have multiple candidate keys if there are multiple
minimal sets of fields that can uniquely identify a record in the table.
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Super Key vs. Candidate Key vs. Primary Key
Primary Key
• Definition: The primary key is a specific candidate key selected by the
database designer to uniquely identify records in a table. It is the most
important key and is used to reference other tables (foreign keys).
• Uniqueness and Non-nullability: The primary key must consist of unique
values and cannot contain null values. It enforces entity integrity by
uniquely identifying each record in a table.
• Singular Presence: A table can have only one primary key. This can be a
single column (simple primary key) or composed of multiple columns
(composite primary key) to ensure uniqueness across records.
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Foreign Keys
• One relation schema can include the attributes of another schema’s
primary key
• Example: depositor relation
• Depositor_schema = (cust_id, acct_id)
• Associates customers with bank accounts
• cust_id and acct_id are both foreign keys
• cust_id references the primary key of customer
• acct_id references the primary key of account
• depositor is the referencing relation
• It refers to the customer and account relations
• customer and account are the referenced relations
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
depositor Relation
cust_id cust_name cust_ssn acct_id branch_name balance
23-652 Joe Smith 330-25-8822 A-301 New York 350
15-202 Ellen Jones 221-30-6551 A-307 Seattle 275
23-521 Dave Johnson 005-81-2568 A-318 Los Angeles 550
cust_id acct_id
... ... ... ... ... ...
15-202 A-301
customer account 23-521 A-307
23-652 A-318
• depositor relation references customer and account ... ...
• Represents relationships between customers and their accounts depositor
• Example: Joe Smith’s accounts
• “Joe Smith” has an account at the “Los Angeles” branch, with a balance of 550.
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Foreign Key Constraints
• Tuples in depositor relation specify values for cust_id
• customer relation must contain a tuple corresponding to each cust_id value in
depositor
• Same is true for acct_id values and account relation
• Valid tuples in a relation are also constrained by foreign key
references
• Called a foreign-key constraint
• Consistency between two dependent relations is called referential
integrity
• Every foreign key value must have a corresponding primary key value
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Foreign Key Constraints (2)
• Given a relation r(R)
• A set of attributes K ⊆ R is the primary key for R
• Another relation s(S) references r
• K ⊆ S too
• ⟨ ∀ts ∈ s : ∃tr ∈ r : ts[K] = tr[K] ⟩
• Notes:
• K is not required to be a candidate key for S, only R
• K may also be part of a larger candidate key for S
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Primary Key of depositor Relation?
• Depositor_schema = (cust_id, acct_id) cust_id acct_id
• If {cust_id} is the primary key: 15-202 A-301
• A customer can only have one account 23-521 A-307
• Each customer’s ID can appear only once in depositor
• An account could be owned by multiple customers 23-652 A-318
• If {acct_id} is the primary key: ... ...
• Each account can be owned by only one customer depositor
• Each account ID can appear only once in depositor
• Customers could own multiple accounts
• If {cust_id, acct_id} is the primary key:
• Customers can own multiple accounts
• Accounts can be owned by multiple customers
• Last option is how most banks really work
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Conceptual Data Model (CDM)
• Model the logical structure of the entire data application, independent of
software or data structure model considerations
• A valid CDM can be converted into a PDM
• In its application, CDM can be equated with ERD
• whose function is indeed the same, namely modeling the logical structure of the
database
• CDM is used to describe in detail the logical structure of the database
• CDM consists of objects that are not directly implemented into the actual
database
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Steps to Create a CDM
• Understand the core of the problem given
• Determine which entities are involved
• Determine the data attributes for each entity along with their data types
• Identify the relationships/associations between each entity including their
cardinalities
• Model the Entities and Relationships
• Verify the accuracy of the model
• Correct any errors and warnings
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Physical Data Model (PDM)
• A physical representation of the database to be created, taking into
account the DBMS to be used
• PDM can be generated from a valid CDM
• In its application, PDM can be equated with the Relation Schema
whose function is to model the physical structure of a database
• It is a detailed representation of a database in physical form
• PDM shows the correct data storage structure in the actual database
used
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Schema Diagram for University Database
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Relational Query Languages
• Procedural vs non-procedural, or declarative
• “Pure” languages:
• Relational algebra
• Tuple relational calculus
• Domain relational calculus
• The above 3 pure languages are equivalent in computing power
• We will concentrate in this chapter on relational algebra
• Not turing-machine equivalent
• consists of 6 basic operations
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Select Operation – selection of rows (tuples)
• Relation r
• (A=B) ^ (D > 5) (r)
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Project Operation – Selection of Columns (Attributes)
• Relation r
• A,C (r)
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Union of Two Relations
• Relations r, s
•rs
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Set Difference of Two Relations
• Relations r, s
•r –s
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Joining Two Relations – Cartesian-product
• Relations r, s
•rxs
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Cartesian-product – Naming Issue
• Relations r, s
•rxs
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Renaming a Table
• Allows us to refer to a relation, (say E) by more than one name
x (E)
returns the expression E under the name X
• Relations r
• r x s (r) r.A r.B s.A s.B
α 1 α 1
α 1 β 2
β 2 α 1
β 2 β 2
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Composition of Operations
• Can build expressions using multiple operations
Example: A=C (r x s)
rxs
A=C (r x s)
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Joining two relations – Natural Join
• Let r and s be relations on schemas R and S respectively.
Then, the “natural join” of relations R and S is a relation on schema R S
obtained as follows:
• Consider each pair of tuples tr from r and ts from s.
• If tr and ts have the same value on each of the attributes in R S, add a tuple t
to the result, where
• t has the same value as tr on r
• t has the same value as ts on s
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Natural Join Example
• Relations r, s
• Natural Join
r s
A, r.B, C, r.D, E ( r.B = s.B ˄ r.D = s.D (r x s)))
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Notes about Relational Languages
• Each Query input is a table (or set of tables)
• Each query output is a table.
• All data in the output table appears in one of the input tables
• Relational Algebra is not Turning complete
• Can we compute:
• SUM
• AVG
• MAX
• MIN
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Symbol (Name) Example of Use
σ
(Selection) σ salary > = 85000 (instructor)
Return rows of the input relation that satisfy the predicate.
Π
(Projection) Π ID, salary (instructor)
Output specified attributes from all rows of the input relation. Remove
Summary of Relational
duplicate tuples from the output.
x
instructor x department
Algebra Operators
(Cartesian Product)
Output pairs of rows from the two input relations that have the same value on
all attributes that have the same name.
∪
(Union) Π name (instructor) ∪ Π name (student)
Output the union of tuples from the two input relations.
-
(Set Difference) Π name (instructor) -- Π name (student)
Output the set difference of tuples from the two input relations.
⋈
(Natural Join) instructor ⋈ department
Output pairs of rows from the two input relations that have the same value on
all attributes that have the same name.
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia
Thank you!
www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia