DBMS
WEEK 2
LEC 1: Introduction to Relational Model (Pt.1)
• Attribute
o Attribute Types:
▪ Consider
Students = Roll #, First Name, Last Name, DoB, Passport #, Aadhaar #,
Department relation
▪ The set of allowed values for each attribute is called the domain of
the attribute
• Roll #: Alphanumeric string
• First Name, Last Name: Alpha String
• DoB: Date
• Passport #: String (Letter followed by 7 digits) – nullable
(optional)
• Aadhaar #: 12-digit number
• Department: Alpha String
▪ Attribute values are (normally) required to be atomic; that is
indivisible
▪ The special value null is a member of every domain. This indicates
that the value is unknown
▪ The null value may cause complications in the definition of many
operations
• Schema and instance
o Relation Schema and Instance
▪ A1, A2, …, An are attributes
▪ R = (A1, A2, …, An) is a relation schema. Example: instructor = (ID, name,
dept_name, salary)
Formally, given sets D1, D2, …Dn a relation r is a subset of D1 x D2 x … x Dn . Thus, a relation is a
set of n-tuples (a1, a2, …, an) where each ai ∈ Di
▪ The current values (relation instance) of a relation are specified by a
table
▪ An element t of r is a tuple, represented by a row in a table
Example: instructor ≡ (String (5) x String x String x Number+), where ID ∈ String, name ∈
String, dept_name ∈ String, and salary ∈ Number+
▪ Relations are unordered with unique tuples
• Order of tuples/rows is irrelevant (tuples may be stored in
arbitrary order)
• No two tuples/rows may be identical
• Keys
o Let K ⊆ R, where R is the set of attributes in the relation
o K is the superkey of R if values for K are sufficient to identify a unique tuple
of each possible relation r(R)
▪ Example: {ID} and {ID, name} are both superkeys of instructor
o Superkey K is a candidate key if K is minimal
▪ Example: {ID} is a candidate key for instructor
o One of the candidate keys is selected to be the primary key
o A surrogate key (or synthetic key) is a database is a unique identifier for
either an entity in modeled world or an object in the database
▪ Not derived from application data, unlike a natural (or business) key
which is derived from application data
o Example:
▪ Students = Roll #, First Name, Last Name, DoB, Passport #, Aadhaar #,
Department
▪ Super Key: Roll #, {Roll #, DoB}
▪ Candidate Keys: Roll #, {First Name, Last Name}, Aadhaar #
• Passport # cannot be a key as it is nullable
▪ Primary Key: Roll #
• Can Aadhaar # be a primary key? It may suffice for unique
identification. But Roll # may have additional useful info
▪ Secondary Key/ Alternate Key: {First Name, Last Name}, Aadhar #
▪ Simple Key: Consists of a single attribute.
▪ Composite Key: {First Name, Last Name}
• Consists of more than one attribute to uniquely identify an
entity occurrence
• One or more of the attributes, which make up the key are not
simple keys in their own right
▪ Foreign Key constraint: Value in one relation must appear in another
• Referencing relation
o Enrolment: Foreign keys – Roll #, Course #
• Referenced relation
o Students, Courses
▪ A compound key consists of more than one attribute to uniquely
identify an entity occurrence
• Each attribute, which makes up the key, is a simple key in its
own right
• {Roll #, Course #}
• Schema Diagram for university database
• Relational Query Languages
o Procedural viz-a-viz Non-procedural or Declarative paradigms
▪ Procedural programming requires that the programmer tell the
computer what to do
• That is, how to get the output for the range of required inputs
• The programmer must know an appropriate algorithm
▪ Declarative programming requires a more descriptive style
• The programmer must know what relationships hold between
various entities
o “Pure” languages:
▪ Relational algebra
▪ Tuple relational calculus
▪ Domain relational calculus
o All 3 pure languages are equivalent in computing power
LEC 2: Introduction to Relational Model (Pt.2)
• Relational Operators
o Basic Properties of Relations
▪ A relation is set. Hence,
• Ordering of rows/tuples is inconsequential
• All rows/tuples must be distinct
o Select Operation – selection of rows (tuples)
▪ Relation r
• σ A=B ∧ D>5 (r)
o Project Operation – selection of columns (Attributes)
▪ Relation r
▪ π A, C (r)
o Union of two relations
▪ Relation r, s
▪ r∪s
o Set difference of two relations
▪ Relation r, s
▪ r–s
o Set intersection of two relations
▪ Relation r, s
▪ r∩s
▪ r ∩ s = r – (r – s)
o Joining 2 relations – Cartesian Product
▪ Relation r, s
▪ rxs
▪ r = πR (r x s) ; s = πS (r x s) where R = (A, B) and S = (C, D, E)
▪ Naming issue: Between the 2 relations, there are one or more
columns with the same name
• Name them r.B and s.B (B is the repeated column)
• rxs
o Renaming a table
▪ Allows us to refer to a relation (say E) by more than one name ρX (E)
returns the expression E under the name X
▪ Relation r
▪ r x ρs (r)
o Composition of operations
▪ Can build expressions using multiple operations
▪ E.g.: σA=C (r x s)
• rxs
• σA=C (r x s)
o Joining two relations – Natural join
▪ Let r and s be relations on schemas R and S respectively. Then, the
natural join of relations R and S is a relation on schema R U S
obtained as follows:
• Consider each pair of tuples tr from r and ts from s
• If tr and ts have the same value on each of the attributes in R
∩ S, add a tuple t to the result, where
o t has the same value as tr on r
o t has the same value as ts on s
▪ Example:
• Relations r, s:
• Natural Join: r ⋈ s
ΠA,r.B,C,r.D,E (σr.B=s.B ∧ r.D=s.D (r x s))
• Aggregate Operators
o Can we compute:
▪ SUM
• E.g.: SUMB (σB > 2(r))
▪ AVG
▪ MAX
▪ MIN
• Notes about Relational Languages
o Each query input is a table (or set of tables)
o Each query output is a table
o All data in the output table appears in one of the input tables
o Relational algebra is not Turing complete (a system is called Turing complete
if this system is able to recognize or decide other data-manipulation rule
sets)
• Summary of Relational Operators
LEC 3: Introduction to SQL (Pt.1)
• History of Query Language
o IBM developed Structured English Query Language (SEQUEL) as a part of
System R project. Renamed Structured Query Language (SQL)
o There aren’t any alternatives to SQL for speaking to relational databases
(that is, SQL as a protocol), but there are many alternatives to writing SQL in
the applications
o These alternatives have been implemented in the form of frontends for
working with relational databases. Examples:
▪ SchemeQL and CLSQL, which are probably the most flexible, owing to
their Lisp heritage, they look a lot more like SQL than other frontends
▪ LINQ (in .Net)
▪ ScalaQL and ScalaQuery (in Scala)
▪ SqlStatement, ActiveRecord and many others in Ruby
▪ HaskellDB
▪ …
o There are several query languages that are derived or inspired from SQL. Of
them most effective is SPARQL (SPARQL Protocol and RDF Query Language)
▪ RDL – Resource Description Language
▪ Standardized by W3C Consortium as key technology of semantic web
▪ Versions:
• SPARQL 1.0
• SPARQL 1.1
▪ Used as query languages for several NoSQL systems – particularly the
Graph Databases that use RDF as store
• Data Definition Language (DDL)
o SQL DDL allows the specification of info about relations, including:
▪ The Schema for each relation
▪ The Domain of values associated with each attribute
▪ Integrity Constraints
▪ Also,
• The set of Indices to be maintained for each relation
• Security and Authorization information for each relation
• The Physical Storage Structure of each relation on disk
o Domain Types in SQL
▪ char(n) – Fixed length character string, with user-specified length n
▪ varchar(n) – Variable length character strings, with user-specified
maximum length n
▪ int – Integer (a finite subset of the integers that is machine-
independent)
▪ smallint(n) – Small integer (a machine independent subset of the
integer domain type)
▪ numeric(p,d) – Fixed point number, with user-specified precision of p
digits, with d digits to the right of decimal point. (ex., numeric (3,1)
allows 44.5 to be stored exactly, but not 444.5 or 0.32)
▪ real, double precision – Floating point and double-precision floating
point numbers, with machine-dependent precision
▪ float(n) – Floating point number, with user-specified precision of at
least n digits
o Create Table Construct
▪ An SQL relation is defined using the create table command:
create table r (A1D1, A2D2, …, AnDn),
(integrity-constraint1),
…
(integrity-constraintk));
• r is the name of the relation
• each Ai is an attribute name in the schema of relation r
• Di is the data type of values in the domain of attribute Ai
create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2));
• Create table construct – Integrity constraints
o not null
o primary key (A1, …, Am)
o foreign key (Am, …, An) references r
create table instructor (
ID char(5),
name varchar(20) not null,
dept_name varchar(20),
salary numeric(8,2),
primary key (ID),
foreign key (dept_name) references department));
o primary key declaration on an attribute automatically ensures not
null
o Update tables
▪ Insert (DML command)
• insert into instructor values (‘10211’, ‘Smith’, ‘Biology’,
66000);
▪ Delete (DML command)
• Remove all tuples from the student relation
o delete from student
▪ Drop Table (DDL command)
• drop table r
▪ Alter (DDL command)
• alter table r add A D
o Where A is the name of the attribute to be added to
relation r and D is the domain of A
o All existing tuples in the relation are assigned null as
the value for the new attribute
• alter table r drop A
o Where A is the name of an attribute of relation r
o Dropping of attributes not supported by many
databases
• Data Manipulation Language (DML): Query Structure
o Basic Query Structure
▪ A typical SQL query has the form:
select A1, A2, …, An ,
from r1, r2, …, rm
where P
• Ai represents an attribute from ri ‘s
• ri represents a relation
• P is a predicate
The result of an SQL query is a relation
o Select Clause
• The select clause lists the attributes desired in the result of a
query
▪ Corresponds to the projection operation of the relational
algebra
• Example: find the names of all instructors:
select name,
from instructor
• NOTE: SQL names are case insensitive
o Name ≡ NAME ≡ name
• SQL allows duplicates in relations as well as in query results
• To force the elimination of duplicates, insert the keyword distinct
after select
o To find the dept names of all instructors and remove
duplicates:
▪ select distinct dept_name
from instructor
• The keyword all specifies that duplicates should not be removed
o select all dept_name
from instructor
• An asterisk in the select cause denotes all attributes
o select *
from instructor
• An attribute can be a literal with no from clause
o select ‘437’
o Results is a table with one column and a single row with
value ‘437’
o Can give the column a name using:
▪ select ‘A’ as FOO
• An attribute can be a literal with from clause
▪ select ‘A’
from instructor
o Result is a table with one column and N rows (number of
tuples in the instructor table
• The select clause can also contain arithmetic expressions involving
the operation, +, -, * and / and operating on constants and
attributes of tuples
o The query:
▪ select ID, name, salary/12
from instructor
o Would return a relation that is the same as the instructor
relation, except that the value of the attribute salary is
divided by 12 (will be replaced)
o Can rename “salary/12” using the as clause:
▪ select ID, name, salary/12 as monthly_salary
o Where clause
• Specifies conditions that the result must satisfy
▪ Corresponds to the selection predicate of the relational
algebra
• To find all instructors in Comp. Sci. dept
▪ select name
from instructor
where dept_name = ‘Comp. Sci.’
• Comparison results can be combined using the logical connectives
and, or and not
• Comparisons can be applied to results of arithmetic expressions
o From clause
• The from clause lists the relations involved in the query
▪ Corresponds to the Cartesian product operation of the
relational algebra
• Find the Cartesian product instructor X teaches
▪ select *
from instructor, teaches
▪ Generates every possible instructor-teaches pair, with all
attributes from both relations
▪ For common attributes (for example, ID), the attributes in
the resulting table are renamed using the relation’s name
(for example, instructor.ID)
• Cartesian product not very useful directly, but useful combined
with where-clause condition (selection operation in relational
algebra)
LEC 4: Introduction to SQL (Pt.2)
• Additional Basic Operations
o Cartesian Product
o Rename AS Operation
▪ The SQL allows renaming relations and attributes using the as clause:
old_name as new_name
▪ Keyword as is optional and may be omitted
o String Operations
▪ SQL includes a string-matching operator for comparisons on
character strings. The operator like uses patterns that are described
using two special characters:
• percent (%) The % character matches any substring
• underscore ( _ ) The _ character matches any character
▪ Find the names of all instructors whose name includes the substring
“dar”
select name
from instructor
where name like ‘%dar%’
▪ Match the string “100%”
like ‘100%’ escape ‘\’
in that above we use backlash (\) as the escape character
▪ Patterns are case sensitive
▪ Examples:
o ‘Intro%’ matches any string beginning with “Intro”
o '%Comp%’ matches any string containing “Comp” as a
substring
o ‘_ _ _’ matches any string of exactly 3 characters
o ‘_ _ _ %’ matches any string of at least 3 characters
▪ SQL supports a variety of string operations such as
o Concatenation (using “||”)
o Converting from upper to lower case (and vice versa)
o Finding string length, extracting substrings, etc
o Ordering the Display of Tuples
▪ We may specify desc for descending order or asc for ascending order,
for each attribute; ascending order is the default
▪ Can sort on multiple attributes
o Selecting Number of tuples in output
▪ The Select Top clause is used to specify the number of records to
return
▪ The Select Top clause is useful on large tables with thousands of
records. Returning a large number of records can impact
performance
select top 10 distinct name
from instructor
▪ Not all database systems support the SELECT TOP clause
• SQL Server & MS Access support select top
• MySQL supports the limit clause
• Oracle uses fetch first n rows only and rownum
select distinct name
from instructor
order by name
fetch first 10 rows only
o Where Clause Predicates
▪ SQL includes a between comparison operator
▪ Tuple comparison
▪ In Operator
• The in operator allows user to specify multiple values in a
where clause
• The in operator is a shorthand for multiple or conditions
o Duplicates
▪ In relations with duplicates, SQL can define how many copies of
tuples appear in the result
▪ Multiset versions of some of the relational algebra operators – given
multiset relations r1 and r2:
• σθ(r1): If there are c1 copies of tuple t1 in r1 and t1 satisfies
selections σθ, then there are copies of t1 in σθ(r1)
• πA (r): For each copy of tuple t1 in r1, there is a copy of tuple
πA (t1) in πA (r1) where πA (t1) denotes the projection of the
single tuple t1
• r1 x r2: If there are c1 copies of tuple t1 in r1 and c2 copies of
tuple t2 in r2, there are c1 x c2 copies of tuple t1.t2 in r1 x r2
LEC 5: Introduction to SQL (Pt.3)
• Set Operations
o Set operations union, intersect and except
▪ Each of the above operations automatically eliminates duplicates
o To retain all duplicates use the corresponding multiset versions union all,
intersect all and except all
o Suppose a tuple occurs m times in r and n times in s, then it occurs:
▪ m + n times in r union all s
▪ min (m, n) times in r intersect all s
▪ max (0, m - n) times in r except all s
• Null Values
o It is possible for tuples to have a null value, denoted by null, for some of
their attributes
o null signifies an unknown value or that a value doesn’t exist
o The result of any arithmetic expressions involving null is null
o The predicate is null can be used to check for null values
o It is not possible to test for null values with comparison operators, such as =,
<, or <>. We need to use the is null and is not null operators instead
• Three Valued Logic
o Three values – true, false, unknown
o Any comparison with null returns unknown
▪ E.g: 5 < null or null <> null or null = null return unknown
o Three-valued logic using the value unknown:
▪ OR: (unknown or true) = true
(unknown or false) = unknown
(unknown or unknown) = unknown
▪ AND: (true and unknown) = unknown
(false and unknown) = false
(unknown and unknown) = unknown
▪ NOT: (not unknown) = unknown
▪ “P is unknown” evaluates to true if predicate P evaluates to unknown
o Result of where clause predicate is treated as false if it evaluates to
unknown
• Aggregate Functions
o These functions operate on the multiset of values of a column of a relation
and return a value
▪ avg
▪ min
▪ max
▪ sum
▪ count
o Group By
▪ Attributes in select clause outside of aggregate functions must
appear in group by list
• /* erroneous query */
select dept_name, ID, avg(salary)
from instructor
group by dept_name;
o Having clause
▪ Properties are specified using having clause (eg: having avg(salary) >
42000)
▪ Predicates in the having clause are applied after the formation of
groups whereas predicates in the where clause are applied before
forming groups
o Null Values and Aggregates
▪ Total all salaries
select sum(salary)
from instructor;
• Above statement ignores null amounts
• Result in null if there is no non-null amount
▪ All aggregate operations except count(*) ignore tuples with null
values on the aggregated attributes
▪ What if collection has only null values?
• Count returns 0
• All other aggregates return null