Tables in SQL Attribute names
Table name
Product
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Tuples or rows
Tables Explained
● The schema of a table is the table name and its
attributes:
Product(PName, Price, Category, Manufacturer)
● A key is an attribute whose values are unique;
we underline a key
Product(PName, Price, Category, Manufacturer)
Tables Explained
● A tuple = a record
○ Restriction: all attributes are of atomic type
● A table = a set of tuples
○ Like a list…
○ …but it is unordered:
no first(), no next(), no last().
Data Types in SQL
● Atomic types:
○ Characters: CHAR(20), VARCHAR(50)
○ Numbers: INT, BIGINT, SMALLINT, FLOAT
○ Others: MONEY, DATETIME, …
● Every attribute must have an atomic type
SQL Query
Basic form: (plus many many more bells and whistles)
SELECT <attributes>
FROM <one or more relations>
WHERE <conditions>
Simple SQL Query
Product PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
SELECT *
FROM Product
WHERE category=‘Gadgets’
“selection” PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
Simple SQL Query
Product PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
SELECT PName, Price, Manufacturer
FROM Product
WHERE Price > 100
PName Price Manufacturer
“selection” and SingleTouch $149.99 Canon
“projection” MultiTouch $203.99 Hitachi
Notation Input
Schema
Product(PName, Price, Category, Manufacturer)
SELECT PName, Price, Manufacturer
FROM Product
WHERE Price > 100
Answer(PName, Price, Manufacturer)
Output
Schema
Details
● Case insensitive:
○ Same: SELECT Select select
○ Same: Product product
○ Different: ‘Seattle’ ‘seattle’
● Constants:
○ 'abc’ - yes
○ "abc” - no
The LIKE operator
SELECT *
FROM Product
WHERE PName LIKE ‘%gizmo%’
● s LIKE p: pattern matching on strings
● p may contain two special symbols:
○ % = any sequence of characters
○ _ = any single character
Eliminating Duplicates
Category
SELECT DISTINCT category Gadgets
FROM Product
Photography
Household
Compare to: Category
Gadgets
SELECT category Gadgets
FROM Product
Photography
Household
Ordering the Results
SELECT pname, price, manufacturer
FROM Product
WHERE price > 10
ORDER BY manufacturer DESC, pname
Ties are broken by the second attribute
on the ORDER BY list, etc.
Ordering is ascending, unless you
specify the DESC keyword.
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
SELECT DISTINCT category
?
FROM Product
ORDER BY category
?
SELECT Category
FROM Product
ORDER BY PName
?
SELECT DISTINCT category
FROM Product
ORDER BY PName
Keys and Foreign Keys
Company
CName StockPrice Country
Key GizmoWorks 25 USA
Canon 65 Japan
Hitachi 15 Japan
Product
PName Price Category Manufacturer
Foreign
Gizmo $19.99 Gadgets GizmoWorks key
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Joins
● Introduction to Joins
● What are joins
● Why join matters?
Joins Company
CName StockPrice Country
Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country) GizmoWorks 25 USA
Canon 65 Japan
Find all products under $200 manufactured in
Japan; return their names and prices. Hitachi 15 Japan
Product
SELECT PName, Price
PName Price Category Manufacturer
FROM Product, Company
WHERE Manufacturer=CName Gizmo $19.99 Gadgets GizmoWorks
AND Country=‘Japan’ Powergizmo $29.99 Gadgets GizmoWorks
AND Price <= 200
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Joins
Product Company
PName Price Category Manufacturer Cname StockPrice Country
Gizmo $19.99 Gadgets GizmoWorks
GizmoWorks 25 USA
Powergizmo $29.99 Gadgets GizmoWorks Canon 65 Japan
SingleTouch $149.99 Photography Canon
Hitachi 15 Japan
MultiTouch $203.99 Household Hitachi
SELECT PName, Price
FROM Product, Company PName Price
WHERE Manufacturer=CName AND Country=‘Japan’ SingleTouch $149.99
AND Price <= 200
A Subtlety about Joins Company
CName StockPrice Country
Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country) GizmoWorks 25 USA
Find all countries that manufacture some product in Canon 65 Japan
the ‘Gadgets’ category.
Hitachi 15 Japan
Product
PName Price Category Manufacturer
SELECT Country
FROM Product, Company Gizmo $19.99 Gadgets GizmoWorks
WHERE Manufacturer=CName AND
Category=‘Gadgets’ Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Unexpected duplicates
A Subtlety about Joins
Product Company
Name Price Category Manufacturer Cname StockPrice Country
Gizmo $19.99 Gadgets GizmoWorks
GizmoWorks 25 USA
Powergizmo $29.99 Gadgets GizmoWorks Canon 65 Japan
SingleTouch $149.99 Photography Canon
Hitachi 15 Japan
MultiTouch $203.99 Household Hitachi
SELECT Country
FROM Product, Company
WHERE Manufacturer=CName AND
Category=‘Gadgets’
Country
What is ??
the problem ? ??
What’s the
solution ?
Join
Aggregation
SELECT avg(price) SELECT count(*)
FROM Product FROM Product
WHERE maker=“Toyota” WHERE year > 1995
SQL supports several aggregation operations:
sum, count, min, max, avg
Except count, all aggregations apply to a
single attribute
Aggregation: Count
COUNT applies to duplicates, unless otherwise stated:
SELECT Count(category) same as Count(*)
FROM Product
WHERE year > 1995
We probably want:
SELECT Count(DISTINCT category)
FROM Product
WHERE year > 1995
Simple Aggregations
Purchase
Product Date Price Quantity
Bagel 10/21 1 20
Banana 10/3 0.5 10
Banana 10/10 1 10
Bagel 10/25 1.50 20
SELECT Sum(price * quantity)
FROM Purchase ?
WHERE product = ‘bagel’
Simple Aggregations
Purchase
Product Date Price Quantity
Bagel 10/21 1 20
Banana 10/3 0.5 10
Banana 10/10 1 10
Bagel 10/25 1.50 20
SELECT Sum(price * quantity)
FROM Purchase
WHERE product = ‘bagel’ 50 (= 20+30)
Grouping and Aggregation
Purchase(product, date, price, quantity)
Product Date Price Quantity
Find total sales after 10/1/2005 per product. Bagel 10/21 1 20
Bagel 10/25 1.50 20
Banana 10/3 0.5 10
SELECT product, Sum(price*quantity) AS TotalSales Banana 10/10 1 10
FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product
Let’s see what this means…
Grouping and Aggregation
Product Date Price Quantity Product TotalSales
Bagel 10/21 1 20
Bagel 10/25 1.50 20 Bagel 50
Banana 10/3 0.5 10
Banana 15
Banana 10/10 1 10
SELECT product, Sum(price*quantity) AS TotalSales
FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product
HAVING Clause
Same query, except that we consider only products that had
at least 30 buyers.
SELECT product, Sum(price * quantity)
FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product
HAVING Sum(quantity) > 30
HAVING clause contains conditions on aggregates.
NULLS in SQL
● Whenever we don’t have a value, we can put
a NULL
● Can mean many things:
○ Value does not exists
○ Value exists but is unknown
○ Value not applicable
○ Etc.
● The schema specifies for each attribute if can
be null (nullable attribute) or not
● How does SQL cope with tables that have
NULLs ?
Null Values
● If x= NULL then 4*(3-x)/7 is still NULL
● If x= NULL then x=“Joe” is UNKNOWN
● In SQL there are three boolean values:
FALSE = 0
UNKNOWN = 0.5
TRUE = 1
Null Values
Can test for NULL explicitly:
○ x IS NULL
○ x IS NOT NULL
SELECT *
FROM Person
WHERE age < 25 OR age >= 25 OR age IS NULL
Now it includes all Persons
Modifying the Database
Three kinds of modifications
● Insertions
● Deletions
● Updates
Sometimes they are all called “updates”
Insertions
General form:
INSERT INTO R(A1,…., An) VALUES (v1,…., vn)
Example: Insert a new purchase to the database:
INSERT INTO Purchase(buyer, seller, product, store)
VALUES (‘Joe’, ‘Fred’, ‘wakeup-clock-espresso-machine’,
‘The Sharper Image’)
Missing attribute → NULL.
May drop attribute names if give them in order.
Deletions
Example:
DELETE FROM PURCHASE
WHERE seller = ‘Joe’ AND
product = ‘Brooklyn Bridge’
Factoid about SQL: there is no way to delete only a single
occurrence of a tuple that appears twice
in a relation.
Updates
Example:
UPDATE PRODUCT
SET price = price/2
WHERE Product.name IN
(SELECT product
FROM Purchase
WHERE Date =‘Oct, 25, 1999’);
Background: Semi Structured Data
● XML and JSON are two standard, textual data formats for
representing arbitrary data
○ XML stands for “eXtensible Markup Language”
○ JSON stands for “JavaScript Object Notation”
● Both are commonly used in practice. XML came first
● JSON, which uses JavaScript syntax, became popular for
representing data in web applications and services
○ If you’re using JavaScript, JSON is an obvious choice
Background: Semi Structured Data
● Both formats are reasonable
choices, although some people
have strong biases
● Most programming languages
have libraries for parsing and
generating both XML and JSON
● You should be familiar with both
Introduction to XML
● XML: Extensible Markup Language
● Defined by the WWW Consortium (W3C)
● Documents have tags giving extra
information about sections of the document
○ E.g. <title> XML </title> <slide> Introduction …</slide>
● Extensible, unlike HTML
○ Users can add new tags
Introduction to XML
● The ability to specify new tags,
and to create nested tag
structures make XML a great way
to exchange data, not just
documents.
○ Much of the use of XML has been
in data exchange applications,
not as a replacement for HTML
Introduction to XML
Tags make data (relatively) self-documenting
○ E.g.
<?xml version = "1.0"?>
<bank>
<account>
<account_number> A-101 </account_number>
<branch_name> Downtown </branch_name>
<balance> 500 </balance>
</account>
<depositor>
<account_number> A-101 </account_number>
<customer_name> Johnson </customer_name>
</depositor>
</bank>
Introduction to XML
Data interchange is critical in today’s networked
world
○ Examples:
■ Banking: funds transfer
■ Order processing (especially inter-
company orders)
■ Scientific data
● Chemistry, Genetics
○ Paper flow of information between
organizations is being replaced by
electronic flow of information
Comparison with Structured Data
● Inefficient: tags, which in effect represent
schema information, are repeated
● Better than relational tuples as a data-exchange
format
○ Unlike relational tuples, XML data is self-
documenting due to presence of tags
○ Non-rigid format: tags can be added
○ Allows nested structures
○ Wide acceptance, not only in database
systems, but also in browsers, tools, and
applications
Structure of XML Data
● Tag: label for a section of data
● Element: section of data beginning with
<tagname> and ending with matching
</tagname>
Structure of XML Data
● Elements must be properly nested
○ Proper nesting
■ <account> … <balance> …. </balance> </account>
○ Improper nesting
■ <account> … <balance> …. </account> </balance>
○ Formally: every start tag must have a unique matching
end tag, that is in the context of the same parent
element.
● Every document must have a single top-level element
Example of Nested Elements
<?xml version = "1.0"?>
<bank-1>
<customer>
<customer_name> Hayes </customer_name>
<customer_street> Main </customer_street>
<customer_city> Harrison </customer_city>
<account>
<account_number> A-102 </account_number>
<branch_name> Perryridge </branch_name>
<balance> 400 </balance>
</account>
<account>
…
</account>
</customer>
.
.
</bank-1>
Structure of XML Data
Mixture of text with sub-elements is legal in XML.
○ Example:
<account>
This account is seldom used any more.
<account_number> A-102</account_number>
<branch_name> Perryridge</branch_name>
<balance>400 </balance>
</account>
○ Useful for document markup, but discouraged for
data representation
Attributes
● Elements can have attributes
<account acct-type = “checking” >
<account_number> A-102 </account_number>
<branch_name> Perryridge </branch_name>
<balance> 400 </balance>
</account>
● Attributes are specified by name=value pairs inside the starting tag of
an element
● An element may have several attributes, but each attribute name can
only occur once
<account acct-type = “checking” monthly-fee=“5”>
More on XML Syntax
Distinction between subelement
and attribute
In the context of documents,
attributes are part of markup, while
subelement contents are part of the
basic document contents
More on XML Syntax
Distinction between subelement and attribute
○In the context of data representation, the difference is unclear
and may be confusing
■Same information can be represented in two ways
<account account_number = “A-101”> …. </account>
<account>
<account_number>A-101</account_number> …
</account>
○Suggestion: use attributes for identifiers of elements, and use
subelements for contents
More on XML Syntax
Elements without subelements or text content can be abbreviated
by ending the start tag with a /> and deleting the end tag
○ <account number=“A-101” branch=“Perryridge” balance=“200 />
To store string data that may contain tags, without the tags being
interpreted as subelements, use CDATA as below
○ <![CDATA[<account> … </account>]]>
Here, <account> and </account> are treated as just strings
CDATA stands for “character data”, text that will NOT be parsed by a parser
JSON Data – A name and a value
• A name/value pair consists of a field name (in double quotes), followed by a
colon, followed by a value
• Unordered sets of name/value pairs
• Begins with { (left brace)
• Ends with } (right brace)
• Each name is followed by : (colon)
• Name/value pairs are separated by , (comma)
{
"employee_id": 1234567,
"name": "Jeff Fox",
"hire_date": "1/1/2013",
"location": "Norwalk, CT",
"consultant": false
}
JSON Data – A name and a value
• In JSON, values must be one of the following data types:
• a string
• a number
• an object (JSON object)
• an array
• a boolean
• null
{
"employee_id": 1234567,
"name": "Jeff Fox",
"hire_date": "1/1/2013",
"location": "Norwalk, CT",
"consultant": false
}
JSON Data – A name and a value
• Strings in JSON must be written in double quotes.
{ "name":"John" }
• Numbers in JSON must be an integer or a floating point.
{ "age":30 }
• Values in JSON can be objects.
{
"employee":{ "name":"John", "age":30, "city":"New York" }
}
• Values in JSON can be arrays.
{
"employees":[ "John", "Anna", "Peter" ]
}
Another Example: XML vs. JSON
<?xml version="1.0"?>
<employees>
<employee>
<firstName>John</firstName> <lastName>Doe</lastName>
</employee>
<employee>
<firstName>Anna</firstName> <lastName>Smith</lastName>
</employee>
<employee>
<firstName>Peter</firstName> <lastName>Jones</lastName>
</employee>
</employees>
{"employees":[
{ "firstName":"John", "lastName":"Doe" },
{ "firstName":"Anna", "lastName":"Smith" },
{ "firstName":"Peter", "lastName":"Jones" }
]}
XML vs. JSON
•JSON is Like XML Because The biggest difference is: XML
• Both JSON and XML are "self describing" has to be parsed with XML parser. JSON
(human readable) can be parsed by standard JavaScript
• Both JSON and XML are hierarchical function.
(values within values)
• Both JSON and XML can be parsed and
used by lots of programming languages
•JSON is Unlike XML Because
• JSON doesn't use end tag
• JSON is shorter
• JSON is quicker to read and write
• JSON can use arrays
• JSON has a better fit for OO systems than
XML