Introduction to Set Operations in SQL
Set operations in SQL are techniques for combining or comparing
the results of two or more SELECT statements.
Set operations form the foundation of SQL and enable us to
combine, compare, and filter data from multiple sources. These
operations are indispensable for tasks ranging from data
integration and cleansing to advanced analytics and reporting.
We will try to understand what are set operators and how they
are used in SQL, their practical applications, and more!
Introduction to Set Theory in SQL, Dr. Asoke Nath 1
Introduction to Set Operations in SQL
Q: What Are Set Operations in SQL?
Set operations in SQL are techniques for combining or comparing the results of two or
more SELECT statements. They act like mathematical set operations, letting us find the
union, intersection, or difference between the rows returned by our queries. This makes
them indispensable when analysing data from multiple sources or perspectives.
Here's a quick overview of the core set operations:
1) UNION: Merges all unique rows from two or more SELECT statements, eliminating
duplicates.
2) UNION ALL: Merges all rows from two or more SELECT statements, keeping
duplicates.
3) INTERSECT: Returns only the rows that appear in both SELECT statements.
4) EXCEPT: Returns rows from the first SELECT statement that don't appear in the
second. Introduction to Set Theory in SQL, Dr. Asoke Nath 2
Introduction to Set Operations in SQL
Understanding Set Operators
Set operators are specialized commands or symbols used to perform
operations on the result sets of multiple SELECT queries. They enable us
to perform tasks like finding the union (all rows), intersection (shared
rows), and difference (unique rows) between different datasets.
Set operators are essential tools for data manipulation and analysis within
SQL. They provide a powerful means of querying and processing data
from relational databases.
Introduction to Set Theory in SQL, Dr. Asoke Nath 3
Introduction to Set Operations in SQL
Comparison with relational algebra operations
Relational algebra is a theoretical framework that provides a foundation
for understanding database queries. It offers abstract operations like
projection, selection, and join, rooted in mathematical principles and
independent of specific database systems. Think of it as the "behind-the-
scenes" logic that powers our database interactions.
Set operators in SQL offer a practical implementation of these concepts
within a database environment. They enable us to perform set operations
like union, intersection, and difference directly on the result sets of SQL
queries.
Introduction to Set Theory in SQL, Dr. Asoke Nath 4
Introduction to Set Operations in SQL
Comparison with relational algebra operations(contd..)
While relational algebra provides a formal foundation for database
operations, set operators in SQL offer a standardized, user-friendly
interface for data manipulation tasks.
Understanding the relationship between set operators and relational
algebra operations provides insights into the theoretical underpinnings of
SQL. It enables users to leverage these concepts effectively in database
querying and analysis tasks.
Introduction to Set Theory in SQL, Dr. Asoke Nath 5
Introduction to Set Operations in SQL
Types of Set Operators in SQL
There are three primary set operators in SQL:
UNION
INTERSECT
EXCEPT (or MINUS in some dialects)
These operators mathematically correspond to the concepts of
union, intersection, and set difference .
Introduction to Set Theory in SQL, Dr. Asoke Nath 6
Introduction to Set Operations in SQL
The UNION operator:
UNION operator provides unique values by default. To find
duplicate values, use UNION ALL.
Note: SQL UNION and UNION ALL difference is that UNION
operator removes duplicate rows from results set and
UNION ALL operator retains all rows, including duplicate.
Introduction to Set Theory in SQL, Dr. Asoke Nath 7
Introduction to Set Operations in SQL
The UNION operator:
Examples of SQL UNION
Let's look at an example of UNION operator in SQL to understand it better.
Let's create two tables "Emp1" and "Emp2";
Emp1 Table
Write the following SQL query to create Emp1 table.
CREATE TABLE Emp1(
EmpID INT PRIMARY KEY,
Name VARCHAR(50),
Country VARCHAR(50),
Age int(2),
mob int(10)
);
Introduction to Set Theory in SQL, Dr. Asoke Nath 8
Introduction to Set Operations in SQL
The UNION operator:
-- Insert some sample data into the Customers table
INSERT INTO Emp1 (EmpID, Name,Country, Age,
mob)
VALUES (1, 'Shubham', 'India','23','738479734'),
(2, 'Aman ', 'Australia','21','436789555'),
(3, 'Naveen', 'Sri lanka','24','34873847'),
(4, 'Aditya', 'Austria','21','328440934'),
(5, 'Nishant', 'Spain','22','73248679');
Introduction to Set Theory in SQL, Dr. Asoke Nath 9
Introduction to Set Operations in SQL
The UNION operator:
SELECT* FROM Emp1;
Output:
Emp1 Table
Introduction to Set Theory in SQL, Dr. Asoke Nath 10
Introduction to Set Operations in SQL
The UNION operator:
Emp2 Table
Write the following SQL query to create Emp2 table
CREATE TABLE Emp2(
EmpID INT PRIMARY KEY,
Name VARCHAR(50),
Country VARCHAR(50),
Age int(2),
mob int(10)
);
-- Insert some sample data into the Customers table
INSERT INTO Emp2 (EmpID, Name,Country, Age, mob)
VALUES (1, 'Tommy', 'England','23','738985734'),
(2, 'Allen', 'France','21','43678055'),
(3, 'Nancy', 'India','24','34873847'),
(4, 'Adi', 'Ireland','21','320254934'),
(5, 'Sandy', 'Spain','22','70248679');
Introduction to Set Theory in SQL, Dr. Asoke Nath 11
Introduction to Set Operations in SQL
The UNION operator:
SELECT * FROM Emp2;
Introduction to Set Theory in SQL, Dr. Asoke Nath 12
Introduction to Set Operations in SQL
The UNION operator:
Example 1: SQL UNION Operator
In this example, we will find the cities (only unique values) from both the "Table1" and
the "Table2" tables:
Query:
SELECT Country FROM Emp1 UNION SELECT Country FROM Emp2
ORDER BY Country;
Introduction to Set Theory in SQL, Dr. Asoke Nath 13
Introduction to Set Operations in SQL
The UNION operator:
Example 2: SQL UNION ALL
In the below example, we will find the cities (duplicate values
also) from both the "Emp1" and the "Emp2" tables:
Query:
SELECT Country FROM Emp1 UNION ALL
SELECT Country FROM Emp2 ORDER BY Country;
Country
Australia
Austria
England
France
India
India
Ireland
Spain
Spain
Sri lanka Introduction to Set Theory in SQL, Dr. Asoke Nath 14
Introduction to Set Operations in SQL
The UNION operator:
SQL UNION ALL With WHERE
We can use the WHERE clause with UNION ALL in SQL. The WHERE clause is used to
filter records and is added after each SELECT statement
Example : SQL UNION ALL with WHERE
The following SQL statement returns the cities (duplicate values also) from both the "Geeks1"
and the "Geeks2" tables:
Query:
SELECT Country, Name FROM Emp1 WHERE Name='Aditya‘ UNION ALL
SELECT Country, Name FROM Emp2 WHERE Country='Ireland‘ ORDER BY Country;
Introduction to Set Theory in SQL, Dr. Asoke Nath 15
Introduction to Set Operations in SQL
The UNION operator:
Output
Important Points About SQL UNION Operator
The SQL UNION operator combines the result sets of two or more SELECT queries.
UNION returns unique rows, eliminating duplicate entries from the result set.
UNION ALL includes all rows, including duplicate rows.
Columns in the result set must be in the same order and have the same data types.
UNION is useful for aggregating data from multiple tables or applying different filters
to data from the same table.
Introduction to Set Theory in SQL, Dr. Asoke Nath 16
Introduction to Set Operations in SQL
The UNION operator:
Conclusion
The SQL UNION operator is a powerful tool for combining
multiple SELECT statements into one result set. Whether you
need to eliminate duplicates or include them, UNION and
UNION ALL provide flexible options for aggregating data from
multiple tables. Understanding how and when to use these
operators will make your SQL queries more efficient and
effective for data retrieval and analysis.
Introduction to Set Theory in SQL, Dr. Asoke Nath 17
Introduction to Set Operations in SQL
The UNION operator
The UNION operator combines the results of two or more
SELECT queries into a single result set, removing duplicate rows
by default.
For example, suppose we have two
tables, employees and contractors, each with similar columns such
as contractors, department, and salary.
The Syntax of the SQL UNION operator is:
SELECT columnnames FROM table1
UNION
Introduction to Set Theory in SQL, Dr. Asoke Nath 18
SELECT columnnames FROM table2;
Introduction to Set Operations in SQL
The UNION operator Table-2: Contractors
Table-1: Employees
Name department salary
name Department Salary
David Marketing 60000
Alice Marketing 65000
Eva Sales 68000
Bob Sales 70000
Carol Engineering 80000 Carol Engineering 75000
John HR 55000
We can combine the results from both tables using the following command:
-- Using INTERSECT to find common employees
SELECT name, department FROM employees INTERSECT SELECT name,
department FROM contractors;
Introduction to Set Theory in SQL, Dr. Asoke Nath 19
Introduction to Set Operations in SQL
The UNION operator
This query selects the name, department, and salary columns from both
the employees and contractors tables and combines them into a single result set. The
UNION operator automatically removes duplicate rows from the final result set.
name department salary
Alice Marketing 65000
Bob Sales 70000
Carol Engineering 80000
John HR 55000
David Marketing 60000
Eva Sales 68000
Introduction to Set Theory in SQL, Dr. Asoke Nath 20
Introduction to Set Operations in SQL
The UNION operator
Notice that Carol, who appears in both tables, is only listed once in the result. If
we wanted to keep both instances of Carol (with her different salaries), we would
use UNION ALL.
It's important to remember that the UNION operator does not remove NULL
values. If a column contains NULL values in one result set and non-NULL values
in the corresponding column of another result set, the NULL values will be
retained in the final result set produced by the UNION operator.
If we want to include NULL values in the result set and prevent their removal by
the UNION operator, we can use the UNION ALL operator instead. This operator
combines the results of multiple SELECT queries, including all rows from each
result set, regardless of whether they are duplicates or contain NULL values.
Introduction to Set Theory in SQL, Dr. Asoke Nath 21
Introduction to Set Operations in SQL
The INTERSECT operator
The INTERSECT operator returns only the rows that appear in both result sets. Think
of it as finding the people who belong to both groups.
Let’s use INTERSECT to query our tables above. For the sake of the example, let’s just
query the name and department columns:
-- Using INTERSECT to find common employees
SELECT name, department FROM employees INTERSECT
SELECT name, department FROM contractors;
Introduction to Set Theory in SQL, Dr. Asoke Nath 22
Introduction to Set Operations in SQL
The INTERSECT operator
This query selects the name and department columns from both the employees and
contractors tables and returns only the rows that exist in both tables based on all selected
columns.
name department
Carol Engineering
values equal when comparing corresponding columns. It also results in an empty set when dealing with
empty result sets.
In other words, if a NULL value is present in one result set and the corresponding column in the other
result set contains a non-NULL value, the rows are not considered equal – they will not be included in
the intersection result.
Additionally, If one of the result sets provided to the INTERSECT operator is empty (i.e., it contains no
rows), the overall result of the INTERSECT operation will also be empty since there are no common
rows between an empty set and any other set.
Introduction to Set Theory in SQL, Dr. Asoke Nath 23
Introduction to Set Operations in SQL
The EXCEPT (MINUS) operator :
The EXCEPT operator retrieves the rows present in the first result set but not in the
second.
For example, let’s say we execute the following query:
-- Using EXCEPT to find employees who are not contractors
SELECT name, department, salary FROM employees
EXCEPT SELECT name, department, salary FROM contractors;
Introduction to Set Theory in SQL, Dr. Asoke Nath 24
Introduction to Set Operations in SQL
The EXCEPT (MINUS) operator :
The name, department, and salary columns are selected from the employees table and
return only the rows that do not exist in the contractors table.
name department salary
Alice Marketing 65000
Bob Sales 70000
John HR 55000
The EXCEPT operator also follows standard comparison rules for handling NULL
values. Its behavior with empty result sets results in an empty set if the first result set is
empty or includes all rows from the first result set if the second result set is empty.
Introduction to Set Theory in SQL, Dr. Asoke Nath 25
Introduction to Set Operations in SQL
Set Operators: Performance and Optimization
The impact of set operators on query performance in SQL can vary depending on factors
such as the size of the datasets involved, the complexity of the queries, and the database
management system (DBMS) used.
Let's break down the key factors and strategies for optimization.
Data volume and query complexity
When working with large amounts of data, set operators can significantly impact query
performance because the size of the result sets that need to be combined, intersected, or
compared increases the processing time required to perform that operation.
Complex queries containing multiple subqueries, joins, or set operators may result in
additional processing overhead and impact query performance. Chained operations or
nested set operations could further exacerbate performance consequences.
Introduction to Set Theory in SQL, Dr. Asoke Nath 26
Introduction to Set Operations in SQL
Indexing and optimization techniques
Proper indexing of the columns involved in set operations can significantly improve
query performance. Indexes help the database engine quickly locate and retrieve the
relevant rows, reducing the need for full-table scans and improving query execution
times.
To enhance the performance of queries involving set operators, database
administrators, and developers can utilize optimization techniques like query
rewriting, query plan analysis, and database schema optimization. Techniques like
query caching and materialized views can also be used to precompute and store the
results of complex queries, reducing the computational overhead of set operations.
Introduction to Set Theory in SQL, Dr. Asoke Nath 27
Introduction to Set Operations in SQL
Database engine and hardware resources
The performance of set operations may vary depending on the underlying database
engine and its optimization capabilities. Different DBMSs may employ different
optimization strategies and algorithms for processing set operations, leading to
variations in performance.
The availability of hardware resources such as CPU, memory, and disk I/O also
influences the performance of queries involving set operators. Adequate hardware
resources can help mitigate performance bottlenecks and ensure efficient query
execution.
Introduction to Set Theory in SQL, Dr. Asoke Nath 28
Introduction to Set Operations in SQL
Set Operators in Practice: A Case Study
Set operators aren't just theoretical tools; they have real-world applications that can
significantly impact business decisions. Let's walk through a simplified example of how
a company might use set operators to segment its customer base for targeted marketing
campaigns.
The scenario
Imagine a company that sells both online and in physical stores. They have two separate
datasets:
Online purchases: Customer ID, purchase history, demographics, and location for online
shoppers.
In-store transactions: Similar information for customers who shopped in person.
Introduction to Set Theory in SQL, Dr. Asoke Nath 29
Introduction to Set Operations in SQL
Using set operators
To get a complete picture of all customers, the company would first use UNION to
combine both datasets into a single table, removing any duplicates. This gives them a
unified view of their entire customer base.
Next, they could use INTERSECT to identify customers who have shopped both online
and in-store. This segment is particularly valuable as they are highly engaged with the
brand across multiple channels.
To find opportunities for cross-channel promotion, the company could use EXCEPT. For
example, SELECT * FROM online_purchases EXCEPT SELECT * FROM
in_store_transactions would find customers who have only shopped online but
not in stores. The company could then target these customers with promotions
encouraging them to visit a physical location.
Introduction to Set Theory in SQL, Dr. Asoke Nath 30
Introduction to Set Operations in SQL
Limitations and Considerations
When using set operators in SQL, it's essential to consider several limitations and factors that can
affect query performance, result accuracy, and overall usability.
Data type compatibility and NULL values
Corresponding columns in the result sets must have compatible data types. Ensure data consistency
and compatibility across result sets to avoid errors and unexpected results.
Set operators may treat NULL values differently depending on the DBMS and the specific operator.
To avoid errors, developers must understand how NULL values are treated.
Introduction to Set Theory in SQL, Dr. Asoke Nath 31
Introduction to Set Operations in SQL
Performance impact and duplicate rows
Set operations can significantly impact query performance, especially when dealing
with large or complex datasets. Factors like indexing, query optimization, and
hardware resources can influence performance. Optimization techniques and
performance-tuning strategies are essential to mitigate performance bottlenecks.
By default, set operators remove duplicate rows from the result set. However, in
some cases, retaining duplicate rows may be necessary. It's important to understand
set operators' behavior regarding duplicate rows and use appropriate techniques to
handle duplicates if needed.
Introduction to Set Theory in SQL, Dr. Asoke Nath 32
Introduction to Set Operations in SQL
Ordering of results and memory constraints
Set operators do not guarantee the order of the results in the final output. If preserving the order of
results is essential, additional sorting operations may be required after applying set operators.
Set operations can consume significant memory and resources, particularly when dealing with large
datasets. Memory constraints and resource limitations must be considered to avoid performance
degradation or system instability.
Complexity, maintainability, and cross-DBMS compatibility
Complex queries involving multiple set operators, subqueries, and joins can be challenging to
understand, maintain, and debug. To improve readability and maintainability, queries must be
concise, well-documented, and modular.
Set operators may have variations in syntax and behavior across different database management
systems (DBMSs). Knowing these differences is vital when writing SQL queries for cross-platform
compatibility
Introduction to Set Theory in SQL, Dr. Asoke Nath 33
Introduction to Set Operations in SQL
Conclusion
In summary, set operators are core components of SQL and
indispensable for efficient data manipulation.
They encompass functions like UNION, INTERSECT, and
EXCEPT and empower us with versatile data analysis
capabilities.
Introduction to Set Theory in SQL, Dr. Asoke Nath 34