Master
Complex Joins
Set Operations
Window
Functions
Training Outcome
Learners will gain the following skills and knowledge:
Understand and apply INNER, OUTER, LEFT, RIGHT and FULL JOINS for
combining and analyzing data from multiple tables. In other to utilize
JOINS operations to solve real-world business problems, such as
generating reports and tracking resources.
Write and nest Subqueries to implement dynamic filtering and
calculations to solve advanced data analysis problems, such as
identifying outliers, comparing metrics or aggregating data
hierarchically.
Continuation
Learners will gain the following skills and knowledge:
Apply UNION, INTERSECT, and EXCEPT to combine, compare, and exclude
datasets. In other to perform tasks like merging data from different
sources, analyzing overlaps, and identifying unique data points.
Apply ROW_NUMBER(), RANK(), LEAD(), and LAG() for advanced row-level
analysis to analyze trends such as rank employees or products and
identify gaps or changes in datasets over time.
Master Complex
Joins(INNER,
LEFT, RIGHT,
FULL OUTER) and
Subqueries.
Mastering Complex Joins
Joins are used to combine rows based on related columns.
Advanced joins (INNER, LEFT, RIGHT, FULL OUTER) allow for more
sophisticated data retrieval, enabling in-depth analysis across datasets.
Subqueries: SQL subquery is a query inside another query. We retrieve
data from two tables using a subquery. The statement that contains a
subquery is sometimes known as an outer query or outer select, whereas
a subquery is frequently also termed an inner query.
INNER JOIN
Combines rows that have matching
in both tables.
Syntax:
SELECT columns
FROM table1
INNER JOIN table2
ON table1.column_name =
table2.column_name;
Example:
Find employees with their
department names.
Source
LEFT JOIN
Returns all rows from the left
table and matched rows from
the right table. If no match is
found, NULL is returned.
Syntax:
SELECT columns
FROM table1
LEFT JOIN table2
ON table1.column_name =
table2.column_name;
Example:
List all employees and their
departments, even if some are
not assigned
Source
RIGHT JOIN
Returns all rows from the right
table and matched rows from
the left table.
Syntax:
SELECT columns
FROM table1
RIGHT JOIN table2
ON table1.column_name =
table2.column_name;
Example:
List all departments and their
employees, even if some
departments have no
employees.
Source
FULL OUTER JOIN
Combines rows from both tables, displaying
matching rows from both, as well as
unmatched rows from each table. Unmatched
rows will have NULL values for the missing
columns.
Syntax:
SELECT columns
FROM table1
FULL OUTER JOIN table2
ON table1.column_name =
table2.column_name
Example:
Combine employees and
departments data, showing all
records from both tables.
Source
Subqueries
Is a query in another SQL
statement(nested query).
It allows you to use the result of
one query as input in another
query.
They are used for filtering or
advanced logic.
Example:
Find employees earning more
than the average salary in their
department.
Source
Set Operations
UNION
The UNION operator combines data from
two or more SELECT queries into a single
unique result. This operator removes any
duplicates present in the combined result.
Syntax:
SELECT columns
FROM table1
UNION
SELECT columns
FROM table2
Example:
List all unique department ids in
employee's table
Source
INTERSECT
The INTERSECT operator get records
that are identical or shared between
the result sets of two or more tables.
Syntax:
SELECT columns
FROM table1
INTERSECT
SELECT columns
FROM table2
Example:
List department names not yet associated
with employees
Source
EXCEPT
All unique records from the
left operand (query) can be
retrieved using SQL EXCEPT
operator, with the exception
of those that are part of the
right operand (query's) result
set.
Syntax:
SELECT columns
FROM table1,
EXCEPT
SELECT columns
FROM table2
Source
Window
Functions
ROW_NUMBER Function
The ROW_NUMBER function is a ranking function that gives each new
record in a partition a sequential rank number.
The SQL Server ROW_NUMBER function will assign different rank numbers
to two identical values in the same partition, depending on the order in
which they are displayed.
ROW_NUMBER Function
Syntax:
SELECT ROW_NUMBER() OVER
(PARTITION
BY partition_column ORDER
BY order_column) AS
row_number, column_name
FROM table_name;
OVER
This clause specifies the window
or set of rows that the window
function operates.
Source
ROW_NUMBER Function
PARTITION BY
It is an optional clause in the ROW_NUMBER function. It is a clause that
divides the result set into partitions (groups of rows).
ORDER BY
Within each partition, this clause allows us to order the rows in the result
set. Because the ROW_NUMBER() function is order-dependent, it is a
necessary clause.
Example:
Number employees by department.
RANK Function
Is used to assign a rank to each row within a
partition of a result set. Unlike other ranking
functions, the RANK() function introduces
gaps in the ranking sequence for rows with
duplicate values, making it particularly useful
for scenarios requiring rank differentiation.
Syntax:
SELECT column1, column2,
RANK() OVER (PARTITION BY partition_column
ORDER BY sort_column)
FROM table_name;
Example:
Rank employees by salary within each
department.
Source
LEAD and LAG Function
They allow users to access data
from preceding or following rows
without the need for self-joins or
subqueries, making them invaluable
for analyzing sequential or
time-series data.
The LAG() function is used to get
value from the row that precedes
the current row.
The LEAD() function is used to get
value from a row that succeeds the
current row.
Example:
How does an employee's salary or performance
score compare to their previous and next
evaluations within their department?
Source
Conclusion
At the end learners:
Mastered complex joins to analyze data
from multiple related tables across various
scenarios, leveraging subqueries for
advanced filtering.
Gained expertise in applying set
operations (UNION, INTERSECT, EXCEPT) for
effective data comparison across multiple
datasets.
Developed skills in using ROW_NUMBER
and RANK functions for row identification
and ranking within datasets.
Learned practical applications of LEAD and
LAG functions for time-series and
sequential data analysis.
Source
Assignment
Kindly find attached tables (find tables here) to
answer the following questions
1. Write a query to list all customers and
their transaction details. Ensure
customers without transactions and
transactions without matching
customers are included.
2. Identify the customer(s) who have the
highest total transaction amount using
a subquery.
3. Write a query to combine the list of
customers from Lagos and Edo using
UNION, excluding duplicates.
4. Assign a rank to each transaction based
on the transaction amount in
descending order. Additionally, use LEAD
and LAG to display the next and previous
transaction amounts for each
transaction, ordered by date.
Source