0% found this document useful (0 votes)
25 views14 pages

Window Functions in SQL

The document explains the use of window functions in SQL, highlighting their ability to perform calculations across a set of rows while retaining individual row details. It covers key concepts such as the OVER clause, partitioning, ordering, and frame specifications, along with examples demonstrating their application for calculating averages and cumulative totals. Additionally, it contrasts window functions with GROUP BY, emphasizing their flexibility and use cases in detailed data analysis.

Uploaded by

chetanthakurbiz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
25 views14 pages

Window Functions in SQL

The document explains the use of window functions in SQL, highlighting their ability to perform calculations across a set of rows while retaining individual row details. It covers key concepts such as the OVER clause, partitioning, ordering, and frame specifications, along with examples demonstrating their application for calculating averages and cumulative totals. Additionally, it contrasts window functions with GROUP BY, emphasizing their flexibility and use cases in detailed data analysis.

Uploaded by

chetanthakurbiz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

ADVANCED SQL

https://www.linkedin.com/in/niranjana405/

WINDOW FUNCTIONS
Why Use Window Functions?

Imagine you want to calculate the average salary per department. This is straightforward with GROUP BY:

SELECT department, AVG(salary) AS avg_salary FROM employees GROUP BY department;

This gives you the average salary per department, but it collapses the results into one row per department.

The Problem

What if you need to see every employee’s salary alongside the average salary of their department?

With GROUP BY, you lose individual employee details because aggregation collapses rows.
You need a way to retain all individual rows while adding aggregated calculations (like department
averages) as a new column.

The Solution: Window Functions

A window function performs a calculation across a specified set of table rows that are logically
related to the current row. Unlike regular aggregations (e.g., SUM, AVG), it retains all individual rows
while computing results over a defined subset (a "window") of the dataset.
The OVER clause in SQL is used with window functions to define the window (or set of rows) over
which the function should operate.

Key Points About the OVER Clause

1. Defining the Window:


The OVER clause tells SQL which rows to include in the window (or group) for the calculation.
2. Partitioning:(optional)
PARTITION BY inside the OVER clause divides the result set into partitions. The window function is
then applied to each partition separately.
Example: PARTITION BY customer_id means that the function will be calculated separately for
each customer.
3. Ordering:(optional)
ORDER BY inside the OVER clause specifies the order in which rows are processed within each
partition.
Example: ORDER BY order_date will sort the rows by the order date, which is useful for functions
like running totals or ranking.
4. Frame Specification (Optional):
You can further narrow down the window with a frame clause, such as ROWS BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW. This limits the window to a range of rows relative
to the current row.

Example: Employee Salary vs. Department Average

SELECT employee_id, name, department, salary, AVG(salary) OVER (PARTITION BY department) AS


dept_avg_salary FROM employees;

Output:

employee_id name department salary dept_avg_salary

1 Alice Sales 5000 5500

2 Bob Sales 6000 5500

3 Charlie HR 4000 4500

4 Dave HR 5000 4500

Here:

Each employee’s row is preserved.


dept_avg_salary shows the average salary for their department, calculated over a window of rows
partitioned by department.

WITHOUT USING PARTITION BY

If you want to use OVER() without specifying PARTITION BY, the query will calculate the aggregate (e.g.,
AVG, SUM) over the entire dataset rather than dividing it into partitions.

Here’s an example that omits the PARTITION BY clause:

SELECT employee_id, name, department, salary, AVG(salary) OVER () AS overall_avg_salary FROM employees;

Output:
employee_id name department salary overall_avg_salar
y

1 Alice Sales 5000 5000.00

2 Bob Sales 6000 5000.00

3 Charlie HR 4000 5000.00

4 Dave HR 5000 5000.00

Explanation:

1. OVER ():
The absence of PARTITION BY means the calculation considers all rows in the dataset.
Here, the average salary is computed across all employees (irrespective of their departments).
2. Result Difference:
Without PARTITION BY, the average salary is the same for all rows, reflecting the overall dataset's
average.

Key Benefits of Window Functions

1. Retain Detail: Keep all individual rows while adding aggregated calculations.
2. Flexibility: Define custom "windows" using:
PARTITION BY: Group rows (like GROUP BY, but without collapsing).
ORDER BY: Sort rows within the window (e.g., for running totals).
ROWS/RANGE: Define the window’s scope (e.g., "previous 3 rows").
3. Use Cases:
Compare individual values to group averages (as above).
Calculate rankings (RANK(), ROW_NUMBER()).
Compute running totals or moving averages.

Window Functions vs. GROUP BY

Feature GROUP BY Window Functions

Output Collapsed (one row per group). All rows retained.

Aggregation Scope Entire group. Customizable window of rows.

Use Case Summaries (totals, averages). Detailed analysis with context.

How window functions behave with and without a WHERE clause?

Example Table: employees


employee_id name department salary

1 Alice Sales 5000

2 Bob Sales 6000

3 Charlie HR 4000

4 Dave HR 5000

5 Eve IT 7000

6 Frank IT 8000

Example 1: Window Function WITHOUT a WHERE Clause

SELECT employee_id, name, department, salary, AVG(salary) OVER (PARTITION BY department) AS


dept_avg_salary FROM employees;

Output:

employee_id name department salary dept_avg_salary

1 Alice Sales 5000 5500

2 Bob Sales 6000 5500

3 Charlie HR 4000 4500

4 Dave HR 5000 4500

5 Eve IT 7000 7500

6 Frank IT 8000 7500

2. With a WHERE Clause

SELECT employee_id, name, department, salary, AVG(salary) OVER (PARTITION BY department) AS


dept_avg_salary FROM employees WHERE salary < 6000;

Output:
employee_id name department salary dept_avg_salary

1 Alice Sales 5000 5000

3 Charlie HR 4000 4000

4 Dave HR 5000 4000

Key Differences:

Without a WHERE Clause:

All rows are included, so the average salary reflects the entire department's data.
For instance, in the Sales department, the average salary includes both 5000 and 6000, resulting in
5500.

With a WHERE Clause:

The WHERE clause filters out rows before the window function is applied.
For example, in the Sales department, only 5000 is included, so the average salary becomes 5000.
IT department is entirely excluded since no rows have salary < 6000.

WINDOW FUNCTIONS
Function Purpose Explanation

SUM/MIN/MAX/AVG Get the sum, minimum, Aggregates values within a


maximum, or average of all partition. Examples: SUM for
records in the partition. totals, MIN for the smallest
value, MAX for the largest, and
AVG for averages.

FIRST_VALUE Returns the value from the first Useful for retrieving the
row in its partition. earliest value based on sorting,
such as the first transaction of
a customer.

LAST_VALUE Returns the value from the last Useful for getting the most
row in its partition. recent value in a partition, such
as the latest salary or
transaction.

NTH_VALUE Retrieves the value from the Specify the nth position in the
nth row in an ordered partition. partition. Example:
NTH_VALUE(column, 3) fetches
the value from the third row.

PERCENT_RANK Returns the relative rank of the Formula: (Rank - 1) / (Total rows
current row as a percentage. - 1). For example, the middle
row in a dataset will have a rank
near 0.5.

RANK Assigns a rank to each row If two rows are tied, the next
within its partition. Tied rows rank is skipped (e.g., ranks 1, 2,
share the same rank. 2, and then 4).

ROW_NUMBER Assigns a unique sequential Used to uniquely identify rows,


number to each row within its such as numbering employees
partition starting from 1. based on their joining date.

LAG/LEAD Accesses values from the LAG retrieves data from prior
previous or next rows in a rows, while LEAD fetches data
dataset. from subsequent rows. Useful
for trend analysis or
comparisons.

ORDER BY with Window Function (AVG/SUM)


What if we use AVG with ORDER BY?

SELECT employee_id, month, sales, AVG(sales)

OVER (PARTITION BY employee_id ORDER BY month) AS cumulative_avg

FROM employee_sales;

Output:

employee_id month sales cumulative_avg

1 Jan 100 100

1 Feb 200 150 (100+200)/2 =


150

1 Mar 300 200 (100+200+300)/


3 = 200

2 Jan 150 150

2 Feb 250 200 (150+250)/2 =


200

Summary

Use ORDER BY in window functions to control calculation order (e.g., running totals, moving
averages).
Use regular ORDER BY to sort the final output.

Without ORDER BY, by default the framing is usually all partition rows.

With ORDER BY, by default the framing is usually everything before the current row and the current row.

Using Framing in Window function


Frame Clause

When using framing in a window function, the frame clause allows you to define a subset (frame) of rows
relative to the current row. This "frame" is evaluated for each row to perform calculations like running
totals, moving averages, and other cumulative operations.
Key Meaning

ROWS OR RANGE Whether you want to use a range or rows as a


frame

PRECEDING Rows before the current one

FOLLOWING Rows after the current one

UNBOUNDED PRECEDING Returns all rows before the current one

UNBOUNDED FOLLOWING Returns all rows after the current one

CURRENT ROW Refers to your current row

Scenario: Calculating Cumulative Sales

You manage a retail store and want to calculate cumulative sales by transaction date for each salesperson.
The dataset has the following structure:

salesperson transaction_date sales_amount

Alice 2023-01-01 500

Alice 2023-01-02 300

Alice 2023-01-02 200

Alice 2023-01-03 400

Bob 2023-01-01 700

Bob 2023-01-02 600

1. Using ROWS

SELECT salesperson, transaction_date, sales_amount,SUM(sales_amount)

OVER (

PARTITION BY salesperson
ORDER BY transaction_date

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

) AS cumulative_sales FROM sales;

Explanation:

ROWS includes the exact physical rows in the defined frame.


In this case, cumulative sales are calculated row by row, considering only the rows before or at the
current one.

Output:

salesperson transaction_date sales_amount cumulative_sales

Alice 2023-01-01 500 500

Alice 2023-01-02 300 800

Alice 2023-01-02 200 1000

Alice 2023-01-03 400 1400

Bob 2023-01-01 700 700

Bob 2023-01-02 600 1300

Note: For Alice, both rows with 2023-01-02 are treated as separate rows, each contributing to the
cumulative calculation.

2. Using RANGE

SELECT salesperson, transaction_date, sales_amount,SUM(sales_amount)

OVER (

PARTITION BY salesperson

ORDER BY transaction_date

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

) AS cumulative_sales FROM sales;

Explanation:
RANGE includes all rows that share the same value in the ORDER BY column (e.g., transaction_date),
treating them as one "range."
In this case, cumulative sales are calculated by grouping rows with the same transaction_date.

Output:

salesperson transaction_date sales_amount cumulative_sales

Alice 2023-01-01 500 500

Alice 2023-01-02 300 1000

Alice 2023-01-02 200 1000

Alice 2023-01-03 400 1400

Bob 2023-01-01 700 700

Bob 2023-01-02 600 1300

Note: For Alice, the rows with 2023-01-02 are grouped together because they share the same transaction
date.

Key Differences:

ROWS: Works with exact physical rows; each row is treated independently.
RANGE: Groups rows with the same value in the ORDER BY column and treats them as one unit.

Real-Time Use Case:

1. Using ROWS: When tracking sequential orders or transactions where each row matters individually
(e.g., per minute or second).
2. Using RANGE: When tracking daily totals or monthly averages, where rows sharing the same date or
month are treated as one logical unit.

REAL TIME SCENARIO FOR USING LAST_VALUE

If we need to identify the recent salary of all employees , we can't simply use a GROUP BY and MAX() clause
because salaries might decrease over time. We need to find an alternative approach.

EmployeeSalaries Table
EmployeeID SalaryYear Salary

1 2023 7000.00

1 2024 8000.00

2 2023 7500.00

2 2024 7000.00

3 2023 6800.00

3 2024 6500.00

4 2023 9000.00

4 2024 9500.00

5 2023 7200.00

5 2024 7200.00

Employee 2: Salary decreased from 7500 to 7000 in 2024 due to economic recession.

Employee 3 Salary decreased from 6800 to 6500 in 2024.

Using LAST_VALUE()

LAST_VALUE() Respects Order:

The LAST_VALUE() function retrieves the value of a column from the last row in the ordered window
for each partition (e.g., each employee).
By using RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, you ensure that
the window covers all rows from the first to the last within a partition.

Correctly Identifies the Most Recent Salary:

When you order the data by SalaryYear, LAST_VALUE() will correctly return the last salary based on the
year for each employee, irrespective of whether it is the highest salary.

SELECT distinct EmployeeID, LAST_VALUE(salary)

OVER(

PARTITION BY EmployeeID

ORDER BY SalaryYear
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

) FROM EmployeeSalaries order by EmployeeID

Similar to the above, we can use FIRST_VALUE() to know the first order date for each customer alongside
every order they placed

ROW_NUMBER()

It assigns a unique sequential number to each row within a partition, starting at 1, regardless of any framing
options. To determine your product's position in its category by price, you can use the ROW_NUMBER()
function by partitioning by category and ordering by price

Key points about ROW_NUMBER:

No framing needed:

Unlike some other window functions, ROW_NUMBER does not require a frame clause to define the
window of rows it operates on.

ORDER BY is mandatory:

You must specify an ORDER BY clause within the OVER clause to determine the order in which rows
are assigned their unique numbers.

Unique number assignment:

Each row within a partition will receive a unique sequential number starting from 1.

SELECT employee_id, department, sales_amount, sales_date, ROW_NUMBER()

OVER (

PARTITION BY department

ORDER BY sales_amount DESC

) AS sales_rank FROM employee_sales;


employee_id department sales_amount sales_date sales_rank

101 Sales 1000 2024-01-10 1

102 Sales 800 2024-01-12 2

103 Sales 600 2024-01-15 3

201 IT 700 2024-01-11 1

202 IT 650 2024-01-13 2

301 Marketing 900 2024-01-09 1

302 Marketing 850 2024-01-11 2

ROW_NUMBER()/ RANK() AND DENSE_RANK()

SELECT *,

ROW_NUMBER() OVER (PARTITION BY category ORDER BY amount DESC) AS rn,

RANK() OVER (PARTITION BY category ORDER BY amount DESC) AS rnk,

DENSE_RANK() OVER (PARTITION BY category ORDER BY amount DESC) AS d_rnk

FROM expenses

ORDER BY category;

RESULT
date description category amount rn rnk d_rnk

2022-11-01 Subway Food 5000 1 1 1

2022-11-03 Domino's Food 3000 2 2 2


Pizza

2022-11-05 Burger King Food 3000 3 2 2

2022-11-07 Ice Cream Food 1000 4 4 3


Cart

2022-11-10 Zara Shopping 20000 1 1 1

2022-11-12 H&M Shopping 15000 2 2 2

2022-11-15 Flipkart Shopping 5000 3 3 3

2022-11-17 Myntra Shopping 5000 4 3 3

2022-11-20 Water Bill Utilities 6000 1 1 1

2022-11-22 Internet Bill Utilities 3000 2 2 2

2022-11-25 Gas Bill Utilities 3000 3 2 2

Explanation of Window Functions:

1. ROW_NUMBER()
Assigns a unique sequential number to each row within the partition.
No gaps in numbering, even if there are ties.
Example: "Domino's Pizza" and "Burger King" have the same amount (3000), but they get rn = 2 and
rn = 3 respectively.
2. RANK()
Assigns the same rank to tied values but skips the next rank(s) after the tie.
Example: Both "Domino's Pizza" and "Burger King" have rnk = 2 because of the tie, but the next rank
jumps to 4.
3. DENSE_RANK()
Similar to RANK(), but does not skip ranks after ties.
Example: "Domino's Pizza" and "Burger King" are tied at d_rnk = 2, but the next rank is 3 instead of
4.

https://www.linkedin.com/in/niranjana405/

You might also like