What is a Subquery?

A subquery is a query nested inside another SQL statement. It’s basically a query within a query. You’re using the results of one SELECT statement to help another SQL statement do its job.

Subqueries let you break down complex problems into smaller, more manageable pieces, making your SQL more readable and often more powerful. The outer query relies on the inner query (the subquery) to provide data, filter results, or perform calculations. Once the subquery executes and returns its results, the outer query uses that information to complete its task.

Where Can You Use Subqueries?

Subqueries are versatile and can appear in several parts of a SQL statement:

In the WHERE clause to filter rows based on values from another query
In the FROM clause as a temporary table (sometimes called an inline view or derived table)
In the SELECT clause to calculate values for each row
In the HAVING clause to filter grouped results
With INSERT, UPDATE, or DELETE statements to determine which rows to modify

Most modern database systems support subqueries, including PostgreSQL, MySQL, SQL Server, Oracle, and SQLite. While the core syntax remains consistent across platforms, some advanced features might vary.

Basic Subquery Syntax

The simplest form of a subquery looks like this:

SELECT column1, column2
FROM table1
WHERE column1 = (SELECT column1 FROM table2 WHERE condition);

The subquery is enclosed in parentheses and executes first. Its result is then used by the outer query.

Types of Subqueries

Subqueries come in different flavors depending on what they return and how they relate to the outer query:

Scalar subqueries return a single value (one row, one column). These are commonly used with comparison operators like =, >, or <.
Row subqueries return a single row with multiple columns, useful when comparing against multiple values at once.
Table subqueries return multiple rows and columns, acting like a temporary table for the outer query.
Correlated subqueries reference columns from the outer query, meaning they execute once for each row processed by the outer query. Non-correlated subqueries, on the other hand, execute independently and only once.

Examples

Let’s run a few examples. First, we’ll create a couple of tables and populate them with data:

CREATE TABLE planets (
    planet_id INT PRIMARY KEY,
    name VARCHAR(50),
    discovery_year INT,
    mass_earth_ratio DECIMAL(10, 2)
);

CREATE TABLE missions (
    mission_id INT PRIMARY KEY,
    mission_name VARCHAR(50),
    target_planet_id INT,
    launch_year INT,
    budget_millions DECIMAL(10, 2)
);

INSERT INTO planets VALUES
(1, 'Kepler-442b', 2015, 1.34),
(2, 'Proxima Centauri b', 2016, 1.17),
(3, 'TRAPPIST-1e', 2017, 0.62),
(4, 'TOI 700 d', 2020, 1.19),
(5, 'K2-18b', 2015, 8.63);

INSERT INTO missions VALUES
(1, 'Exodus Alpha', 2, 2035, 450.00),
(2, 'New Horizons II', 1, 2032, 380.00),
(3, 'Trappist Explorer', 3, 2038, 520.00),
(4, 'Genesis Mission', 2, 2040, 890.00);

So we’re using a planetary theme here.

Now let’s find all missions targeting planets discovered after 2016:

SELECT mission_name, launch_year
FROM missions
WHERE target_planet_id IN (
    SELECT planet_id 
    FROM planets 
    WHERE discovery_year > 2016
);

Result:

mission_name       launch_year
-----------------  -----------
Trappist Explorer  2038

The subquery is needed here because we’re filtering missions based on criteria from a different table (planets), where the discovery year information lives. You’d need the subquery or a JOIN to accomplish this. You can’t just add a WHERE clause to missions alone since it doesn’t have the discovery_year column.

This query returns missions targeting TRAPPIST-1e and TOI 700 d, since both planets were discovered after 2016. The subquery first identifies which planets meet the criteria, then the outer query filters missions accordingly. However, in this case only one mission targeted either of those planets, and so just that mission was returned. TOI 700 d has no missions targeting it, and so no missions were returned for that one.

If we change the discovery_year to 2015, we get more results:

SELECT mission_name, launch_year
FROM missions
WHERE target_planet_id IN (
    SELECT planet_id 
    FROM planets 
    WHERE discovery_year > 2015
);

Result:

mission_name       launch_year
-----------------  -----------
Exodus Alpha       2035       
Trappist Explorer  2038       
Genesis Mission    2040

Here’s an example using a subquery in the SELECT clause to add a calculated column (kind of like a generated column):

SELECT 
    name,
    mass_earth_ratio,
    (SELECT AVG(mass_earth_ratio) FROM planets) AS avg_mass,
    mass_earth_ratio - (SELECT AVG(mass_earth_ratio) FROM planets) AS difference_from_avg
FROM planets;

Result:

name                mass_earth_ratio  avg_mass  difference_from_avg
------------------  ----------------  --------  -------------------
Kepler-442b         1.34              2.59      -1.25              
Proxima Centauri b  1.17              2.59      -1.42              
TRAPPIST-1e         0.62              2.59      -1.97              
TOI 700 d           1.19              2.59      -1.4               
K2-18b              8.63              2.59      6.04

This query shows each planet’s mass compared to the average mass of all planets. The subquery calculates the average and makes it available for each row in the result.

Subqueries in the `FROM` Clause

When you use a subquery in the FROM clause, you’re creating a derived table that exists only for the duration of that query. This can be helpful when you need to aggregate data first and then perform additional operations on those aggregates.

SELECT mission_decade, AVG(budget_millions) AS avg_budget
FROM (
    SELECT 
        mission_name,
        (launch_year / 10) * 10 AS mission_decade,
        budget_millions
    FROM missions
) AS decade_data
GROUP BY mission_decade;

Result:

mission_decade  avg_budget
--------------  ----------
2030            450       
2040            890

The inner query organizes missions by decade, and the outer query calculates average budgets per decade. Most database systems require you to provide an alias for derived tables (like decade_data in this example).

Correlated Subqueries

A correlated subquery is one that cannot run independently because it relies on values from the outer query. It is re-executed for each row of the outer query, making it a flexible but often computationally expensive operation.

Here’s an example:

SELECT p.name, p.mass_earth_ratio
FROM planets p
WHERE p.mass_earth_ratio > (
    SELECT AVG(mass_earth_ratio)
    FROM planets
    WHERE discovery_year = p.discovery_year
);

Result:

name    mass_earth_ratio
------  ----------------
K2-18b  8.63

This finds planets that are more massive than the average mass of planets discovered in the same year. Notice how the subquery references p.discovery_year from the outer query. That’s what makes it correlated.

Subqueries with `EXISTS`

The EXISTS operator checks whether a subquery returns any rows. It does this with a boolean check. It’s highly efficient because, unlike other methods, it doesn’t retrieve or count all matching rows. Rather, it simply stops executing as soon as the first match is confirmed.

Example:

SELECT name, discovery_year
FROM planets p
WHERE EXISTS (
    SELECT 1
    FROM missions m
    WHERE m.target_planet_id = p.planet_id
    AND m.budget_millions > 500
);

Result:

name                discovery_year
------------------  --------------
Proxima Centauri b  2016          
TRAPPIST-1e         2017

This returns planets that have at least one mission with a budget exceeding 500 million. The subquery doesn’t need to return actual data because we only care whether matching rows exist. You’ll often see SELECT 1 in EXISTS subqueries because the actual selected values don’t matter.

Performance Considerations

Subqueries are convenient, but they’re not always the fastest option. In many cases, you can rewrite subqueries as JOINs, which database optimizers often handle more efficiently. Modern database systems have gotten better at optimizing subqueries automatically, but it’s worth testing performance on large datasets.

Correlated subqueries in particular can be slow because they execute repeatedly. If you find a correlated subquery causing performance issues, consider whether you can rewrite it as a JOIN or use window functions instead (if your DBMS supports them).

That said, readability matters too. Sometimes a subquery expresses your intent more clearly than a complex JOIN, even if it’s slightly slower. For smaller datasets or one-off queries, the difference is often negligible.

Common Pitfalls

One mistake people make is using subqueries that return multiple values when a single value is expected. For example, the following query will cause an error:

SELECT name
FROM planets
WHERE discovery_year = (SELECT discovery_year FROM planets);

Output:

Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

That’s the error SQL Server returned when I ran the query. In this case the subquery returns multiple years, but the = operator expects a single value. You’d need to use the IN operator instead, or add a WHERE clause to ensure the subquery returns only one row.

Another issue is NULL handling. If a subquery returns NULL, comparisons might not work as you expect. The expression column = (SELECT ...) will be NULL if the subquery returns NULL, which won’t match any rows in a WHERE clause.

Alternatives to Subqueries

While subqueries are useful, other SQL features can sometimes do the same job more elegantly. For example:

Common Table Expressions (CTEs) using the WITH clause make complex queries more readable by naming subqueries upfront.
JOINs often perform better for combining data from multiple tables.
Window functions can replace certain correlated subqueries, especially for calculations involving rankings or running totals.

Each approach has its place, and the best choice depends on your specific situation, such as the complexity of your query, performance requirements, and what makes the code easiest for your team to understand and maintain.

Summary

Subqueries are a fundamental SQL tool that let you compose complex operations from simpler building blocks. They work by nesting one query inside another, allowing the outer query to leverage results from the inner query. You can place them in various parts of a SQL statement, and they come in different forms depending on whether they return single values, rows, or tables, and whether they reference the outer query.

The key to using subqueries effectively is understanding when they make your SQL clearer and when another approach might work better. Start with simple subqueries to solve straightforward problems, and as you get comfortable, you’ll develop intuition for when to use them versus JOINs, CTEs, or other techniques.

Where Can You Use Subqueries?

Basic Subquery Syntax

Types of Subqueries

Examples

Subqueries in the FROM Clause

Correlated Subqueries

Subqueries with EXISTS

Performance Considerations

Common Pitfalls

Alternatives to Subqueries

Summary

Subqueries in the `FROM` Clause

Subqueries with `EXISTS`