What is a UNION in SQL?

If you ever find yourself needing to combine the results from multiple SELECT statements into a single result set, UNION is probably going to be the tool for the job. By “UNION“, I mean the UNION operator.

The UNION operator takes the output from two SELECT queries and stacks them on top of each other. It basically merges two lists into one, removing any duplicates along the way.

The Basic Syntax

The syntax goes something like this:

SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;

While it’s all quite straightforward, there are a few things you need to remember, depending on your RDBMS. Most RDBMSs will require that each SELECT statement has the same number of columns, and for those columns to have compatible data types. Also, the column names from the first SELECT statement are typically used in the final result (but you can use aliases if required).

Example

Let’s say you’re tracking project tasks across two different systems and need to compile a complete list. Here’s how you might set that up in SQL Server:

-- Create sample tables
CREATE TABLE InternalTasks (
    TaskID INT,
    TaskName VARCHAR(100),
    Priority VARCHAR(20)
);

CREATE TABLE ClientTasks (
    TaskID INT,
    TaskName VARCHAR(100),
    Priority VARCHAR(20)
);

-- Insert sample data
INSERT INTO InternalTasks VALUES 
(1, 'Update API documentation', 'Medium'),
(2, 'Fix authentication bug', 'High'),
(3, 'Refactor database queries', 'Low');

INSERT INTO ClientTasks VALUES 
(101, 'Add export feature', 'High'),
(102, 'Fix authentication bug', 'High'),
(103, 'Improve dashboard loading', 'Medium');

Now let’s run a query with UNION to combine the rows from both tables:

SELECT TaskName, Priority FROM InternalTasks
UNION
SELECT TaskName, Priority FROM ClientTasks
ORDER BY Priority DESC;

Result:

TaskName                   Priority
-------------------------  --------
Improve dashboard loading  Medium  
Update API documentation   Medium  
Refactor database queries  Low     
Add export feature         High    
Fix authentication bug     High

Notice that “Fix authentication bug” appears in both tables, but it only shows up once in the result. That’s because UNION automatically removes duplicates. The result includes five tasks instead of six.

But if we add the TaskID to the SELECT lists, then it’s no longer a duplicate, and we get six rows:

SELECT TaskID, TaskName, Priority FROM InternalTasks
UNION
SELECT TaskID, TaskName, Priority FROM ClientTasks
ORDER BY Priority DESC;

Result:

TaskID  TaskName                   Priority
------  -------------------------  --------
1       Update API documentation   Medium  
103     Improve dashboard loading  Medium  
3       Refactor database queries  Low     
2       Fix authentication bug     High    
101     Add export feature         High    
102     Fix authentication bug     High

This time “Fix authentication bug” appears twice because the TaskID is different in each table (it’s 2 in the InternalTasks table and 102 in ClientTasks). Therefore, the rows are no longer duplicates.

`UNION` vs `UNION ALL`

But if you actually want to keep the duplicates, you can use UNION ALL instead:

SELECT TaskName, Priority FROM InternalTasks
UNION ALL
SELECT TaskName, Priority FROM ClientTasks
ORDER BY Priority DESC;

Result:

TaskName                   Priority
-------------------------  --------
Update API documentation   Medium  
Improve dashboard loading  Medium  
Refactor database queries  Low     
Add export feature         High    
Fix authentication bug     High    
Fix authentication bug     High

I removed the TaskID columns and now all six tasks are returned, including both instances of “Fix authentication bug”.

UNION ALL is also faster because it doesn’t need to do the extra work of checking for and removing duplicates. If you know your data doesn’t have duplicates or you need to keep them, UNION ALL is the way to go.

Column Compatibility Matters

Your SELECT statements need to play nicely together. The number of columns must match, and the data types need to be compatible. You can’t UNION a VARCHAR column with an INT column, for example. Most database systems will either convert compatible types automatically or throw an error if they can’t.

The following fails when I try to run it in SQL Server:

SELECT TaskName, Priority FROM InternalTasks
UNION
SELECT TaskID, Priority FROM ClientTasks
ORDER BY Priority DESC;

Output:

Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value 'Update API documentation' to data type int.

It failed because I tried to combine TaskName from one table with TaskID from the other. TaskName is a VARCHAR and TaskID is an INT. SQL Server couldn’t convert the VARCHAR to an INT and so it returned an error.

The Number of Columns Must Match

As mentioned, both SELECT statements need to select the same number of columns, otherwise we’ll get an error:

SELECT TaskID, TaskName FROM InternalTasks
UNION
SELECT TaskID FROM ClientTasks;

Output:


Msg 205, Level 16, State 1, Line 1
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.

In this case I tried to combine two columns with one. Not a good idea.

Ordering the Results

You can only use ORDER BY once, and it goes at the very end after all your UNION operations:

SELECT TaskID, TaskName, Priority FROM InternalTasks
UNION
SELECT TaskID, TaskName, Priority FROM ClientTasks
ORDER BY Priority DESC, TaskName;

Result:

TaskID  TaskName                   Priority
------  -------------------------  --------
103     Improve dashboard loading  Medium  
1       Update API documentation   Medium  
3       Refactor database queries  Low     
101     Add export feature         High    
2       Fix authentication bug     High    
102     Fix authentication bug     High

This query returns all unique tasks sorted first by priority, and then alphabetically by task name within each priority level. You’ll see TaskIDs from both tables mixed together based on the sorting criteria, not based on which table they came from.

If you try to put ORDER BY in the individual SELECT statements, most database systems will complain. The ordering applies to the final combined result, not the intermediate queries.

Performance Considerations

UNION can be resource-intensive, especially on large datasets. The duplicate-checking process requires sorting and comparing rows, which takes time and memory. If you’re working with millions of rows and you know there are no duplicates (or you don’t care about them), UNION ALL will give you significantly better performance.

You can also improve performance by being selective about what you’re querying. Instead of SELECT *, specify only the columns you actually need. And if possible, add WHERE clauses to filter data before the UNION operation happens.

Summary

The UNION operator is a clean, efficient way to combine result sets in SQL. Use UNION when you want to eliminate duplicates, and UNION ALL when you don’t. Make sure your columns line up properly in terms of count and data types, and keep performance in mind when working with large datasets.

The Basic Syntax

Example

UNION vs UNION ALL

Column Compatibility Matters

The Number of Columns Must Match

Ordering the Results

Performance Considerations

Summary

`UNION` vs `UNION ALL`