If you ever find yourself needing to combine the results from multiple SELECT statements into a single result set, UNION is probably going to be the tool for the job. By “UNION“, I mean the UNION operator.
The UNION operator takes the output from two SELECT queries and stacks them on top of each other. It basically merges two lists into one, removing any duplicates along the way.
The Basic Syntax
The syntax goes something like this:
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;
While it’s all quite straightforward, there are a few things you need to remember, depending on your RDBMS. Most RDBMSs will require that each SELECT statement has the same number of columns, and for those columns to have compatible data types. Also, the column names from the first SELECT statement are typically used in the final result (but you can use aliases if required).
Example
Let’s say you’re tracking project tasks across two different systems and need to compile a complete list. Here’s how you might set that up in SQL Server:
-- Create sample tables
CREATE TABLE InternalTasks (
TaskID INT,
TaskName VARCHAR(100),
Priority VARCHAR(20)
);
CREATE TABLE ClientTasks (
TaskID INT,
TaskName VARCHAR(100),
Priority VARCHAR(20)
);
-- Insert sample data
INSERT INTO InternalTasks VALUES
(1, 'Update API documentation', 'Medium'),
(2, 'Fix authentication bug', 'High'),
(3, 'Refactor database queries', 'Low');
INSERT INTO ClientTasks VALUES
(101, 'Add export feature', 'High'),
(102, 'Fix authentication bug', 'High'),
(103, 'Improve dashboard loading', 'Medium');
Now let’s run a query with UNION to combine the rows from both tables:
SELECT TaskName, Priority FROM InternalTasks
UNION
SELECT TaskName, Priority FROM ClientTasks
ORDER BY Priority DESC;
Result:
TaskName Priority
------------------------- --------
Improve dashboard loading Medium
Update API documentation Medium
Refactor database queries Low
Add export feature High
Fix authentication bug High
Notice that “Fix authentication bug” appears in both tables, but it only shows up once in the result. That’s because UNION automatically removes duplicates. The result includes five tasks instead of six.
But if we add the TaskID to the SELECT lists, then it’s no longer a duplicate, and we get six rows:
SELECT TaskID, TaskName, Priority FROM InternalTasks
UNION
SELECT TaskID, TaskName, Priority FROM ClientTasks
ORDER BY Priority DESC;
Result:
TaskID TaskName Priority
------ ------------------------- --------
1 Update API documentation Medium
103 Improve dashboard loading Medium
3 Refactor database queries Low
2 Fix authentication bug High
101 Add export feature High
102 Fix authentication bug High
This time “Fix authentication bug” appears twice because the TaskID is different in each table (it’s 2 in the InternalTasks table and 102 in ClientTasks). Therefore, the rows are no longer duplicates.
UNION vs UNION ALL
But if you actually want to keep the duplicates, you can use UNION ALL instead:
SELECT TaskName, Priority FROM InternalTasks
UNION ALL
SELECT TaskName, Priority FROM ClientTasks
ORDER BY Priority DESC;
Result:
TaskName Priority
------------------------- --------
Update API documentation Medium
Improve dashboard loading Medium
Refactor database queries Low
Add export feature High
Fix authentication bug High
Fix authentication bug High
I removed the TaskID columns and now all six tasks are returned, including both instances of “Fix authentication bug”.
UNION ALL is also faster because it doesn’t need to do the extra work of checking for and removing duplicates. If you know your data doesn’t have duplicates or you need to keep them, UNION ALL is the way to go.
Column Compatibility Matters
Your SELECT statements need to play nicely together. The number of columns must match, and the data types need to be compatible. You can’t UNION a VARCHAR column with an INT column, for example. Most database systems will either convert compatible types automatically or throw an error if they can’t.
The following fails when I try to run it in SQL Server:
SELECT TaskName, Priority FROM InternalTasks
UNION
SELECT TaskID, Priority FROM ClientTasks
ORDER BY Priority DESC;
Output:
Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value 'Update API documentation' to data type int.
It failed because I tried to combine TaskName from one table with TaskID from the other. TaskName is a VARCHAR and TaskID is an INT. SQL Server couldn’t convert the VARCHAR to an INT and so it returned an error.
The Number of Columns Must Match
As mentioned, both SELECT statements need to select the same number of columns, otherwise we’ll get an error:
SELECT TaskID, TaskName FROM InternalTasks
UNION
SELECT TaskID FROM ClientTasks;
Output:
Msg 205, Level 16, State 1, Line 1
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
In this case I tried to combine two columns with one. Not a good idea.
Ordering the Results
You can only use ORDER BY once, and it goes at the very end after all your UNION operations:
SELECT TaskID, TaskName, Priority FROM InternalTasks
UNION
SELECT TaskID, TaskName, Priority FROM ClientTasks
ORDER BY Priority DESC, TaskName;
Result:
TaskID TaskName Priority
------ ------------------------- --------
103 Improve dashboard loading Medium
1 Update API documentation Medium
3 Refactor database queries Low
101 Add export feature High
2 Fix authentication bug High
102 Fix authentication bug High
This query returns all unique tasks sorted first by priority, and then alphabetically by task name within each priority level. You’ll see TaskIDs from both tables mixed together based on the sorting criteria, not based on which table they came from.
If you try to put ORDER BY in the individual SELECT statements, most database systems will complain. The ordering applies to the final combined result, not the intermediate queries.
Performance Considerations
UNION can be resource-intensive, especially on large datasets. The duplicate-checking process requires sorting and comparing rows, which takes time and memory. If you’re working with millions of rows and you know there are no duplicates (or you don’t care about them), UNION ALL will give you significantly better performance.
You can also improve performance by being selective about what you’re querying. Instead of SELECT *, specify only the columns you actually need. And if possible, add WHERE clauses to filter data before the UNION operation happens.
Summary
The UNION operator is a clean, efficient way to combine result sets in SQL. Use UNION when you want to eliminate duplicates, and UNION ALL when you don’t. Make sure your columns line up properly in terms of count and data types, and keep performance in mind when working with large datasets.