When working with averages in SQL Server, it’s easy to assume that AVG() just takes all rows into account and calculates a simple mean. And that’s true. By default, AVG() includes every value in the column you point it to. But sometimes, you may want to average only unique values in that column, which is where DISTINCT comes into play.
Let’s explore this with a simple example.
Example Dataset
Suppose we’re analyzing the delivery times (in days) of orders for a logistics company. Our table OrderDeliveries looks like this:
CREATE TABLE OrderDeliveries (
OrderID INT,
DeliveryDays DECIMAL(4,2)
);
INSERT INTO OrderDeliveries (OrderID, DeliveryDays) VALUES
(1, 3.5),
(2, 5.0),
(3, 5.0),
(4, 7.2),
(5, 3.5),
(6, 10.0);
AVG() Without DISTINCT
Now let’s calculate the average delivery time using the standard AVG() (i.e. without DISTINCT):
SELECT AVG(DeliveryDays) AS AvgDelivery
FROM OrderDeliveries;
Output:
AvgDelivery
-----------
5.7
How this is calculated:
- Values used: 3.5, 5.0, 5.0, 7.2, 3.5, 10.0
- Total = 34.2
- Count = 6
- Average = 34.2 ÷ 6 = 5.70
So the average delivery time across all rows is 5.7 days.
AVG() With DISTINCT
Now let’s see what happens if we only average the unique delivery times:
SELECT AVG(DISTINCT DeliveryDays) AS AvgDistinctDelivery
FROM OrderDeliveries;
Result:
AvgDistinctDelivery
-------------------
6.425
How this is calculated:
- Distinct values: 3.5, 5.0, 7.2, 10.0
- Total = 25.7
- Count = 4
- Average = 25.7 ÷ 4 = 6.425
This gives a different result because the repeated values (3.5 and 5.0) no longer get extra weight in the calculation.
When DISTINCT Can Mislead
Now, just because we can use the DISTINCT clause doesn’t mean that we should always use it. This will depend on the data we’re trying to analyze. Sometimes it can hide important business context.
In this dataset, the delivery time of 3.5 days occurred twice, and 5.0 days occurred twice. That’s meaningful because most deliveries are happening in that 3 – 5 day range.
If we only look at the AVG(DISTINCT) result (6.425 days), it makes delivery times look slower than they really are. That’s because it treats 3.5, 5.0, 7.2, and 10.0 as equally important, even though in reality, customers experience 3.5 – 5.0 days more often than 7 or 10 days.
So:
AVG()reflects the real distribution of delivery times across all orders.AVG(DISTINCT)ignores frequency and averages all unique values equally.
If you use AVG(DISTINCT) without thinking, you might report to management “Our average delivery is about 6.4 days” when in reality, the average customer is seeing something closer to 5.7 days.
Takeaway
Using DISTINCT inside AVG() changes the story your data tells. Without it, you’re getting a weighted picture based on how often values occur. With it, you’re averaging unique values only, which can sometimes gloss over the patterns in your data.
As a rule of thumb:
- Use
AVG()for customer or transaction-level insights. - Use
AVG(DISTINCT)only if you specifically care about the variety of values, not their frequency.