UNION forces a sort-and-dedupe pass; UNION ALL concatenates. On large
result sets the difference is often an order of magnitude. In most real
queries people write UNION out of habit when UNION ALL is correct -
either because the two sides are already disjoint (e.g. different tables,
different date ranges) or because duplicate rows aren't possible given the
schema.
A W011 rule warns on every UNION (not UNION ALL) so the author has
to actively justify the dedupe cost.
Should fail
SELECT id FROM orders_2024
UNION
SELECT id FROM orders_2025;
Should pass
SELECT id FROM orders_2024
UNION ALL
SELECT id FROM orders_2025;
Implementation hints
- Mirror
W005 subquery-in-where in sql_guard/rules/warnings.py - it's
a single-statement multiline rule.
- Regex:
r"\bUNION\b(?!\s+ALL\b)"
- Severity:
warning.
- Add to
ALL_RULES in sql_guard/rules/__init__.py.
- Test: copy the shape of
test_rules.py::test_missing_limit - one
triggering input, one non-triggering input.
Estimated LOC: ~25 code + ~15 test.
UNIONforces a sort-and-dedupe pass;UNION ALLconcatenates. On largeresult sets the difference is often an order of magnitude. In most real
queries people write
UNIONout of habit whenUNION ALLis correct -either because the two sides are already disjoint (e.g. different tables,
different date ranges) or because duplicate rows aren't possible given the
schema.
A
W011rule warns on everyUNION(notUNION ALL) so the author hasto actively justify the dedupe cost.
Should fail
Should pass
Implementation hints
W005 subquery-in-whereinsql_guard/rules/warnings.py- it'sa single-statement multiline rule.
r"\bUNION\b(?!\s+ALL\b)"warning.ALL_RULESinsql_guard/rules/__init__.py.test_rules.py::test_missing_limit- onetriggering input, one non-triggering input.
Estimated LOC: ~25 code + ~15 test.