Skip to content

feat(rules): add W019 count-distinct-unbounded rule#29

Merged
Pawansingh3889 merged 1 commit intoPawansingh3889:mainfrom
mvanhorn:osc/issue-7-w017
Apr 26, 2026
Merged

feat(rules): add W019 count-distinct-unbounded rule#29
Pawansingh3889 merged 1 commit intoPawansingh3889:mainfrom
mvanhorn:osc/issue-7-w017

Conversation

@mvanhorn
Copy link
Copy Markdown
Contributor

Resolves #7.

The issue title pre-dates v0.6.0; W017 is now leading-wildcard-like. This rule lands as W019, the next free ID before W020.

Warns on COUNT(DISTINCT col) without WHERE, GROUP BY, or LIMIT in the same statement. That pattern forces the engine into a full sort + distinct pass over the whole table, which is the perf surprise the issue flags.

Behaviour

SQL Verdict
SELECT COUNT(DISTINCT user_id) FROM events; fires
SELECT COUNT(DISTINCT user_id) FROM events WHERE event_date >= '2024-01-01'; passes
SELECT tenant_id, COUNT(DISTINCT user_id) FROM events GROUP BY tenant_id; passes
SELECT COUNT(DISTINCT user_id) FROM events LIMIT 1; passes
SELECT COUNT(*) FROM events; / SELECT COUNT(user_id) FROM events; not flagged

Whitespace inside the call (COUNT ( DISTINCT user_id)) is tolerated.

Tests

6 new cases in tests/test_new_rules.py covering the matrix above, plus the registry-count tests in tests/test_rules.py were bumped from 22 -> 23 warnings (32 -> 33 total). pytest tests/ is 127 passed, 1 skipped.

Resolves Pawansingh3889#7. (Issue title says W017 but W017 is now leading-wildcard-like
in v0.6.0; W019 is the next free ID before W020.)

Warns on COUNT(DISTINCT col) without a WHERE, GROUP BY, or LIMIT in the
same statement -- forces a full sort + distinct pass and is a frequent
perf surprise on prod.

- 'SELECT COUNT(DISTINCT user_id) FROM events;' -> fires
- 'SELECT COUNT(DISTINCT user_id) FROM events WHERE event_date >= ...' -> passes
- 'SELECT tenant_id, COUNT(DISTINCT user_id) FROM events GROUP BY tenant_id;' -> passes
- 'SELECT COUNT(*) FROM events;' -> not flagged

Updated:
- sql_guard/rules/warnings.py: new CountDistinctUnbounded rule.
- sql_guard/rules/__init__.py: registered in ALL_RULES.
- tests/test_new_rules.py: 6 new test cases covering hit, WHERE, GROUP
  BY, LIMIT, whitespace handling, and plain COUNT non-firing.
- tests/test_rules.py: bumped expected ALL_RULES count 32->33 and
  warning count 22->23.
- CHANGELOG.md: Unreleased entry.
@codecov-commenter
Copy link
Copy Markdown

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

@Pawansingh3889 Pawansingh3889 changed the title feat(rules): W019 count-distinct-unbounded feat(rules): add W019 count-distinct-unbounded rule Apr 26, 2026
@Pawansingh3889
Copy link
Copy Markdown
Owner

🎉 second W-rule from you this week, nice work 👏 thanks for picking this back up after W017 got taken in v0.6.0.

landing as W019 makes sense, and the TOP / FETCH NEXT bypass is exactly what the T-SQL crowd needs.

subquery and CTE bodies still slip past, but that's a regex limitation. fine for now.

squashing 🚀

@Pawansingh3889 Pawansingh3889 merged commit ed4ba15 into Pawansingh3889:main Apr 26, 2026
7 of 8 checks passed
Pawansingh3889 added a commit that referenced this pull request Apr 26, 2026
Single-rule patch release. W019 lands count-distinct-unbounded from
PR #29 (mvanhorn). Bypass list recognises WHERE / GROUP BY / LIMIT
plus T-SQL TOP / FETCH FIRST / FETCH NEXT.

Surface counts refreshed: 38 rules (33 SQL + 5 Python), 149 tests.
Trusted Publishing will fire on the v0.6.1 tag push.
@mvanhorn
Copy link
Copy Markdown
Contributor Author

Appreciate getting this rule in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rule: W017 count-distinct-unbounded - warn on COUNT(DISTINCT col) without LIMIT/filter

3 participants