Skip to content

feat(rules): warn on OVER ()#21

Merged
Pawansingh3889 merged 25 commits intoPawansingh3889:mainfrom
Prabhu-1409:feature/warn_on_over
Apr 26, 2026
Merged

feat(rules): warn on OVER ()#21
Pawansingh3889 merged 25 commits intoPawansingh3889:mainfrom
Prabhu-1409:feature/warn_on_over

Conversation

@Prabhu-1409
Copy link
Copy Markdown
Contributor

What changed

Added a new warning rule W014 to detect window functions using OVER() without ORDER BY or PARTITION BY. This helps flag potentially non-deterministic queries and improves readability of SQL.

Related issue

Closes #9

If this adds or changes a rule

  • Rule ID is the next unused one in its family (E00N / W0NN / S00N / P00N)
  • Rule class registered in sql_guard/rules/__init__.py
  • Fixture line in tests/fixtures/errors.sql or warnings.sql
  • "Fires on bad SQL" test in tests/test_rules.py
  • "Does NOT fire on safe SQL" test (using tmp_path)
  • README rule table + Key Numbers count updated
  • Rule is not a removal or rename of an existing rule (breaking change — see GOVERNANCE.md)

Tests

  • pytest -q passes
  • ruff check . passes

Checklist

  • One logical change per commit, conventional commit style (feat:, fix:, docs:, chore:, ci:, test:, style:)
  • CHANGELOG.md [Unreleased] entry if user-facing

Anything else reviewers should know

Copy link
Copy Markdown
Owner

@Pawansingh3889 Pawansingh3889 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Prabhu. Template's filled and CI's clean. A few things to address before merge.

1. Rebase needed. Tonight's v0.5.0 release (#24) added four T-SQL rules (T001-T004) and a W002/W006 regex tweak, so ALL_RULES count and the test_rules.py registry tests have moved. Your branch is still at the pre-v0.5.0 state, so there will be conflicts on sql_guard/rules/__init__.py and tests/test_rules.py.

2. Rule ID: W013, not W014. Current highest W is W012 (group-by-ordinal), so W014 leaves a gap. Note that issue #9 suggests the S-prefix (S004) on the grounds that this is a structural pattern rather than a style warning. Either W013 or S004 works, pick one.

3. Semantic gap with issue #9. The issue body asks for "warn on OVER () without PARTITION BY" and the only "should fail" example is ROW_NUMBER() OVER (). Your implementation is looser: it passes anything with ORDER BY inside, so OVER (ORDER BY event_time) with no PARTITION BY slips through even though it still computes over the whole result set. Please either tighten the rule to match the issue (require PARTITION BY), or confirm in a comment that we are intentionally widening the scope, in which case the rule name should move away from window-missing-order-partition to something like empty-over-clause to match what it actually detects.

4. Regex and additional tests. The [^)]* inside _valid_pattern stops at the first ), so nested parens inside OVER could cause edge-case behaviour. Two tests worth adding to pin down the behaviour: verify OVER (PARTITION BY user_id) alone passes, and verify OVER with a nested function call but no ORDER / PARTITION fires. A third test on OVER (ORDER BY event_time) alone depends on how you resolve item 3.

5. Minor cleanups.

  • tests/test_python_scanner.py has an unrelated blank-line deletion between test_extract_captures_variable_assignment and test_invalid_python_returns_empty. Please revert.
  • sql_guard/rules/warnings.py is missing a trailing newline after the new class.

Happy to re-review once these are in.

@Prabhu-1409 Prabhu-1409 force-pushed the feature/warn_on_over branch from fd9b4cf to ea832ac Compare April 20, 2026 16:44
@Pawansingh3889
Copy link
Copy Markdown
Owner

Thanks Prabhu, the W014 → W013 change came through cleanly, the rule now correctly fires on any OVER without PARTITION BY (matching the issue), and the unrelated test_python_scanner.py change is reverted. CI is green across all four Python versions. Good iteration.

Two small things still pending and one mechanical one before I can merge:

  1. The branch needs another rebase — since our last exchange, feat(rules): add P005 sqlalchemy-text-fstring rule #25 (P005) and feat(rules): add W016 not-in-with-subquery rule #26 (W016) also landed on main and touch the same five files yours does. I'd rather you do this locally than risk the web editor on warnings.py and test_rules.py.

  2. sql_guard/rules/warnings.py is still missing the trailing newline at end of file. One-line fix during the rebase.

  3. Optional but nice: a third test verifying the nested-parens edge case (e.g. ROW_NUMBER() OVER (ORDER BY DATE_TRUNC('hour', ts))) — this confirms the new regex handles inner function calls without false positives. Happy to skip if you'd rather keep this PR tight; we can always add it as a follow-up.

Steps:

git fetch origin
git rebase origin/main
Resolve conflicts in the five files: CHANGELOG.md, README.md, sql_guard/rules/warnings.py, tests/fixtures/warnings.sql, tests/test_rules.py. Add the trailing newline to warnings.py while you're in there.
pytest -q
ruff check .
git push --force-with-lease

Once green I'll merge same day. Thanks again for the careful work on the rule semantics — the rename and the tightening are exactly right.

Pawansingh3889 and others added 19 commits April 26, 2026 08:04
- E007 alter-add-not-null-no-default: forces table rewrite under Sch-M lock
- E008 drop-column: irreversible, breaks replication subscribers and rollback
- W017 leading-wildcard-like: LIKE '%foo' defeats B-tree indexes
- W018 or-across-columns: WHERE a=1 OR b=2 defeats single-column indexes
- W020 truncate-table: bypasses DELETE triggers and resets identity
- T005 create-index-without-online: builds without WITH (ONLINE = ON) lock readers
Same-line and disable-next-line forms silence rules per occurrence so
known false positives don't need a project-wide disable. Recognises
'-- sql-guard: disable=W001,W003' (SQL) and '# sql-guard: disable=...'
(Python). Bare 'disable' with no rule list silences every rule on the
line, mirroring '# noqa' from flake8/ruff.
Loader walks up from cwd looking for .sql-guard.yml (or .sql-guard.yaml).
Recognises disable, ignore, include_python, severity. Returns an empty
Config when no file is present so callers don't have to None-check.
CLI flags merge with and override config values.

PyYAML promoted from optional to a runtime dependency.
Calls git diff --name-only against the working tree (or against a base
ref via --changed-base) and intersects with the discovered file list.
Falls back to a full scan with a warning when the cwd isn't a git repo
or git is unavailable. Speeds up pre-commit hooks on large codebases.
Builds a SARIF document compatible with the
github/codeql-action/upload-sarif action so findings render inline on
PRs in the GitHub Files Changed view. The full rule catalogue ships in
the tool descriptor; results carry rule id, level, file, line, and the
suggestion appended to the message text. Lines below 1 are clamped to 1
to satisfy the schema.
- --config / -c PATH overrides automatic .sql-guard.yml discovery
- --changed-only filters discovered files via git diff (with optional
  --changed-base REF). Empty diff exits cleanly without a 'no issues'
  scan banner.
- --format sarif emits a SARIF 2.1.0 doc to stdout, or to a file via
  --output / -o.
- --disable rule IDs are uppercased so 'w001' and 'W001' both work.
- Config file disable / ignore / include_python merge with CLI flags.
Key Numbers updated to 37 rules / 143 tests / v0.6.0. Rule tables grow
to include E007, E008, W017, W018, W020, T005. New Configuration
sections cover .sql-guard.yml, inline disable comments, --changed-only,
and SARIF output (with a copy-paste GitHub Actions snippet).
CHANGELOG covers the 6 new rules, inline disable directives,
.sql-guard.yml config, --changed-only flag, SARIF reporter, and the
PyYAML dependency promotion. P005 / W016 entries that were sitting in
[Unreleased] are folded into the 0.6.0 release notes.
v0.6.0 declares pyyaml as a runtime dependency. micropip's resolver
sometimes fails to install it cleanly because the only PyYAML build
that works under Pyodide is the bundled one served from
pyodide-cdn.s3, not the PyPI sdist. Pre-loading via
loadPackage(['micropip', 'pyyaml']) takes the bundled emscripten
build directly and unblocks the subsequent micropip.install('sql-sop').

Also reordered: install sqlparse first (so the sql-sop resolve doesn't
fan out and re-download it as a transitive dep) and dropped the
trailing standalone sqlparse install since it now happens earlier in
the chain.

Status messages updated to surface the new two-stage init.
The catch-block error message used '\'t' (escaped backslash + apostrophe)
inside a single-quoted string literal. JS parsed the backslash as a
literal backslash, then took the apostrophe as the string terminator,
then choked on 't start the Python runtime' as an unexpected identifier.

Devtools showed it as: Uncaught SyntaxError: Unexpected identifier 't'
at sql-guard/:125

A SyntaxError on any line of an inline script kills the whole script
before any of it executes. So boot() never ran and the page sat at the
initial HTML strings ('Loading Python runtime...', 'Waiting for Pyodide
to load...') forever, regardless of CDN state, regardless of what
fixes we'd pushed elsewhere in the file.

Replace the contraction with 'Could not' to sidestep the apostrophe
entirely. Less fragile than re-escaping.
pypistats.org/packages/sql-sop confirms 530 downloads in the last 30
days as of 2026-04-26 (vs 222 in the last 7 days, 2.6k all-time per
pepy.tech). The 195+ figure was several months stale; rounding down
to 500+ stays conservative and verifiable.
Resolves Pawansingh3889#7. (Issue title says W017 but W017 is now leading-wildcard-like
in v0.6.0; W019 is the next free ID before W020.)

Warns on COUNT(DISTINCT col) without a WHERE, GROUP BY, or LIMIT in the
same statement -- forces a full sort + distinct pass and is a frequent
perf surprise on prod.

- 'SELECT COUNT(DISTINCT user_id) FROM events;' -> fires
- 'SELECT COUNT(DISTINCT user_id) FROM events WHERE event_date >= ...' -> passes
- 'SELECT tenant_id, COUNT(DISTINCT user_id) FROM events GROUP BY tenant_id;' -> passes
- 'SELECT COUNT(*) FROM events;' -> not flagged

Updated:
- sql_guard/rules/warnings.py: new CountDistinctUnbounded rule.
- sql_guard/rules/__init__.py: registered in ALL_RULES.
- tests/test_new_rules.py: 6 new test cases covering hit, WHERE, GROUP
  BY, LIMIT, whitespace handling, and plain COUNT non-firing.
- tests/test_rules.py: bumped expected ALL_RULES count 32->33 and
  warning count 22->23.
- CHANGELOG.md: Unreleased entry.

Co-authored-by: Matt Van Horn <[email protected]>
Single-rule patch release. W019 lands count-distinct-unbounded from
PR Pawansingh3889#29 (mvanhorn). Bypass list recognises WHERE / GROUP BY / LIMIT
plus T-SQL TOP / FETCH FIRST / FETCH NEXT.

Surface counts refreshed: 38 rules (33 SQL + 5 Python), 149 tests.
Trusted Publishing will fire on the v0.6.1 tag push.
@Prabhu-1409
Copy link
Copy Markdown
Contributor Author

Requested changes are done.

@Pawansingh3889
Copy link
Copy Markdown
Owner

CI's flagging two things, only one of which is yours.

The error: [Unreleased] in CHANGELOG.md is empty after the rebase (v0.6.0 and v0.6.1 swept what was there into their own sections). Add a bullet under ### Added covering W013 and CI'll go green.

Suggested entry to drop under ## [Unreleased] -> ### Added:

  • W013 window-without-partition - warns on window functions
    using OVER () without PARTITION BY, flagging non-deterministic
    ordering and full-result-set scans.

The two rev-pin warnings (v0.5.0 vs latest tag v0.6.1 in README.md and .pre-commit-config.yaml) are pre-existing stale pins on main, not from your PR. I just pushed the bump on main, so once you rebase (or after merge) they'll resolve.

Trailing newline check on warnings.py: I'll confirm on the diff once CI is green. Push the CHANGELOG line and ping me.

@Prabhu-1409
Copy link
Copy Markdown
Contributor Author

Changes are done.

@Pawansingh3889
Copy link
Copy Markdown
Owner

Thanks Prabhu! 🙌 The CHANGELOG entry is perfect and all 6 checks are green. Nice clean rebase across the five files.

One tiny thing I caught when I checked locally: sql_guard/rules/warnings.py is still missing the trailing newline at end of file. The diff doesn't surface this clearly so it's easy to miss. I confirmed by looking at the last byte of the file on your branch (still no \n).

If you've got 30 seconds, easiest is for you to add the newline and force-push. Otherwise I can push the one-byte fix as a maintainer, completely fine if you'd rather. Just let me know.

Either way we're merging this today 🚀. Genuinely appreciate the patience across three review rounds, the rule is clean and the test coverage is exactly right.

@Prabhu-1409
Copy link
Copy Markdown
Contributor Author

new trailing line added to warning.py.

@Pawansingh3889
Copy link
Copy Markdown
Owner

Confirmed locally, last byte of warnings.py is \n now . Squash-merging.

Brilliant work Prabhu. Nice rule, careful semantics, clean rebases through three rounds. The OVER () pattern was a real gap in the catalogue. 🎉

Looking forward to the next one.

@Pawansingh3889 Pawansingh3889 merged commit 6240050 into Pawansingh3889:main Apr 26, 2026
6 checks passed
Pawansingh3889 added a commit that referenced this pull request Apr 27, 2026
Single-rule patch release. W013 window-without-partition lands from
PR #21 (@Prabhu-1409). Dialect-aware messaging across Postgres and
Redshift, three review rounds, well-tested.

Repository scaffolding shipped alongside (not user-facing in the
package, but visible to contributors):
- ROADMAP.md with v0.7 Performance Rules Pack milestone scope
- README Contributors section (currently @tmchow, @mvanhorn,
  @Prabhu-1409)
- scripts/scaffold_rule.py for new-rule boilerplate

39 rules total (34 SQL + 5 Python), 152 tests. Trusted Publishing
fires on the v0.6.2 tag push.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rule: S004 window-without-partition - warn on OVER () without PARTITION BY

3 participants