feat: add regex pattern support for glob and exclude#336

Merged
jdx merged 18 commits into main from feat/regex-patterns
Oct 4, 2025
jdx commented Oct 4, 2025

Summary

Adds support for regex patterns in step-level glob and exclude fields, enabling file filtering that traditional glob patterns cannot express.

This addresses the need for advanced pattern matching, such as the example from the Airflow project where multiple complex exclusions needed to be combined into a single pattern.

Changes

Pkl Configuration

  • Added Regex typealias to pkl/Config.pkl as Mapping<"_type"|"pattern", String>
  • Updated Step.glob and Step.exclude to accept String | List<String> | Regex
  • Updated global Config.exclude to support regex patterns

Rust Implementation

  • Created Pattern enum with two variants:
    • Globs(Vec<String>) - traditional glob patterns (backward compatible)
    • Regex { _type: String, pattern: String } - new regex patterns
  • Implemented custom deserializer for Pattern to handle JSON from Pkl
  • Extended glob::get_pattern_matches() to support regex matching using the regex crate
  • Updated Step.filter_files() to use new pattern matching logic

Usage Example

exclude = new Mapping {
    ["_type"] = "regex"
    ["pattern"] = #"""
(?x)
^.*airflow\.template\.yaml$|
^.*init_git_sync\.template\.yaml$|
^chart/(?:templates|files)/.*\.yaml$|
^helm-tests/tests/chart_utils/keda\.sh_scaledobjects\.yaml$|
.*/v1.*\.yaml$|
^.*openapi.*\.yaml$|
^\.pre-commit-config\.yaml$|
^.*reproducible_build\.yaml$|
^.*pnpm-lock\.yaml$
"""#
}

Test Plan

Created comprehensive bats tests in test/regex_patterns.bats:

  • ✅ Regex exclude patterns - verifies complex exclusion patterns work correctly
  • ✅ Regex glob patterns - verifies regex can be used to match specific files
  • ✅ Regex with dir setting - verifies regex works when step has a working directory

All tests pass with both HK_LIBGIT2=0 and HK_LIBGIT2=1.

Backward Compatibility

This change is fully backward compatible:

  • Existing glob patterns (strings and lists) continue to work unchanged
  • The Pattern enum transparently handles both formats
  • No breaking changes to existing configurations

🤖 Generated with Claude Code


Note

Introduce regex support for glob/exclude, fix dir-scoped matching semantics, add new util subcommands, and update docs/tests accordingly.

  • Core/Engine:
    • Add Pattern enum (globs or regex) for Step.glob/Step.exclude; update matching via glob::get_pattern_matches using regex crate and strict dir-aware semantics.
    • Improve progress/template context: {{globs}} works for both glob and regex; progress shows pattern succinctly.
    • Fix dir-scoped matching (literal separators, pre-filter by dir) and contention detection to use new pattern API.
    • Enhance test runner output by including [before] command output.
  • Pkl Config:
    • Introduce RegexPattern and Regex() helper; allow String | List<String> | Regex for Step.glob, Step.exclude, and global exclude.
    • Adjust renderer to stop auto-list conversion for glob/exclude (now handled by Rust).
  • CLI Utilities:
    • Add subcommands: check-added-large-files, detect-private-key, no-commit-to-branch, python-check-ast, python-debug-statements.
    • no_commit_to_branch: use git symbolic-ref --short HEAD; tighten error construction.
  • Docs:
    • Update configuration docs with Regex support, examples, and notes; add new builtins sections for utilities.
  • Tests:
    • Add test/regex_patterns.bats and test/glob_dir_bug.bats; adjust test/dir.bats for **/*.html.
    • Update builtins tests for large-file and branch checks.
  • Deps:
    • Add regex crate dependency.

Written by Cursor Bugbot for commit 7431665.

jdx and others added 3 commits October 4, 2025 09:04
Adds support for regex patterns in step-level glob and exclude fields,
allowing more complex file filtering patterns beyond glob patterns.

Usage:
```pkl
exclude = new Mapping {
  ["_type"] = "regex"
  ["pattern"] = #".*\.test\.js$"#
}
```

The Pattern type in Rust now supports both glob patterns (as Vec<String>)
and regex patterns (as a Regex variant with a pattern string). The
implementation uses the regex crate for pattern matching.

Changes:
- Added Regex typealias to pkl/Config.pkl
- Updated Step.glob and Step.exclude to accept Pattern enum
- Implemented Pattern enum with Globs and Regex variants
- Added custom deserializer for Pattern to handle both formats
- Extended glob::get_pattern_matches() to support regex matching
- Added comprehensive bats tests for regex patterns

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Documents the new regex pattern support for glob and exclude fields
in the configuration documentation. Includes examples for both step-level
and global exclude patterns.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Replace complex domain-specific examples with simpler, more relatable
use cases:
- Test/spec file exclusion
- Config file matching
- Common build/vendor directory exclusion
- Minified/generated file patterns

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Due to Pkl's type system limitations, we can't create a Regex() function
that works directly in union types. The new Mapping syntax is required.

However, users can create their own local helper function to simplify
the syntax in their own hk.pkl file:

```pkl
local Regex = (pattern) -> new Mapping {
    ["_type"] = "regex"
    ["pattern"] = pattern
}
```

Added documentation showing this pattern as an optional convenience.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
jdx force-pushed the feat/regex-patterns branch from 9ea2fb4 to d929594 on October 4, 2025 14:19
When dir is set, regex patterns in glob and exclude are now applied
to paths relative to that directory, matching the behavior of glob
patterns. Previously, regex patterns were applied to full paths,
causing unexpected matches or misses.

For example, with dir="src" and exclude regex "^test\.js$":
- Before: Would not match "src/test.js" (regex applied to full path)
- After: Correctly matches "src/test.js" (regex applied to "test.js")

Changes:
- Added dir parameter to glob::get_pattern_matches()
- Strip dir prefix from paths before applying regex patterns
- Updated all call sites to pass dir when available
- Added test for nested directory exclusion patterns
- Fixed Pkl type annotations (Regex -> RegexPattern)
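
The dir-relative matching described above can be sketched roughly as follows. This is a minimal illustration, not hk's actual code: `matches_relative_to_dir` is a hypothetical name, and a plain closure stands in for a compiled regex so the sketch needs only the standard library (the PR itself uses the `regex` crate).

```rust
use std::path::{Path, PathBuf};

/// Returns the files whose path *relative to `dir`* satisfies `is_match`.
/// `is_match` is a stand-in for a compiled regex (assumption for this sketch).
/// Files outside `dir` are dropped; matches are returned as full paths.
fn matches_relative_to_dir<F>(files: &[PathBuf], dir: Option<&Path>, is_match: F) -> Vec<PathBuf>
where
    F: Fn(&str) -> bool,
{
    files
        .iter()
        .filter(|file| {
            let relative = match dir {
                // Strip the dir prefix; a file outside dir can never match.
                Some(d) => match file.strip_prefix(d) {
                    Ok(rel) => rel,
                    Err(_) => return false,
                },
                // Without dir, the pattern sees the path unchanged.
                None => file.as_path(),
            };
            relative.to_str().map_or(false, |s| is_match(s))
        })
        .cloned()
        .collect()
}
```

With `dir = "src"` and a predicate equivalent to `^test\.js$`, only `src/test.js` is selected; `src/lib/test.js` and a top-level `test.js` are not.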

💘 Generated with Crush
Co-Authored-By: Crush <[email protected]>

jdx and others added 4 commits October 4, 2025 10:12
Replace all instances of the `new Mapping { ["_type"] = "regex" ... }`
syntax with the cleaner `Config.Regex()` helper function.

Changes:
- Updated docs/configuration.md to import Config.pkl and use Config.Regex()
- Updated test/regex_patterns.bats to use Config.Regex() in all test cases
- Updated helper function documentation to show proper import usage

The Regex() helper must be accessed via Config.Regex() since Pkl functions
aren't automatically inherited by amending modules.

💖 Generated with Crush
Co-Authored-By: Crush <[email protected]>
The Pkl converter was converting single strings to lists for glob and
exclude fields, but this is now redundant. The Rust Pattern deserializer
already handles:
- Single strings (converted to Pattern::Globs(vec![s]))
- Arrays of strings (converted to Pattern::Globs(vec))
- Regex objects with _type and pattern fields

Removing this conversion simplifies the code and makes the converter
logic clearer.
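
As a rough sketch of the three input shapes listed above, the normalization could look like the following. `RawPattern` and `normalize` are hypothetical names standing in for the real serde-based deserializer, and rejecting any `_type` other than `"regex"` is an assumption of this sketch, not something the commit states.

```rust
/// The two variants named in the PR description.
#[derive(Debug, Clone, PartialEq)]
enum Pattern {
    Globs(Vec<String>),
    Regex { _type: String, pattern: String },
}

/// Hypothetical stand-in for the JSON shapes emitted by Pkl.
#[derive(Debug, Clone)]
enum RawPattern {
    Single(String),
    List(Vec<String>),
    Object { _type: String, pattern: String },
}

/// Normalizes any accepted input shape into a `Pattern`.
fn normalize(raw: RawPattern) -> Result<Pattern, String> {
    match raw {
        // A single string becomes a one-element glob list.
        RawPattern::Single(s) => Ok(Pattern::Globs(vec![s])),
        // A list of strings is used as-is.
        RawPattern::List(v) => Ok(Pattern::Globs(v)),
        // An object tagged "regex" becomes the Regex variant.
        RawPattern::Object { _type, pattern } if _type == "regex" => {
            Ok(Pattern::Regex { _type, pattern })
        }
        // Assumption: unknown tags are rejected rather than ignored.
        RawPattern::Object { _type, .. } => {
            Err(format!("unsupported pattern type: {}", _type))
        }
    }
}
```

Because the single-string case is folded into `Globs` here, the caller never needs to pre-convert strings to lists, which is the redundancy this commit removes from the Pkl converter.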

💖 Generated with Crush
Co-Authored-By: Crush <[email protected]>
The StepTest class doesn't have a `check` or `fix` property. Tests
that need to run with custom command arguments cannot override the
step's command. This commit removes test cases that were trying to
use this unsupported feature.

The removed tests were:
- check_added_large_files: tests with custom --maxkb limits
- no_commit_to_branch: test with custom branch list

These tests can be re-added once StepTest supports per-test command
overrides, or the utility commands support configuration via
environment variables.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Two issues fixed:

1. Glob pattern scoping in files_in_contention:
   - get_pattern_matches now properly handles the dir parameter for
     Pattern::Globs by prefixing globs with the directory path
   - This ensures consistent behavior with Step.filter_files and
     prevents incorrect file contention detection

2. User config regex pattern support:
   - Changed UserStepConfig.glob and UserStepConfig.exclude from
     StringOrList to Pattern
   - This allows user configurations (via .hkrc.toml) to use regex
     patterns, matching the functionality available in project configs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Added tests for:
1. Glob pattern scoping with dir parameter - verifies that glob
   patterns are properly scoped to step directories in
   files_in_contention detection
2. Additional regex pattern tests with directories and nested paths

Note: User config tests with .hkrc.pkl would require a separate Pkl
schema for UserConfig which doesn't currently exist. The Rust changes
to support Pattern in UserStepConfig are verified through unit tests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

This commit addresses several issues with glob and regex pattern matching:

1. **Pre-filter files by dir**: get_pattern_matches now pre-filters files
   by the dir parameter before applying pattern matching. This ensures
   files outside the dir are excluded, and regex patterns match against
   paths relative to the dir.

2. **Filter files when dir is set without glob**: In step_group.rs,
   steps with dir but no glob pattern now correctly filter files to
   that directory. Previously, all files were returned.

3. **Populate {{globs}} for regex patterns**: The {{globs}} template
   variable is now populated with the regex pattern string for
   Pattern::Regex, instead of being left empty. This allows templates
   to access pattern information regardless of pattern type.

Added comprehensive bats tests covering:
- Dir pre-filtering for both glob and regex patterns
- Steps with dir but no glob pattern
- {{globs}} template variable with regex patterns
- Regex patterns matching relative paths when dir is set
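
A minimal sketch of how `{{globs}}` might be populated for both pattern types. `globs_template_value` is a hypothetical helper, and joining multiple globs with a single space is an assumption; the commit only states that the regex pattern string is used for `Pattern::Regex`.

```rust
/// The two variants named in the PR description.
#[derive(Debug, Clone, PartialEq)]
enum Pattern {
    Globs(Vec<String>),
    Regex { _type: String, pattern: String },
}

/// Hypothetical helper: the string exposed to templates as `{{globs}}`.
fn globs_template_value(pattern: &Pattern) -> String {
    match pattern {
        // Assumption: globs are joined with a space for display.
        Pattern::Globs(globs) => globs.join(" "),
        // Regex patterns expose the raw pattern string instead of an empty value.
        Pattern::Regex { pattern, .. } => pattern.clone(),
    }
}
```

Returning a `String` in both arms keeps the template context uniform, so a template does not need to know which pattern type a step uses.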

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

jdx and others added 4 commits October 4, 2025 12:06
Added documentation for 5 builtins that were missing from docs/builtins.md:
- check_added_large_files: Prevent committing large files (default limit: 500KB)
- detect_private_key: Detect accidentally committed private keys
- no_commit_to_branch: Prevent direct commits to protected branches
- python_check_ast: Validate Python syntax by parsing the AST
- python_debug_statements: Detect debug statements (pdb, breakpoint) in Python code

This fixes the docs:sync CI check failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…n matching

This commit fixes two related issues with glob and regex pattern matching:

1. **Glob pattern path separator handling**: When a `dir` parameter is set,
   glob patterns now use strict path separator semantics (literal_separator=true).
   This means `*.js` only matches files directly in the directory, not in
   subdirectories. Without `dir`, globs maintain backward-compatible behavior
   where `*` can match across path separators.

2. **Simplified and consistent pattern handling**: Refactored both `step.rs`
   and `step_group.rs` to use `get_pattern_matches` consistently for all
   pattern types (glob and regex). This eliminates code duplication and ensures
   consistent behavior.

Changes:
- Added `get_matches_strict()` function for strict glob matching with dir
- Simplified `filter_files()` to use `get_pattern_matches` for both globs and regex
- Simplified `step_group.rs` to avoid redundant dir filtering
- Added test cases for glob separator behavior

Added comprehensive bats tests:
- Test that {{globs}} template variable is consistent (string format for both types)
- Test that glob patterns with dir properly respect path separators
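
To make the separator semantics concrete, here is a toy wildcard matcher that supports only `*`. It is an illustration under simplifying assumptions, not the glob implementation hk uses: `wildcard_match` and `matches_from` are hypothetical names, and real glob syntax (`**`, `?`, character classes) is deliberately left out.

```rust
/// Toy matcher supporting only `*`. When `literal_separator` is true,
/// `*` may not consume a `/`; otherwise it may match across directories.
fn wildcard_match(pattern: &str, path: &str, literal_separator: bool) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let s: Vec<char> = path.chars().collect();
    matches_from(&p, &s, literal_separator)
}

fn matches_from(p: &[char], s: &[char], literal_separator: bool) -> bool {
    match p.split_first() {
        // Pattern exhausted: succeed only if the path is exhausted too.
        None => s.is_empty(),
        Some((&'*', rest)) => {
            // Let `*` consume 0, 1, 2, ... characters and retry the rest.
            for i in 0..=s.len() {
                if matches_from(rest, &s[i..], literal_separator) {
                    return true;
                }
                // The next character to consume is a separator: stop in strict mode.
                if i < s.len() && literal_separator && s[i] == '/' {
                    return false;
                }
            }
            false
        }
        // Any other pattern character must match the path character exactly.
        Some((&c, rest)) => match s.split_first() {
            Some((&d, tail)) if c == d => matches_from(rest, tail, literal_separator),
            _ => false,
        },
    }
}
```

In strict mode, `*.js` matches `app.js` but not `lib/app.js`; with `literal_separator` off, both match. This is why the tests in `dir.bats` had to switch to `**/*.html` for recursive matching.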

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
This fixes the double-application of directory context for glob patterns.
Previously, when dir was set, the function would:
1. Pre-filter files to those starting with dir
2. Prefix the glob pattern with dir
3. Match the prefixed pattern (e.g., "src/*.js") against the full paths

This meant the dir context was applied twice, which was inconsistent with
how regex patterns work (they match against paths relative to dir).

Now glob patterns behave consistently with regex:
- Pre-filter files to dir
- Strip dir prefix to get relative paths
- Match pattern against relative paths
- Convert matched paths back to full paths

This ensures both glob and regex patterns have the same semantics when
dir is specified.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
jdx enabled auto-merge (squash) October 4, 2025 18:36

This reverts the problematic path manipulation introduced in commit 915223d
which broke glob matching semantics and file contention detection.

**The Bug:**
Commit 915223d changed `get_pattern_matches` to strip directory prefixes,
match against relative paths, then reconstruct full paths. This caused:

1. **Breaking semantics**: Existing patterns that relied on full path
   matching stopped working correctly
2. **Path reconstruction errors**: The join operation could create incorrect
   paths in edge cases
3. **File contention detection failures**: The code expects full paths for
   lock coordination, but received relative paths, breaking the HashSet
   comparison in step.rs:394

**The Fix:**
Reverted to the correct approach from commit 9d46cb0:
- Pre-filter files to those in `dir`
- Prefix glob patterns with `dir` (e.g., `*.rs` → `src/*.rs`)
- Match against full paths
- Return full paths consistently
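
The first two steps of the fix above (pre-filter by `dir`, then prefix the globs) can be sketched as below. `scope_to_dir` is a hypothetical helper that only prepares the inputs; the actual matching against full paths is left to whatever glob matcher is in use.

```rust
use std::path::{Path, PathBuf};

/// Prepares inputs for dir-scoped glob matching:
/// - keeps only files located under `dir` (full paths are preserved), and
/// - prefixes each glob with `dir`, e.g. `*.rs` becomes `src/*.rs`.
fn scope_to_dir(dir: &Path, globs: &[String], files: &[PathBuf]) -> (Vec<String>, Vec<PathBuf>) {
    // `Path::starts_with` compares whole components, so `src2/` is not under `src`.
    let candidates: Vec<PathBuf> = files
        .iter()
        .filter(|f| f.starts_with(dir))
        .cloned()
        .collect();
    let prefixed: Vec<String> = globs
        .iter()
        .map(|g| dir.join(g).to_string_lossy().into_owned())
        .collect();
    (prefixed, candidates)
}
```

Because the candidates are never rewritten to relative paths, downstream consumers such as contention detection keep comparing the same full-path strings.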

This ensures file paths remain consistent throughout the system and file
locking/contention detection works correctly.

**Test Coverage:**
Added test/glob_dir_bug.bats with tests for:
- Glob path semantics with dir parameter
- Pattern matching with nested directories
- File contention detection with dir-scoped steps

All existing tests pass (299 tests total).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
jdx disabled auto-merge October 4, 2025 18:54
This commit fixes all CI test failures on the feat/regex-patterns branch:

**Missing Util Commands:**
- Registered 5 util commands that existed as Rust files but weren't accessible:
  - check-added-large-files
  - detect-private-key
  - no-commit-to-branch
  - python-check-ast
  - python-debug-statements
- Added module imports, enum variants, and match arms in src/cli/util/mod.rs

**no_commit_to_branch Fixes:**
- Changed from `git rev-parse --abbrev-ref HEAD` to `git symbolic-ref --short HEAD`
  to support repos without commits (src/cli/util/no_commit_to_branch.rs:38)
- Fixed test to use sandbox by adding `files = List("{{tmp}}/.gitkeep")`
  and initializing git repo in `before` command (pkl/builtins/no_commit_to_branch.pkl)
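
For illustration, the branch lookup could be written along these lines. The function names are hypothetical and this is not the code in `no_commit_to_branch.rs`; it only shows the `git symbolic-ref --short HEAD` invocation, which resolves the branch name even before the first commit exists, whereas `git rev-parse --abbrev-ref HEAD` fails in that case.

```rust
use std::process::Command;

/// Builds the git invocation used to resolve the current branch name.
fn current_branch_command() -> Command {
    let mut cmd = Command::new("git");
    cmd.args(["symbolic-ref", "--short", "HEAD"]);
    cmd
}

/// Returns the current branch, or `None` when HEAD is not a symbolic ref
/// (for example a detached HEAD, where `git symbolic-ref` exits non-zero).
fn current_branch() -> std::io::Result<Option<String>> {
    let output = current_branch_command().output()?;
    if !output.status.success() {
        return Ok(None);
    }
    let name = String::from_utf8_lossy(&output.stdout).trim().to_string();
    Ok(if name.is_empty() { None } else { Some(name) })
}
```

Treating a failed lookup as `None` rather than an error is a design choice of this sketch; a real check may prefer to surface the failure.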

**Test Fixes:**
- Updated dir.bats to use `**/*.html` and `**/*.ts` instead of `*.html` and `*.ts`
  for correct recursive matching with literal_separator=true (test/dir.bats:20)

**Test Runner Enhancement:**
- Added capture and display of `before` command output in test results
  for easier debugging (src/test_runner.rs:177-278)

All 299 tests now pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
jdx enabled auto-merge (squash) October 4, 2025 19:19
- Use std::io::Error::other() instead of Error::new(ErrorKind::Other, ...)
- Remove unnecessary reference in .args(&[...]) -> .args([...])
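
A small before/after sketch of the first cleanup. The function names are hypothetical; `std::io::Error::other` has been in the standard library since Rust 1.74 and produces an error of kind `ErrorKind::Other`, the same as the longer form.

```rust
use std::io::{Error, ErrorKind};

/// Older spelling: explicit kind plus payload.
fn old_style(msg: &str) -> Error {
    Error::new(ErrorKind::Other, msg.to_string())
}

/// Newer spelling suggested by the lint: same kind, less noise.
fn new_style(msg: &str) -> Error {
    Error::other(msg.to_string())
}
```

Both constructors yield errors with identical kind and message, so the change is purely stylistic.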

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
jdx merged commit 8798445 into main Oct 4, 2025
6 checks passed
jdx deleted the feat/regex-patterns branch October 4, 2025 19:27
jdx mentioned this pull request Oct 4, 2025
jdx added a commit that referenced this pull request Oct 5, 2025
## [1.17.0](https://github.com/jdx/hk/compare/v1.16.0..v1.17.0) - 2025-10-05

### 🚀 Features

- Add hk util trailing-whitespace command by
[@jdx](https://github.com/jdx) in
[#319](#319)
- add mixed_line_ending builtin by [@jdx](https://github.com/jdx) in
[#324](#324)
- add check_symlinks builtin by [@jdx](https://github.com/jdx) in
[#326](#326)
- add check_executables_have_shebangs builtin by
[@jdx](https://github.com/jdx) in
[#325](#325)
- Add check-merge-conflict util command and builtin by
[@jdx](https://github.com/jdx) in
[#322](#322)
- add check_case_conflict builtin by [@jdx](https://github.com/jdx) in
[#323](#323)
- add detect_private_key builtin by [@jdx](https://github.com/jdx) in
[#332](#332)
- add check_added_large_files builtin by [@jdx](https://github.com/jdx)
in [#329](#329)
- add python_debug_statements builtin by [@jdx](https://github.com/jdx)
in [#331](#331)
- add python_check_ast builtin by [@jdx](https://github.com/jdx) in
[#330](#330)
- add no_commit_to_branch builtin by [@jdx](https://github.com/jdx) in
[#333](#333)
- add check_byte_order_marker and fix_byte_order_marker builtins by
[@jdx](https://github.com/jdx) in
[#328](#328)
- add regex pattern support for glob and exclude by
[@jdx](https://github.com/jdx) in
[#336](#336)
- automatically batch large file lists to prevent ARG_MAX errors by
[@jdx](https://github.com/jdx) in
[#338](#338)

### 🐛 Bug Fixes

- Add validation for stage attribute requiring fix command by
[@jdx](https://github.com/jdx) in
[#327](#327)
- display stderr when check_list_files returns empty list by
[@jdx](https://github.com/jdx) in
[#334](#334)
- added new builtins to Builtins.pkl by [@jdx](https://github.com/jdx)
in
[b8a2b17](b8a2b17)
- enable experimental settings in mise.toml for swift support by
[@jdx](https://github.com/jdx) in
[#342](#342)
- correct airflow migration test to expect local imports by
[@jdx](https://github.com/jdx) in
[#343](#343)
- make final CI check always run and fail if dependencies fail by
[@jdx](https://github.com/jdx) in
[#344](#344)
- add ruff format to ruff builtin by [@jdx](https://github.com/jdx) in
[#340](#340)

### 🚜 Refactor

- Split util module into separate files by
[@jdx](https://github.com/jdx) in
[#321](#321)

### 🛡️ Security

- migrate pre-commit by [@jdx](https://github.com/jdx) in
[#318](#318)

### 🔍 Other Changes

- split CI runs into parallel jobs and add docs-sync mise task by
[@jdx](https://github.com/jdx) in
[#337](#337)
- remove v0 pkl files from docs/public by [@jdx](https://github.com/jdx)
in [#341](#341)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Release 1.17.0 with new migrate command, many util subcommands, refreshed docs, and dependency updates.
> 
> - **Version**: bump `hk` to `1.17.0` (Cargo.toml, usage specs, docs).
> - **CLI**:
>   - **New Command**: `migrate pre-commit` with flags (`--config`, `--output`, `--force`, `--hk-pkl-root`).
>   - **Util Subcommands**: add `check-added-large-files`, `check-byte-order-marker`, `fix-byte-order-marker`, `check-case-conflict`, `check-executables-have-shebangs`, `check-merge-conflict`, `check-symlinks`, `detect-private-key`, `end-of-file-fixer`, `mixed-line-ending`, `no-commit-to-branch`, `python-check-ast`, `python-debug-statements`, `trailing-whitespace`.
> - **Docs**:
>   - Update `docs/cli/index.md`, regenerate `docs/cli/commands.json`, and `hk.usage.kdl`.
>   - Split `util` docs into per-command pages and add `migrate` docs.
> - **Dependencies**: update `Cargo.lock` crate versions and set `hk` crate version to `1.17.0`.
> - **Changelog**: add `CHANGELOG.md` entry for `1.17.0` with features/bug fixes.
> 
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 75b972a. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: mise-en-dev <[email protected]>