
Zod parser pattern matching rewrite #45

Merged
dannysmith merged 13 commits into main from zod-parser-pattern-matching-rewrite
Oct 24, 2025

@dannysmith dannysmith commented Oct 24, 2025

Closes #40 and enables sidebar support for image and reference fields inside nested objects.

Summary by CodeRabbit

  • New Features

    • Added validation constraints (length, range) to collection fields for stricter content validation.
    • Introduced new fields to content collections: coverAlt, tags, platform, author, and relatedArticles in articles collection; enhanced metadata with priority constraints in notes collection.
  • Documentation

    • Updated schema architecture documentation to clarify JSON schemas as the primary source of truth with enhanced validation capabilities.

coderabbitai bot commented Oct 24, 2025

Walkthrough

The pull request implements a pattern-matching rewrite of the Zod schema parser, replacing line-based field extraction with targeted helper detection. Changes include parser refactoring with new data structures and functions, documentation updates reflecting the architectural shift, and test schema constraint additions to support the new flow.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Documentation Architecture Updates**<br/>`docs/developer/schema-system.md` | Reframes narrative from "Rust backend is sole source" to "JSON schemas are primary source with Zod-based enhancements"; introduces "How to Add New Astro Helpers" and "Implementation Reference" sections; documents multi-step merge pipeline with helper-detection mechanism via JSON and Zod parsing. |
| **Task Documentation**<br/>`docs/tasks-done/task-zod-parser-pattern-matching-rewrite.md`, `docs/tasks-todo/task-2-zod-parser-pattern-matching-rewrite.md` | Moves task-2 from tasks-todo to tasks-done, converting it into comprehensive rewrite plan spanning six phases (test audit through cleanup) with pattern-matching approach, helper detection, path resolution, and structured testing strategy; removed prior task placeholder. |
| **Parser Pattern-Matching Refactor**<br/>`src-tauri/src/parser.rs` | Replaces line-based Zod field parsing with pattern-matching approach; adds public enum `HelperType` (`Image`, `Reference`) and struct `HelperMatch`; introduces functions `find_helper_calls()`, `resolve_field_path()`, `is_inside_array()`, and `extract_zod_special_fields()` for targeted helper discovery and path resolution. |
| **Test Schema Constraints**<br/>`test/dummy-astro-project/.astro/collections/articles.schema.json`, `test/dummy-astro-project/.astro/collections/notes.schema.json` | Adds `minLength`/`maxLength` constraints to string fields (title, slug, description, alt) and `minimum`/`maximum` to numeric fields; introduces descriptive metadata (`description`, `markdownDescription`) for field validation and UI hints. |
| **Content Configuration Updates**<br/>`test/dummy-astro-project/src/content.config.ts` | Expands articles and notes schema definitions with tightened constraints (min/max lengths); adds new fields to articles (`coverAlt`, `tags`, `platform`, `author` reference, `relatedArticles`); augments notes metadata with descriptions for category and priority; maintains backward compatibility with optional fields. |

Sequence Diagram

sequenceDiagram
    autonumber
    
    participant old as Old Parser
    participant new as New Parser
    participant schema as JSON Schema
    participant merger as Schema Merger
    participant frontend as Frontend
    
    rect rgb(230, 245, 255)
    Note over old: Line-by-Line Parsing
    old->>old: Parse each line for Zod fields<br/>Count braces for context
    old->>old: Extract field definitions<br/>Heavy nested logic
    end
    
    rect rgb(240, 255, 245)
    Note over new: Pattern-Matching Approach
    new->>new: Pass 1: Find helper calls<br/>image() and reference()
    new->>new: Pass 2: Resolve field paths<br/>Trace backwards through schema
    new->>new: Produce helper metadata<br/>HelperMatch with position
    end
    
    new->>schema: Combine with JSON schema
    schema->>merger: Full schema + helpers
    merger->>frontend: Merged output with<br/>field type annotations
    frontend->>frontend: Render ImageField/ReferenceField<br/>components based on helpers

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

The parser refactoring introduces new data structures and multiple helper functions with logic density around path resolution and brace-level tracking, requiring careful validation of correctness. Schema and configuration changes follow repetitive constraint-addition patterns, reducing per-file complexity. Documentation updates are informational rather than logic-dependent.

Poem

🐰 Pattern-matching swiftly finds each special call,
No more line-by-line through brackets, braces tall,
Image and reference leap from the schema text,
Field paths traced backward—the parser's correct!
One pass, then another, the helpers now shine,

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Out of Scope Changes Check | ⚠️ Warning | Several changes appear to extend beyond the specific scope of issue #40's Zod parser rewrite. The `docs/developer/schema-system.md` file introduces architectural concepts not implemented in the code, including references to new Video helpers and multiple new components, which are not reflected in the `parser.rs` changes that only implement Image and Reference detection. The test schema files (`articles.schema.json` and `notes.schema.json`) and `content.config.ts` contain extensive additions like new fields (`coverAlt`, `platform`, `relatedArticles` with array references), author references, and multiple constraint updates that go well beyond testing the pattern-matching parser rewrite. These schema enhancements appear to be part of a broader feature effort rather than supporting the core objective of rewriting the parser itself. | Consider either scoping this PR to focus exclusively on the Zod parser pattern-matching implementation in `parser.rs` and its supporting tests, or clarify if the documentation and schema updates are intentionally part of this PR and document their connection to issue #40. If the schema updates and documentation improvements are meant to demonstrate or test the new parser capabilities, consider adding comments explaining their relationship to the pattern-matching rewrite rather than presenting them as standalone enhancements. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title "Zod parser pattern matching rewrite" is clear, concise, and directly reflects the primary change in the changeset. It accurately summarizes the core objective to rewrite the Zod parser to use pattern matching instead of line-by-line parsing. The title is specific enough that a teammate reviewing commit history would immediately understand the main purpose of these changes. |
| [pre_merge_check_pass] | | The code changes implement the core requirements from issue #40. The `parser.rs` file now contains the new pattern-matching approach with `HelperType` enum (`Image`, `Reference`), `HelperMatch` struct, and the key functions `find_helper_calls()`, `resolve_field_path()`, `is_inside_array()`, and `extract_zod_special_fields()`. These directly address the two-pass pattern-matching strategy outlined in the issue. The implementation handles the specified edge cases including deeply nested fields, arrays of objects, and multi-line formatting. Documentation updates in `schema-system.md` and the new `task-zod-parser-pattern-matching-rewrite.md` file provide context and planning details for the rewrite. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch zod-parser-pattern-matching-rewrite

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/developer/schema-system.md (1)

18-18: Minor: Consider adding language identifiers to fenced code blocks.

Static analysis tools suggest adding language identifiers to the fenced code blocks at lines 18 and 400 for better syntax highlighting and parsing. This is a minor documentation improvement.

Example:





Also applies to: 400-400

</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used**: CodeRabbit UI

**Review profile**: CHILL

**Plan**: Pro

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 8378a95cdc00635477884657c958513e3f659997 and b8a4d49bad75ef1257da2eedba6d023e912c03ad.

</details>

<details>
<summary>📒 Files selected for processing (7)</summary>

* `docs/developer/schema-system.md` (1 hunks)
* `docs/tasks-done/task-zod-parser-pattern-matching-rewrite.md` (1 hunks)
* `docs/tasks-todo/task-2-zod-parser-pattern-matching-rewrite.md` (0 hunks)
* `src-tauri/src/parser.rs` (13 hunks)
* `test/dummy-astro-project/.astro/collections/articles.schema.json` (1 hunks)
* `test/dummy-astro-project/.astro/collections/notes.schema.json` (3 hunks)
* `test/dummy-astro-project/src/content.config.ts` (4 hunks)

</details>

<details>
<summary>💤 Files with no reviewable changes (1)</summary>

* docs/tasks-todo/task-2-zod-parser-pattern-matching-rewrite.md

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>📓 Path-based instructions (2)</summary>

<details>
<summary>docs/developer/**</summary>


**📄 CodeRabbit inference engine (CLAUDE.md)**

> Update docs/developer/ guides when introducing new patterns

Files:
- `docs/developer/schema-system.md`

</details>
<details>
<summary>src-tauri/**/*.rs</summary>


**📄 CodeRabbit inference engine (CLAUDE.md)**

> Use modern Rust formatting: format!("{variable}")

Files:
- `src-tauri/src/parser.rs`
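
For readers unfamiliar with the guideline, here is a minimal sketch of what "modern Rust formatting" most likely refers to: inline captured identifiers in format strings (stable since Rust 1.58). The function name `describe_field` is hypothetical and does not come from `parser.rs`.

```rust
/// Hypothetical helper, used only to illustrate inline format arguments.
pub fn describe_field(name: &str, path: &str) -> String {
    // Identifiers are captured directly inside the braces, replacing the
    // older positional form: format!("{} resolved to {}", name, path).
    format!("{name} resolved to {path}")
}
```

Only plain identifiers can be captured this way; expressions such as `field.name` still need to be bound to a local variable or passed as a named argument.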

</details>

</details><details>
<summary>🧬 Code graph analysis (1)</summary>

<details>
<summary>src-tauri/src/parser.rs (3)</summary><blockquote>

<details>
<summary>src-tauri/src/models/collection.rs (1)</summary>

* `new` (24-32)

</details>
<details>
<summary>test/dummy-astro-project/src/content.config.ts (1)</summary>

* `collections` (64-64)

</details>
<details>
<summary>src-tauri/src/test_fixtures/enhanced_config.ts (1)</summary>

* `collections` (69-69)

</details>

</blockquote></details>

</details><details>
<summary>🪛 LanguageTool</summary>

<details>
<summary>docs/tasks-done/task-zod-parser-pattern-matching-rewrite.md</summary>

[grammar] ~260-~260: Use a hyphen to join words.
Context: ...gested Approach  1. **Create new pattern matching functions**:     ```rust    fn ...

(QB_NEW_EN_HYPHEN)

---

[style] ~378-~378: Consider a different adjective to strengthen your wording.
Context: ...ITICAL REVIEW ✅  ### What Changed After Deep Code Review  **Before Review**: Simple ...

(DEEP_PROFOUND)

---

[grammar] ~391-~391: Ensure spelling is correct
Context: ...looks for array field names and changes their `sub_type` - Current parser likely has bugs with arrays of o...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

---

[style] ~883-~883: ‘exact same’ might be wordy. Consider a shorter alternative.
Context: ...ern-matching approach while maintaining exact same output format.  **Strategy**: - Keep al...

(EN_WORDINESS_PREMIUM_EXACT_SAME)

---

[style] ~1264-~1264: Consider using a different verb for a more formal wording.
Context: ...information**  If any checkpoint fails, fix issues before moving to next phase.  --...

(FIX_RESOLVE)

</details>

</details>
<details>
<summary>🪛 markdownlint-cli2 (0.18.1)</summary>

<details>
<summary>docs/tasks-done/task-zod-parser-pattern-matching-rewrite.md</summary>

3-3: Bare URL used

(MD034, no-bare-urls)

---

41-41: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

---

46-46: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

---

57-57: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

---

71-71: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

---

104-104: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

436-436: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

---

765-765: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

1107-1107: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>
<details>
<summary>docs/developer/schema-system.md</summary>

18-18: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

400-400: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

</details>

<details>
<summary>🔇 Additional comments (19)</summary><blockquote>

<details>
<summary>docs/tasks-done/task-zod-parser-pattern-matching-rewrite.md (2)</summary><blockquote>

`1-373`: **Comprehensive task documentation with excellent detail.**

This documentation provides thorough context about the parser rewrite, including completion status, implementation phases, and debugging guidance. The level of detail is excellent for future maintainers.

A few minor suggestions:
- Consider adding a "Quick Reference" section at the top that links directly to the key implementation functions (find_helper_calls, resolve_field_path, extract_zod_special_fields) since this is now marked complete
- The "PREVIOUS PROGRESS & HANDOFF NOTES" section (lines 37-172) is historical context that's valuable but could be moved to an appendix to make the document more scannable for someone looking at the completed work

---

`1268-1348`: **Critical gotchas section is valuable.**

The "CRITICAL GOTCHAS DISCOVERED" section (lines 1268-1348) provides essential context about design decisions, particularly:
- Arrays of objects with images/references are intentionally NOT supported (lines 1271-1274)
- JSON schemas contain ALL constraints (lines 1278-1294)
- Field names MUST match JSON schema exactly (lines 1297-1303)

This is excellent documentation that will prevent future confusion.

</blockquote></details>
<details>
<summary>src-tauri/src/parser.rs (9)</summary><blockquote>

`5-18`: **Well-designed data structures for pattern matching.**

The `HelperType` enum and `HelperMatch` struct are appropriately simple and focused. Good use of `PartialEq` for `HelperType` to enable comparisons in tests.
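
As a rough picture of the shapes being discussed, the sketch below shows one plausible definition. Only the names `HelperType`, `HelperMatch`, `Image`, `Reference`, and the `PartialEq` derive are taken from this review; the field names `helper_type`, `position`, and `collection_name` are assumptions and may differ from the real code in `parser.rs`.

```rust
/// Which Astro helper was found in the Zod schema source.
#[derive(Debug, Clone, PartialEq)]
pub enum HelperType {
    Image,
    Reference,
}

/// One helper call located in the schema text.
/// Field names are illustrative assumptions, not the PR's definition.
#[derive(Debug, Clone, PartialEq)]
pub struct HelperMatch {
    pub helper_type: HelperType,
    /// Byte offset of the call within the schema text.
    pub position: usize,
    /// Collection name for `reference("...")`; `None` for `image()`.
    pub collection_name: Option<String>,
}
```

Deriving `PartialEq` on both types is what lets tests compare whole matches with `assert_eq!` instead of checking each field separately.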

---

`374-403`: **Clean helper detection using regex.**

The `find_helper_calls` function uses straightforward regex patterns to find `image()` and `reference()` helpers. The reference regex correctly captures collection names from both single and double quotes.

One note: The regex will match helpers inside string literals (e.g., `description: "Use image() helper"`), but this is documented as an accepted limitation in the task docs and the current parser has the same behavior.
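
To make the detection pass concrete, here is a simplified sketch that uses only the standard library. The PR itself uses regex patterns, so this is a stand-in technique and not the actual implementation; the struct fields and the `preceded_by_ident_char` helper are assumptions made for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum HelperType {
    Image,
    Reference,
}

#[derive(Debug, Clone, PartialEq)]
pub struct HelperMatch {
    pub helper_type: HelperType,
    pub position: usize,
    pub collection_name: Option<String>,
}

/// True when the character before `pos` could be part of an identifier,
/// so that `myimage()` is not mistaken for `image()`.
fn preceded_by_ident_char(text: &str, pos: usize) -> bool {
    text[..pos]
        .chars()
        .next_back()
        .map_or(false, |c| c.is_alphanumeric() || c == '_' || c == '$')
}

/// Plain substring scan standing in for the regex-based version.
pub fn find_helper_calls(schema_text: &str) -> Vec<HelperMatch> {
    let mut matches = Vec::new();

    // image() takes no arguments, so a literal search is enough here.
    for (pos, _) in schema_text.match_indices("image()") {
        if !preceded_by_ident_char(schema_text, pos) {
            matches.push(HelperMatch {
                helper_type: HelperType::Image,
                position: pos,
                collection_name: None,
            });
        }
    }

    // reference('name') or reference("name"): capture the quoted name.
    for (pos, needle) in schema_text.match_indices("reference(") {
        if preceded_by_ident_char(schema_text, pos) {
            continue;
        }
        let args = schema_text[pos + needle.len()..].trim_start();
        let quote = match args.chars().next() {
            Some(q) if q == '\'' || q == '"' => q,
            _ => continue,
        };
        if let Some(end) = args[1..].find(quote) {
            matches.push(HelperMatch {
                helper_type: HelperType::Reference,
                position: pos,
                collection_name: Some(args[1..1 + end].to_string()),
            });
        }
    }

    // Report matches in source order regardless of helper type.
    matches.sort_by_key(|m| m.position);
    matches
}
```

Like the regex version described above, this sketch also matches helpers that appear inside string literals, so `description: "Use image() helper"` yields one match.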

---

`405-439`: **Correct backward scanning for field names.**

The `find_field_name_backwards` function properly scans backwards to find the field name (identifier before `:`). The logic correctly handles whitespace and JavaScript identifier characters (alphanumeric, `_`, `$`).
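
The backward scan can be sketched as follows. The signature and the `(name, start_offset)` return value are assumptions chosen for illustration; only the function name and the described behaviour (identifier before `:`, tolerant of whitespace, accepting alphanumerics, `_` and `$`) come from this review.

```rust
/// Scan backwards from `pos` to the nearest `:` and return the identifier
/// immediately before it, together with the identifier's start offset.
pub fn find_field_name_backwards(text: &str, pos: usize) -> Option<(String, usize)> {
    let before = text.get(..pos)?;
    let colon = before.rfind(':')?;
    // Skip any whitespace between the identifier and the colon.
    let head = before[..colon].trim_end();
    // Walk back over identifier characters; `last()` is the leftmost one.
    let start = head
        .char_indices()
        .rev()
        .take_while(|(_, c)| c.is_alphanumeric() || *c == '_' || *c == '$')
        .last()
        .map(|(i, _)| i)?;
    Some((head[start..].to_string(), start))
}
```

Returning the start offset alongside the name is a design convenience in this sketch: a caller that keeps scanning backwards for parent fields can resume from that offset instead of searching for the name again.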

---

`441-458`: **Array detection logic is sound.**

The `is_inside_array` function correctly identifies when a helper is wrapped in `z.array()` by checking for the pattern between the field colon and the helper position. This handles multi-line formatting well.
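
A minimal sketch of that check, assuming the caller already knows the byte offsets of the field's colon and of the helper call (the parameter names are hypothetical):

```rust
/// Returns true when `z.array(` appears between the field's colon and the
/// helper call, i.e. the helper is the element type of an array field.
pub fn is_inside_array(text: &str, field_colon: usize, helper_pos: usize) -> bool {
    // Only the slice between colon and helper is inspected, so a
    // `z.array(` belonging to an earlier field is never considered.
    let between = match text.get(field_colon..helper_pos) {
        Some(slice) => slice,
        None => return false, // offsets out of order or out of range
    };
    // Drop whitespace so multi-line formatting matches as well.
    let compact: String = between.chars().filter(|c| !c.is_whitespace()).collect();
    compact.contains("z.array(")
}
```

Stripping whitespace before the comparison is what makes a call split across lines, such as `z.array(` followed by a newline and `reference('articles')`, behave the same as the single-line form.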

---

`460-521`: **Sophisticated path resolution with correct brace tracking.**

The `resolve_field_path` function implements the core algorithm for building dotted field paths (e.g., `coverImage.image`). The brace-level tracking logic is correct:
1. Finds immediate field name
2. Tracks braces while scanning backwards
3. When exiting a brace level (negative count), finds parent field
4. Prevents duplicates and reverses to get correct order

The algorithm correctly handles arbitrary nesting depth and multi-line formatting.
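
The four steps can be sketched roughly as below. This is an illustration under stated assumptions, not the code from `parser.rs`: it assumes the input is the schema body for a single collection (for example `z.object({ ... })`), it does not account for braces or colons inside string literals or comments, and the helper `field_before_colon` is a hypothetical name.

```rust
fn is_ident_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_' || c == '$'
}

/// Identifier directly before the last `:` in `before`, plus its start offset.
fn field_before_colon(before: &str) -> Option<(String, usize)> {
    let colon = before.rfind(':')?;
    let head = before[..colon].trim_end();
    let start = head
        .char_indices()
        .rev()
        .take_while(|(_, c)| is_ident_char(*c))
        .last()
        .map(|(i, _)| i)?;
    Some((head[start..].to_string(), start))
}

/// Build a dotted path such as `coverImage.image` for the helper at `helper_pos`.
pub fn resolve_field_path(text: &str, helper_pos: usize) -> Option<String> {
    // Step 1: the field the helper is assigned to.
    let (name, mut cursor) = field_before_colon(text.get(..helper_pos)?)?;
    let mut segments = vec![name];
    let mut depth: i32 = 0;

    // Steps 2 and 3: walk backwards brace by brace.
    while let Some((idx, brace)) = text[..cursor]
        .char_indices()
        .rev()
        .find(|(_, c)| *c == '{' || *c == '}')
    {
        cursor = idx;
        if brace == '}' {
            depth += 1; // entering an earlier sibling object, moving backwards
            continue;
        }
        depth -= 1;
        if depth < 0 {
            // We just left the object that encloses the current field.
            depth = 0;
            match field_before_colon(&text[..idx]) {
                Some((parent, start)) => {
                    segments.push(parent);
                    // Resuming before the parent's name means it cannot be
                    // added twice, so no separate duplicate check is needed.
                    cursor = start;
                }
                None => break, // outermost object: nothing owns it
            }
        }
    }

    // Step 4: segments were collected innermost-first.
    segments.reverse();
    Some(segments.join("."))
}
```

For a schema where `image: image()` sits inside `coverImage: z.object({ ... })`, this sketch returns `coverImage.image`; a helper that follows a closed sibling object at the top level resolves to just its own field name, because the sibling's braces cancel out in the depth counter.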

---

`523-614`: **Well-structured main extraction function.**

The `extract_zod_special_fields` function orchestrates the helper detection, path resolution, and JSON generation. Good design decisions:
- Delegates to focused helper functions
- Handles array vs non-array contexts correctly
- Silently skips unresolvable paths (as documented in the design)
- Generates backward-compatible JSON format

The JSON structure correctly distinguishes between array and non-array fields, with appropriate metadata for each helper type.

---

`808-1390`: **Comprehensive test suite with excellent coverage.**

The test suite thoroughly covers:
- Unit tests for helper discovery (basic, multiline, deep nesting)
- Path resolution tests (top-level, nested, deep, multiline, arrays)
- Integration tests (full parsing with image/reference helpers, multi-line formatting, comments)

Tests correctly verify:
- Field names include dotted paths for nested fields (e.g., `coverImage.image`)
- Reference helpers capture collection names
- Array helpers produce correct type structure
- Multi-line formatting works correctly

The integration tests create temporary directories and clean up properly. Good test hygiene.

---

`223-292`: **Correctly handles both collection format styles.**

The `parse_collection_definitions` function properly handles two formats:
1. **New format** (lines 230-264): `export const collections = { articles, notes }` - extracts names from the block, then searches `full_content` for the actual definitions
2. **Old format** (lines 266-289): `collections: { articles: defineCollection(...) }` - definitions are in the `collections_block` itself

The intentional difference in whether `full_content` or `collections_block` is passed to `extract_basic_schema` makes sense: new format needs to search the full file for definitions, while old format has definitions inline.
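
To illustrate the distinction between the two formats, here is a small sketch that classifies the entries inside a `collections` block. It is not the logic of `parse_collection_definitions`; the `CollectionEntry` enum and `classify_collection_entries` function are hypothetical names, and the input is assumed to be the text between the block's outer braces.

```rust
#[derive(Debug, PartialEq)]
pub enum CollectionEntry {
    /// `{ articles, notes }`: the definition lives elsewhere in the file.
    Shorthand(String),
    /// `{ articles: defineCollection(...) }`: the definition is inline.
    Inline(String),
}

/// Split the block body on top-level commas and classify each entry.
pub fn classify_collection_entries(block_body: &str) -> Vec<CollectionEntry> {
    let mut entries = Vec::new();
    let mut depth = 0i32;
    let mut current = String::new();

    // A trailing ',' sentinel flushes the final entry.
    for c in block_body.chars().chain(std::iter::once(',')) {
        match c {
            '(' | '{' | '[' => depth += 1,
            ')' | '}' | ']' => depth -= 1,
            _ => {}
        }
        if c == ',' && depth == 0 {
            let entry = current.trim();
            if !entry.is_empty() {
                entries.push(match entry.split_once(':') {
                    Some((name, _)) => CollectionEntry::Inline(name.trim().to_string()),
                    None => CollectionEntry::Shorthand(entry.to_string()),
                });
            }
            current.clear();
        } else {
            // Commas nested inside an inline definition stay in the entry.
            current.push(c);
        }
    }
    entries
}
```

A `Shorthand` entry is the case where a parser has to search the full file for the definition, while an `Inline` entry already carries its definition within the block.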

---

`370-372`: **Clean integration of new pattern-matching approach.**

The updated `parse_schema_fields` function now simply delegates to `extract_zod_special_fields`, maintaining the same interface while using the new pattern-matching implementation. This is a clean integration point that preserves backward compatibility.

</blockquote></details>
<details>
<summary>docs/developer/schema-system.md (5)</summary><blockquote>

`14-50`: **Clear architectural overview.**

The overview section accurately describes the system architecture where JSON schemas are the primary source of truth and the Zod parser enhances them by detecting Astro helpers. The ASCII diagram effectively visualizes the data flow.

---

`53-96`: **Comprehensive JSON schema documentation.**

The JSON schema section accurately documents what Astro generates and provides clear examples. The note about nested objects being flattened with dotted paths (line 66) is particularly important as it aligns with the parser's path resolution logic.

---

`98-158`: **Accurate Zod schema enhancement documentation.**

The section correctly describes the Zod parser's role in detecting `image()` and `reference()` helpers. The examples showing nested fields (like `coverImage.image`) with dotted path notation align perfectly with the implementation in `parser.rs`.

---

`206-340`: **Excellent guide for adding new Astro helpers.**

This section provides a clear, step-by-step pattern for adding support for new Astro helpers (using `video()` as an example). The 5-step process is well-documented with code examples, making it easy for future developers to extend the system.

This aligns with the PR objective to make the codebase "easier to maintain and extend."

---

`372-421`: **Comprehensive implementation reference.**

The implementation reference section provides accurate file locations, key function listings, and a clear data flow diagram. This serves as an excellent quick reference for developers working with the schema system.

</blockquote></details>
<details>
<summary>test/dummy-astro-project/.astro/collections/articles.schema.json (1)</summary><blockquote>

`9-10`: **Test schema updated to include validation constraints.**

The added constraints (`minLength`, `maxLength`) and descriptions align with the PR's goal of demonstrating that JSON schemas contain all constraint information. These changes support testing the new pattern-matching parser's behavior with constrained fields.

The constraints match those in the corresponding `content.config.ts` file, ensuring consistency between the Zod schema and generated JSON schema.




Also applies to: 16-17, 25-27

</blockquote></details>
<details>
<summary>test/dummy-astro-project/.astro/collections/notes.schema.json (1)</summary><blockquote>

`8-12`: **Test schema enhanced with nested field constraints.**

The additions include constraints for nested fields (particularly `coverImage.alt` at line 95), which is important for testing the pattern-matching parser's ability to handle nested structures. The constraints match the Zod schema definitions in `content.config.ts`.

These changes support the PR's key feature: correctly handling nested image fields like `coverImage.image`.




Also applies to: 54-56, 59-63, 95-95

</blockquote></details>
<details>
<summary>test/dummy-astro-project/src/content.config.ts (1)</summary><blockquote>

`1-64`: **Comprehensive test configuration covering all helper scenarios.**

This test configuration excellently covers the key scenarios the pattern-matching parser needs to handle:

1. **Nested image helpers**: `coverImage.image` (line 57) - the main use case for the rewrite
2. **Reference helpers**: `author: reference('authors')` (line 32)
3. **Array of references**: `relatedArticles: z.array(reference('articles'))` (line 33)
4. **Constraints and descriptions**: All fields include proper validation that gets captured in JSON schemas

The comment at lines 5-6 clarifying that the `authors` collection uses `file()` loader is helpful context for why it won't appear in the editor but can still be referenced.

This configuration will thoroughly test the new parser's ability to:
- Detect helpers in nested objects
- Resolve dotted paths correctly
- Handle arrays of references
- Work with multi-line formatting

</blockquote></details>

</blockquote></details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

@dannysmith dannysmith merged commit 134af46 into main Oct 24, 2025
8 checks passed
@dannysmith dannysmith deleted the zod-parser-pattern-matching-rewrite branch October 24, 2025 00:57
@coderabbitai coderabbitai bot mentioned this pull request Nov 1, 2025
6 tasks

Development

Successfully merging this pull request may close these issues.

Rewrite Zod Schema Parser to Use Pattern Matching

1 participant