Skip to content

feat: add inline JSON Schema validation for DAG params (closes #1182)#1887

Merged
yottahmd merged 3 commits intodagucloud:mainfrom
mbprabhoo:feat/output-schema-validation
Apr 1, 2026
Merged

feat: add inline JSON Schema validation for DAG params (closes #1182)#1887
yottahmd merged 3 commits intodagucloud:mainfrom
mbprabhoo:feat/output-schema-validation

Conversation

@mbprabhoo
Copy link
Copy Markdown
Contributor

@mbprabhoo mbprabhoo commented Mar 31, 2026

Summary

Closes #1182 — adds support for defining JSON Schema inline in the params field of a DAG, enabling typed parameter validation with defaults, required-field enforcement, and type coercion — all without an external schema file.

This is part of the broader params/output validation work tracked in #1252.

What Changed

  • internal/core/spec/dag_params_inline_schema.go — new: detects inline schema format and compiles it into a dagParamPlan with defaults applied at build time
  • internal/core/spec/dag_params.go — adds dagParamKindInlineSchema routing so inline schemas go through the same resolution path as external schemas
  • internal/core/spec/dag_params_schema.go — minor: generalises an error message to cover both inline and external schema cases
  • internal/core/spec/dag_params_inline_schema_test.go — unit tests for inline schema detection, plan building, and error cases
  • internal/intg/params_validation_issue1182_test.go — 5 end-to-end integration tests covering the full lifecycle

How to Use

Write the JSON Schema directly under params in your DAG YAML:

params:
  type: object
  properties:
    batch_size:
      type: integer
      default: 10
    start_date:
      type: string
      format: date
    debug:
      type: boolean
      default: false
  required: ["start_date"]
steps:
  - name: run
    command: echo "batch=$batch_size date=$start_date debug=$debug"
  • Defaults are applied automatically — $batch_size is 10 and $debug is false if not supplied.
  • Required fields (start_date) cause the run to fail before any step executes if omitted.
  • Type mismatches (e.g. batch_size=not-an-int) are rejected at startup.
  • Positional parameters are not supported with schema-backed params.

The inline format is detected automatically when params is a map with a properties key. The existing external-schema format (params.schema: path/to/schema.json) is unchanged and takes precedence.

QA

  • 1472 unit/spec tests pass, 627 integration tests pass, 0 lint issues
  • All 5 issue-1182 integration tests pass end-to-end
  • No regressions against existing params or output-validation tests

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added JSON Schema validation for DAG parameters (supporting inline schema definitions)
    • Added JSON Schema validation for step output values
    • Parameters are validated before DAG execution begins
    • Step outputs are validated after execution completes; validation failures prevent dependent steps from running

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 31, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3ec195fe-6dde-49bc-9125-589600aa5af0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR introduces JSON Schema validation for DAG input parameters and step outputs. It extends the DAG schema to support optional schemas for parameter definitions and step outputs, implements inline schema detection and compilation, and adds runtime validation of JSON output against configured schemas.

Changes

Cohort / File(s) Summary
Configuration
.gitignore, internal/cmn/schema/dag.schema.json
Updated .gitignore to ignore test binary and workspace directory; extended schema to support optional schema field in step.output for JSON Schema (Draft-07) validation of captured outputs.
DAG Parameter Schema Handling
internal/core/spec/dag_params.go, internal/core/spec/dag_params_inline_schema.go, internal/core/spec/dag_params_schema.go
Added dagParamKindInlineSchema parameter kind; implemented inline schema detection (isInlineJSONSchema) distinguishing from external-schema format; introduced buildInlineSchemaParamPlan to resolve schemas, extract properties/ordering, apply defaults, and construct parameter plans; updated error messaging for schema-backed parameters.
Step Output Schema Compilation
internal/core/spec/step.go, internal/core/step.go
Extended outputConfig struct with optional Schema field; implemented buildStepOutputSchema to compile raw schema into resolved jsonschema.Resolved; added OutputSchema field to Step struct for runtime validation; updated parseOutputConfig to accept schema in object-form output configuration.
DAG Parameter Schema Tests
internal/core/spec/dag_params_inline_schema_test.go
Added unit tests for inline schema detection, parameter plan construction, default extraction, runtime validation of required/typed fields, rejection of positional parameters, type enforcement, and handling of additionalProperties: false.
Step Output Schema Tests
internal/core/spec/step_output_schema_test.go
Added unit tests for output configuration parsing and schema compilation, validating both valid and invalid schemas, nil handling, and nested object resolution.
Runtime Output Validation
internal/runtime/node.go
Implemented validateOutput method to validate captured JSON output against step's compiled schema; integrated validation into Node.Execute post-capture, setting error and exit code on validation failure without proceeding to status determination.
Output Validation Unit Tests
internal/runtime/node_output_validation_test.go, internal/runtime/node_output_validation_edge_test.go
Added comprehensive unit tests covering schema validation for primitives (string, number, boolean), arrays with typed items, objects with additionalProperties constraints, whitespace handling, schema build/resolve edge cases, nested structures, enums, patterns, and concurrency/thread-safety scenarios.
Integration Tests
internal/intg/output_validation_e2e_test.go, internal/intg/params_validation_issue1182_test.go, internal/intg/params_validation_issue1252_test.go
Added E2E and integration tests covering successful/failed parameter validation with inline schemas, output validation with boundary conditions and schema violations, mixed DAGs with and without schemas, parameter defaults, external schema loading, rejection of unknown parameters, and blocking of dependent steps on validation failure.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Parser as DAG Parser
    participant ParamVal as Parameter Validator
    participant Executor as Step Executor
    participant OutputVal as Output Validator
    participant Node as Runtime Node

    User->>Parser: Load DAG YAML with params schema
    Parser->>ParamVal: Detect inline schema
    ParamVal->>ParamVal: Resolve schema & extract properties
    ParamVal->>ParamVal: Apply schema defaults
    User->>Executor: Provide --params overrides
    Executor->>ParamVal: Validate params against schema
    alt Params Invalid
        ParamVal-->>Executor: Validation Error
        Executor-->>User: Fail (no attempt created)
    else Params Valid
        Executor->>Node: Execute step
        Node->>Node: Capture output
        alt Output Captured
            Node->>OutputVal: Validate output JSON
            OutputVal->>OutputVal: Parse as JSON
            alt Output Invalid
                OutputVal-->>Node: Validation Error
                Node-->>User: Step Failed
            else Output Valid
                OutputVal-->>Node: Pass
                Node-->>User: Step Succeeded
            end
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The PR includes substantial changes for step output schema validation and integration tests covering output validation, which extends beyond the input parameter schema requirement stated in issue #1182. Clarify whether output schema validation changes are part of this PR's scope or should be addressed in a separate PR; if out-of-scope, move output validation changes to issue #1252 or another tracking issue.
Docstring Coverage ⚠️ Warning Docstring coverage is 73.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly summarizes the main change: adding inline JSON Schema validation for DAG params, and explicitly references the issue it closes (#1182).
Linked Issues check ✅ Passed The PR successfully implements all coding requirements from issue #1182: inline JSON Schema detection, type validation, default application, required-field enforcement, and proper error handling for invalid inputs.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/cmn/schema/dag.schema.json (1)

462-493: ⚠️ Potential issue | 🟠 Major

Teach the DAG schema about inline params schemas too.

While this file now documents step.output.schema, the top-level params contract still only accepts legacy maps or {schema: ...} objects. A DAG that uses the new params: { type: object, properties: ... } form will still fail schema-based validation/autocomplete even though the loader accepts it.

Possible fix
       "oneOf": [
         {
           "type": "string",
           "description": "Space-separated positional parameters accessible as $1, $2, etc."
         },
         {
           "type": "array",
           "items": {
             "oneOf": [
               {
                 "type": "string"
               },
               {
                 "$ref": "#/definitions/dagParamLegacyEntry"
               },
               {
                 "$ref": "#/definitions/dagParamRichEntry"
               }
             ]
           },
           "description": "Ordered parameter entries. Supports legacy strings, legacy KEY: value pairs, and rich parameter definitions declared as objects with a required name field plus at least one additional definition field."
         },
         {
           "$ref": "#/definitions/dagParamLegacyMap"
         },
+        {
+          "$ref": "#/definitions/dagParamInlineSchema"
+        },
         {
           "$ref": "#/definitions/dagParamExternalSchema"
         }
       ],

Add a matching dagParamInlineSchema definition and update the params description so editor/tooling behavior matches the new runtime feature.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/cmn/schema/dag.schema.json` around lines 462 - 493, The top-level
"params" schema lacks support for inline JSON Schema objects, so
editors/validation still reject DAGs using params: { type: "object", properties:
... }; add a new definition named dagParamInlineSchema (mirroring the structure
accepted by step.output.schema inline schemas) and include a $ref to
"#/definitions/dagParamInlineSchema" in the params.oneOf array alongside
existing dagParamLegacyMap and dagParamExternalSchema entries; update the params
description to mention the inline-schema format and ensure dagParamInlineSchema
supports standard JSON Schema fields (type, properties, required, etc.)
consistent with the runtime loader.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/core/spec/dag_params.go`:
- Around line 155-159: The current early return when
isInlineJSONSchema(d.Params) and ctx.opts.Has(BuildFlagSkipSchemaValidation)
drops all inline param metadata by returning &dagParamPlan{kind:
dagParamKindLegacy}; instead, preserve param metadata while skipping validation:
refactor buildInlineSchemaParamPlan (or add a boolean param) into validating and
non-validating paths (e.g., buildInlineSchemaParamPlan(d.Params, skipValidation
bool) or buildInlineSchemaParamPlanNoValidate) and call the non-validating
variant when ctx.opts.Has(BuildFlagSkipSchemaValidation) so defaults, required
flags, and named-only resolution are parsed and returned instead of an empty
legacy plan; replace the legacy-returning branch to invoke that new
non-validating builder.

In `@internal/core/spec/step.go`:
- Around line 880-906: buildStepOutputSchema currently always resolves the
output schema which ignores the build context's SkipSchemaValidation setting;
change the function to accept and use the StepBuildContext (remove the blank
identifier for the first param) and check ctx.SkipSchemaValidation() after
parsing cfg and confirming cfg.Schema != nil; if SkipSchemaValidation() is true,
skip calling schema.Resolve (and return nil, nil) so output.schema
resolution/validation is not performed; reference symbols:
buildStepOutputSchema, StepBuildContext, SkipSchemaValidation, cfg.Schema, and
schema.Resolve.

---

Outside diff comments:
In `@internal/cmn/schema/dag.schema.json`:
- Around line 462-493: The top-level "params" schema lacks support for inline
JSON Schema objects, so editors/validation still reject DAGs using params: {
type: "object", properties: ... }; add a new definition named
dagParamInlineSchema (mirroring the structure accepted by step.output.schema
inline schemas) and include a $ref to "#/definitions/dagParamInlineSchema" in
the params.oneOf array alongside existing dagParamLegacyMap and
dagParamExternalSchema entries; update the params description to mention the
inline-schema format and ensure dagParamInlineSchema supports standard JSON
Schema fields (type, properties, required, etc.) consistent with the runtime
loader.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2493c524-af9b-48e1-9790-61654b74e045

📥 Commits

Reviewing files that changed from the base of the PR and between eaee029 and 00510b6.

📒 Files selected for processing (15)
  • .gitignore
  • internal/cmn/schema/dag.schema.json
  • internal/core/spec/dag_params.go
  • internal/core/spec/dag_params_inline_schema.go
  • internal/core/spec/dag_params_inline_schema_test.go
  • internal/core/spec/dag_params_schema.go
  • internal/core/spec/step.go
  • internal/core/spec/step_output_schema_test.go
  • internal/core/step.go
  • internal/intg/output_validation_e2e_test.go
  • internal/intg/params_validation_issue1182_test.go
  • internal/intg/params_validation_issue1252_test.go
  • internal/runtime/node.go
  • internal/runtime/node_output_validation_edge_test.go
  • internal/runtime/node_output_validation_test.go

Comment on lines +155 to +159
if isInlineJSONSchema(d.Params) {
if ctx.opts.Has(BuildFlagSkipSchemaValidation) {
return &dagParamPlan{kind: dagParamKindLegacy}, nil
}
return buildInlineSchemaParamPlan(d.Params)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Skip-schema mode currently erases inline param metadata.

At Lines 155-158, inline-schema DAGs fall back to an empty legacy plan. That drops inline defaults, required flags, and named-only resolution, so any load path using SkipSchemaValidation() will see the DAG as having no params at all.

Minimal fix
 	if isInlineJSONSchema(d.Params) {
-		if ctx.opts.Has(BuildFlagSkipSchemaValidation) {
-			return &dagParamPlan{kind: dagParamKindLegacy}, nil
-		}
 		return buildInlineSchemaParamPlan(d.Params)
 	}

If you still want a strict “no validation” mode later, split buildInlineSchemaParamPlan into validating and non-validating variants instead of returning an empty plan.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/spec/dag_params.go` around lines 155 - 159, The current early
return when isInlineJSONSchema(d.Params) and
ctx.opts.Has(BuildFlagSkipSchemaValidation) drops all inline param metadata by
returning &dagParamPlan{kind: dagParamKindLegacy}; instead, preserve param
metadata while skipping validation: refactor buildInlineSchemaParamPlan (or add
a boolean param) into validating and non-validating paths (e.g.,
buildInlineSchemaParamPlan(d.Params, skipValidation bool) or
buildInlineSchemaParamPlanNoValidate) and call the non-validating variant when
ctx.opts.Has(BuildFlagSkipSchemaValidation) so defaults, required flags, and
named-only resolution are parsed and returned instead of an empty legacy plan;
replace the legacy-returning branch to invoke that new non-validating builder.

Comment thread internal/core/spec/step.go Outdated
Comment on lines +880 to +906
func buildStepOutputSchema(_ StepBuildContext, s *step) (*jsonschema.Resolved, error) {
cfg, err := parseOutputConfig(s.Output)
if err != nil {
return nil, err
}
if cfg == nil || cfg.Schema == nil {
return nil, nil
}

schemaBytes, err := json.Marshal(cfg.Schema)
if err != nil {
return nil, fmt.Errorf("failed to marshal output schema: %w", err)
}

var schema jsonschema.Schema
if err := json.Unmarshal(schemaBytes, &schema); err != nil {
return nil, fmt.Errorf("failed to parse output schema: %w", err)
}

resolved, err := schema.Resolve(&jsonschema.ResolveOptions{
ValidateDefaults: true,
})
if err != nil {
return nil, fmt.Errorf("failed to resolve output schema: %w", err)
}

return resolved, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

SkipSchemaValidation() is ignored for output.schema.

buildStepOutputSchema always resolves the schema, so DAG loads that intentionally disable schema resolution/validation can still fail here. That is inconsistent with the params path and with the documented behavior of SkipSchemaValidation.

Possible fix
-func buildStepOutputSchema(_ StepBuildContext, s *step) (*jsonschema.Resolved, error) {
+func buildStepOutputSchema(ctx StepBuildContext, s *step) (*jsonschema.Resolved, error) {
+	if ctx.opts.Has(BuildFlagSkipSchemaValidation) {
+		return nil, nil
+	}
+
 	cfg, err := parseOutputConfig(s.Output)
 	if err != nil {
 		return nil, err
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func buildStepOutputSchema(_ StepBuildContext, s *step) (*jsonschema.Resolved, error) {
cfg, err := parseOutputConfig(s.Output)
if err != nil {
return nil, err
}
if cfg == nil || cfg.Schema == nil {
return nil, nil
}
schemaBytes, err := json.Marshal(cfg.Schema)
if err != nil {
return nil, fmt.Errorf("failed to marshal output schema: %w", err)
}
var schema jsonschema.Schema
if err := json.Unmarshal(schemaBytes, &schema); err != nil {
return nil, fmt.Errorf("failed to parse output schema: %w", err)
}
resolved, err := schema.Resolve(&jsonschema.ResolveOptions{
ValidateDefaults: true,
})
if err != nil {
return nil, fmt.Errorf("failed to resolve output schema: %w", err)
}
return resolved, nil
func buildStepOutputSchema(ctx StepBuildContext, s *step) (*jsonschema.Resolved, error) {
if ctx.opts.Has(BuildFlagSkipSchemaValidation) {
return nil, nil
}
cfg, err := parseOutputConfig(s.Output)
if err != nil {
return nil, err
}
if cfg == nil || cfg.Schema == nil {
return nil, nil
}
schemaBytes, err := json.Marshal(cfg.Schema)
if err != nil {
return nil, fmt.Errorf("failed to marshal output schema: %w", err)
}
var schema jsonschema.Schema
if err := json.Unmarshal(schemaBytes, &schema); err != nil {
return nil, fmt.Errorf("failed to parse output schema: %w", err)
}
resolved, err := schema.Resolve(&jsonschema.ResolveOptions{
ValidateDefaults: true,
})
if err != nil {
return nil, fmt.Errorf("failed to resolve output schema: %w", err)
}
return resolved, nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/spec/step.go` around lines 880 - 906, buildStepOutputSchema
currently always resolves the output schema which ignores the build context's
SkipSchemaValidation setting; change the function to accept and use the
StepBuildContext (remove the blank identifier for the first param) and check
ctx.SkipSchemaValidation() after parsing cfg and confirming cfg.Schema != nil;
if SkipSchemaValidation() is true, skip calling schema.Resolve (and return nil,
nil) so output.schema resolution/validation is not performed; reference symbols:
buildStepOutputSchema, StepBuildContext, SkipSchemaValidation, cfg.Schema, and
schema.Resolve.

Support defining JSON Schema directly in the `params` field for typed
parameter validation with defaults, required fields, and type coercion.
Previously only an external schema file (via `params.schema`) was
supported; this adds an inline alternative where the schema is written
directly in the YAML:

  params:
    type: object
    properties:
      batch_size:
        type: integer
        default: 10
      start_date:
        type: string
        format: date
      debug:
        type: boolean
        default: false
    required: ["start_date"]

Parameters are detected as inline JSON Schema when the `params` value
is a map with a `properties` key.  Defaults are applied at build time
and exported as environment variables; required fields and type
mismatches are rejected before the DAG run starts.

Closes dagucloud#1182

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@mbprabhoo mbprabhoo force-pushed the feat/output-schema-validation branch from 00510b6 to 7305454 Compare March 31, 2026 21:49
@yottahmd
Copy link
Copy Markdown
Collaborator

yottahmd commented Apr 1, 2026

LGTM, thank you very much!

@yottahmd yottahmd merged commit b53e219 into dagucloud:main Apr 1, 2026
5 checks passed
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 82.45614% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.13%. Comparing base (2948ad9) to head (9d64456).
⚠️ Report is 13 commits behind head on main.

Files with missing lines Patch % Lines
internal/core/spec/dag_params_inline_schema.go 78.26% 5 Missing and 5 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1887      +/-   ##
==========================================
+ Coverage   69.11%   69.13%   +0.01%     
==========================================
  Files         447      448       +1     
  Lines       55361    55417      +56     
==========================================
+ Hits        38263    38310      +47     
- Misses      13695    13705      +10     
+ Partials     3403     3402       -1     
Files with missing lines Coverage Δ
internal/core/spec/dag_params.go 81.53% <100.00%> (+0.87%) ⬆️
internal/core/spec/dag_params_schema.go 61.59% <100.00%> (+1.44%) ⬆️
internal/core/spec/dag_params_inline_schema.go 78.26% <78.26%> (ø)

... and 9 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eaee029...9d64456. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add JSON Schema–based type for DAG input parameters

2 participants