feat: add JSON Schema validation for step outputs#1867
feat: add JSON Schema validation for step outputs#1867yottahmd merged 4 commits intodagucloud:mainfrom
Conversation
Add optional `schema` field to step output configuration that validates captured output against a JSON Schema. When defined, the step fails if the output is not valid JSON or does not conform to the schema. This is useful for validating structured output from commands and LLM responses. Closes dagucloud#1252 (output validation part) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR implements JSON Schema-based validation for step outputs and parameters in DAG workflows. It introduces schema definition support in YAML configuration, compiles schemas during spec parsing, stores them in step structures, and enforces validation at runtime when steps execute. Changes
Sequence DiagramsequenceDiagram
participant User as User
participant Parser as YAML Parser
participant Builder as Schema Builder
participant Step as Step Struct
participant Runtime as Node Execute
participant Validator as Output Validator
User->>Parser: Provide DAG YAML with<br/>output.schema
Parser->>Parser: Parse output config
Parser->>Builder: Extract schema definition
Builder->>Builder: Marshal to JSON
Builder->>Builder: Unmarshal to jsonschema.Schema
Builder->>Builder: Resolve with<br/>ValidateDefaults: true
Builder->>Step: Store OutputSchema
Step->>Runtime: Step definition ready
Runtime->>Runtime: Execute step command
Runtime->>Runtime: Capture output
Runtime->>Validator: Call validateOutput
Validator->>Validator: Check OutputSchema exists
Validator->>Validator: Get captured output variable
Validator->>Validator: Parse output as JSON
Validator->>Validator: Call schema.Validate(parsed)
alt Validation Success
Validator-->>Runtime: Return nil
Runtime-->>User: Step succeeds
else Validation Failure
Validator-->>Runtime: Return error
Runtime-->>User: Step fails, block dependents
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
internal/intg/output_validation_e2e_test.go (2)
500-503: Consider a more specific assertion for the dependent step status.The assertion
NotEqual(t, core.NodeSucceeded, depNode.Status)is broad. For clearer test intent, consider asserting the exact expected status (e.g.,NodeNoneorNodeSkippedif that's what blocked steps should have).♻️ Suggested improvement
assert.Contains(t, badNode.Error, "output validation failed") // Dependent step should not have succeeded - assert.NotEqual(t, core.NodeSucceeded, depNode.Status) + // Dependent step should be in "None" status (never started due to failed dependency) + assert.Equal(t, core.NodeNone, depNode.Status)If the expected status differs, adjust accordingly. This makes the test more precise and self-documenting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/intg/output_validation_e2e_test.go` around lines 500 - 503, The test currently asserts depNode.Status is not core.NodeSucceeded using assert.NotEqual which is vague; replace that with a precise equality check against the intended blocked status (e.g., assert.Equal(t, core.NodeSkipped, depNode.Status) or assert.Equal(t, core.NodeNone, depNode.Status)) so the expectation is explicit — update the assertion referencing depNode.Status and use the correct core.<status> constant that represents a blocked/dependent-step outcome.
57-95: Consider creatingthinside each subtest for better isolation.The subtests share the
thtest helper created at line 59, which prevents them from running in parallel. For consistency with other tests in this file that createthwithin subtests (e.g.,TestNoSchema_NoRegression), consider restructuring:♻️ Optional refactor for parallel subtests
func TestOutputValidation_Success_BoundaryValues(t *testing.T) { t.Parallel() - th := test.Setup(t) // confidence exactly at minimum (0.0) and maximum (1.0) boundaries t.Run("at minimum boundary", func(t *testing.T) { + t.Parallel() + th := test.Setup(t) dag := th.DAG(t, ` steps: - name: boundary-min @@ -76,6 +78,8 @@ steps: }) t.Run("at maximum boundary", func(t *testing.T) { + t.Parallel() + th := test.Setup(t) dag := th.DAG(t, ` steps: - name: boundary-max🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/intg/output_validation_e2e_test.go` around lines 57 - 95, The subtests in TestOutputValidation_Success_BoundaryValues share the test helper variable th created once at the top, preventing proper isolation and parallel execution; move th := test.Setup(t) into each t.Run closure (inside the "at minimum boundary" and "at maximum boundary" functions) and remove the outer th to ensure each subtest has its own test helper instance before calling th.DAG(...) and dag.Agent().RunSuccess(t).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/cmn/schema/dag.schema.json`:
- Around line 1000-1036: The output.schema definition is too restrictive for
Draft-07 JSON Schema; update the "schema" block so that the root schema and
"items" accept either an object or a boolean (to allow true/false schemas),
change the "type" subschema to allow either a string or an array of strings and
include "null" in allowed values (so union types like ["object","null"] are
valid), and loosen any "items" constraint to accept either a single schema or an
array of schemas; target the "schema" property and its children ("type",
"items", and the root schema definition) when making these changes to match
Draft-07 semantics.
In `@internal/intg/params_validation_issue1252_test.go`:
- Around line 167-196: The test constructs a YAML string embedding schemaPath
which contains Windows backslashes, causing YAML escape issues; fix by
converting schemaPath to a YAML-safe form before embedding (e.g., call
filepath.ToSlash(schemaPath) or otherwise replace/backslash-escape backslashes)
and use that safe variable when calling th.CreateDAGFile for the DAG content
(refer to schemaPath and the CreateDAGFile invocation in
params_validation_issue1252_test.go).
In `@internal/runtime/node.go`:
- Around line 263-268: When validation fails in n.validateOutput, preserve any
preexisting command error and only replace the state when there was no prior
failure: check the runtime's current error (n.err or n.Error()) before calling
n.SetError; if an original error exists, log the schema validation failure but
return the original error without overwriting n.err or exitCode; if there was no
prior error, call n.SetError with the schema error and ensure n.exitCode is set
to a non-zero value (e.g., 1) so RetryPolicy.ShouldRetry can observe a failure.
---
Nitpick comments:
In `@internal/intg/output_validation_e2e_test.go`:
- Around line 500-503: The test currently asserts depNode.Status is not
core.NodeSucceeded using assert.NotEqual which is vague; replace that with a
precise equality check against the intended blocked status (e.g.,
assert.Equal(t, core.NodeSkipped, depNode.Status) or assert.Equal(t,
core.NodeNone, depNode.Status)) so the expectation is explicit — update the
assertion referencing depNode.Status and use the correct core.<status> constant
that represents a blocked/dependent-step outcome.
- Around line 57-95: The subtests in TestOutputValidation_Success_BoundaryValues
share the test helper variable th created once at the top, preventing proper
isolation and parallel execution; move th := test.Setup(t) into each t.Run
closure (inside the "at minimum boundary" and "at maximum boundary" functions)
and remove the outer th to ensure each subtest has its own test helper instance
before calling th.DAG(...) and dag.Agent().RunSuccess(t).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 920ece48-72c8-479d-bbe1-16a7b3c947ba
📒 Files selected for processing (9)
internal/cmn/schema/dag.schema.jsoninternal/core/spec/step.gointernal/core/spec/step_output_schema_test.gointernal/core/step.gointernal/intg/output_validation_e2e_test.gointernal/intg/params_validation_issue1252_test.gointernal/runtime/node.gointernal/runtime/node_output_validation_edge_test.gointernal/runtime/node_output_validation_test.go
| "schema": { | ||
| "type": "object", | ||
| "description": "JSON Schema to validate captured output against. Output must be valid JSON matching this schema or the step fails. Supports standard JSON Schema keywords.", | ||
| "properties": { | ||
| "type": { | ||
| "type": "string", | ||
| "enum": ["object", "array", "string", "number", "integer", "boolean"], | ||
| "description": "The expected JSON type of the output." | ||
| }, | ||
| "properties": { | ||
| "type": "object", | ||
| "description": "Property definitions for object-type output.", | ||
| "additionalProperties": true | ||
| }, | ||
| "required": { | ||
| "type": "array", | ||
| "items": { "type": "string" }, | ||
| "description": "Required property names." | ||
| }, | ||
| "items": { | ||
| "description": "Schema for array items.", | ||
| "type": "object", | ||
| "additionalProperties": true | ||
| }, | ||
| "enum": { | ||
| "type": "array", | ||
| "description": "Allowed values." | ||
| }, | ||
| "minimum": { "type": "number" }, | ||
| "maximum": { "type": "number" }, | ||
| "minLength": { "type": "integer" }, | ||
| "maxLength": { "type": "integer" }, | ||
| "pattern": { "type": "string" }, | ||
| "minItems": { "type": "integer" }, | ||
| "maxItems": { "type": "integer" } | ||
| }, | ||
| "additionalProperties": true |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In JSON Schema draft-07, are boolean schemas (true/false) valid, can typebe"null"or an array of types, and canitems be either a schema or an array of schemas?
💡 Result:
In JSON Schema draft-07: - Boolean schemas (true and false) are valid. A schema of true always validates (matches any instance), while false never validates (matches no instance). This is explicitly supported throughout the specification, including keywords like items, properties, allOf, anyOf, not, etc., which accept "a valid JSON Schema" as their value, and the meta-schema allows "type": ["object", "boolean"]. - The type keyword can be the string "null" or an array of types (e.g., ["null", "string"]). Allowed type values are: "null", "boolean", "object", "array", "number", "string", "integer". The value must be a string or a non-empty array of unique strings from these. - The items keyword can be either a single schema or an array of schemas. The specification states: "The value of 'items' MUST be either a valid JSON Schema or an array of valid JSON Schemas."
Citations:
- 1: https://json-schema.org/draft-07/draft-handrews-json-schema-validation-01
- 2: https://www.learnjsonschema.com/draft7/validation/items/
- 3: https://www.learnjsonschema.com/draft7/validation/type/
- 4: https://json-schema.org/draft-07/json-schema-validation.html
output.schema is incompatible with standard JSON Schema Draft-07.
The authoring schema definition blocks valid Draft-07 constructs before runtime sees them:
typeis restricted to a single string from a hardcoded enum, rejecting"null"and union types like["object", "null"]- Root schema and
itemsare restricted to objects only, rejecting boolean schemas (true/false) and tuple validation (array of schemas)
JSON Schema Draft-07 explicitly allows boolean schemas, type as a string or array of strings (including "null"), and items as either a single schema or an array of schemas. If this feature is meant to accept arbitrary JSON Schema, the field definitions must be loosened to match Draft-07 semantics or the documentation must clarify the intentional restrictions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/cmn/schema/dag.schema.json` around lines 1000 - 1036, The
output.schema definition is too restrictive for Draft-07 JSON Schema; update the
"schema" block so that the root schema and "items" accept either an object or a
boolean (to allow true/false schemas), change the "type" subschema to allow
either a string or an array of strings and include "null" in allowed values (so
union types like ["object","null"] are valid), and loosen any "items" constraint
to accept either a single schema or an array of schemas; target the "schema"
property and its children ("type", "items", and the root schema definition) when
making these changes to match Draft-07 semantics.
| schemaDir := t.TempDir() | ||
| schemaPath := filepath.Join(schemaDir, "test-schema.json") | ||
| require.NoError(t, os.WriteFile(schemaPath, []byte(`{ | ||
| "type": "object", | ||
| "properties": { | ||
| "ENVIRONMENT": { | ||
| "type": "string", | ||
| "enum": ["dev", "staging", "prod"] | ||
| }, | ||
| "REPLICAS": { | ||
| "type": "integer", | ||
| "minimum": 1, | ||
| "maximum": 10, | ||
| "default": 3 | ||
| } | ||
| }, | ||
| "required": ["ENVIRONMENT"] | ||
| }`), 0o600)) | ||
|
|
||
| dagFile := th.CreateDAGFile(t, "issue1252-external.yaml", ` | ||
| name: issue1252-external | ||
| params: | ||
| schema: "`+schemaPath+`" | ||
| values: | ||
| ENVIRONMENT: staging | ||
| steps: | ||
| - name: show-env | ||
| command: echo "env=$ENVIRONMENT replicas=$REPLICAS" | ||
| output: VALUES | ||
| `) |
There was a problem hiding this comment.
Use a YAML-safe path for the external schema reference.
filepath.Join returns backslashes on Windows, and inside this double-quoted YAML string those are treated as escapes. A path like C:\tmp\test-schema.json will be parsed incorrectly, so this test becomes platform-dependent.
💡 Suggested fix
schemaDir := t.TempDir()
schemaPath := filepath.Join(schemaDir, "test-schema.json")
+ schemaPathForYAML := filepath.ToSlash(schemaPath)
require.NoError(t, os.WriteFile(schemaPath, []byte(`{
"type": "object",
"properties": {
@@
dagFile := th.CreateDAGFile(t, "issue1252-external.yaml", `
name: issue1252-external
params:
- schema: "`+schemaPath+`"
+ schema: "`+schemaPathForYAML+`"
values:
ENVIRONMENT: staging
steps:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| schemaDir := t.TempDir() | |
| schemaPath := filepath.Join(schemaDir, "test-schema.json") | |
| require.NoError(t, os.WriteFile(schemaPath, []byte(`{ | |
| "type": "object", | |
| "properties": { | |
| "ENVIRONMENT": { | |
| "type": "string", | |
| "enum": ["dev", "staging", "prod"] | |
| }, | |
| "REPLICAS": { | |
| "type": "integer", | |
| "minimum": 1, | |
| "maximum": 10, | |
| "default": 3 | |
| } | |
| }, | |
| "required": ["ENVIRONMENT"] | |
| }`), 0o600)) | |
| dagFile := th.CreateDAGFile(t, "issue1252-external.yaml", ` | |
| name: issue1252-external | |
| params: | |
| schema: "`+schemaPath+`" | |
| values: | |
| ENVIRONMENT: staging | |
| steps: | |
| - name: show-env | |
| command: echo "env=$ENVIRONMENT replicas=$REPLICAS" | |
| output: VALUES | |
| `) | |
| schemaDir := t.TempDir() | |
| schemaPath := filepath.Join(schemaDir, "test-schema.json") | |
| schemaPathForYAML := filepath.ToSlash(schemaPath) | |
| require.NoError(t, os.WriteFile(schemaPath, []byte(`{ | |
| "type": "object", | |
| "properties": { | |
| "ENVIRONMENT": { | |
| "type": "string", | |
| "enum": ["dev", "staging", "prod"] | |
| }, | |
| "REPLICAS": { | |
| "type": "integer", | |
| "minimum": 1, | |
| "maximum": 10, | |
| "default": 3 | |
| } | |
| }, | |
| "required": ["ENVIRONMENT"] | |
| }`), 0o600)) | |
| dagFile := th.CreateDAGFile(t, "issue1252-external.yaml", ` | |
| name: issue1252-external | |
| params: | |
| schema: "`+schemaPathForYAML+`" | |
| values: | |
| ENVIRONMENT: staging | |
| steps: | |
| - name: show-env | |
| command: echo "env=$ENVIRONMENT replicas=$REPLICAS" | |
| output: VALUES | |
| `) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/intg/params_validation_issue1252_test.go` around lines 167 - 196,
The test constructs a YAML string embedding schemaPath which contains Windows
backslashes, causing YAML escape issues; fix by converting schemaPath to a
YAML-safe form before embedding (e.g., call filepath.ToSlash(schemaPath) or
otherwise replace/backslash-escape backslashes) and use that safe variable when
calling th.CreateDAGFile for the DAG content (refer to schemaPath and the
CreateDAGFile invocation in params_validation_issue1252_test.go).
| // Validate captured output against schema if defined | ||
| if err := n.validateOutput(ctx); err != nil { | ||
| n.SetError(err) | ||
| logger.Error(ctx, "Output schema validation failed", tag.Error(err)) | ||
| return err | ||
| } |
There was a problem hiding this comment.
Preserve the original failure state when schema validation runs.
If the command already failed, this overwrites the root-cause error with a secondary schema error. If the command succeeded, this returns a failure while leaving exitCode at 0, so RetryPolicy.ShouldRetry will never retry schema-validation failures.
🔧 Suggested fix
- // Validate captured output against schema if defined
- if err := n.validateOutput(ctx); err != nil {
- n.SetError(err)
- logger.Error(ctx, "Output schema validation failed", tag.Error(err))
- return err
- }
+ // Preserve the executor error as the root cause. Only treat output-schema
+ // validation as a failure when the command itself succeeded.
+ if n.Error() == nil {
+ if err := n.validateOutput(ctx); err != nil {
+ n.SetExitCode(1)
+ n.SetStatus(core.NodeFailed)
+ n.SetError(err)
+ logger.Error(ctx, "Output schema validation failed", tag.Error(err))
+ return err
+ }
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/runtime/node.go` around lines 263 - 268, When validation fails in
n.validateOutput, preserve any preexisting command error and only replace the
state when there was no prior failure: check the runtime's current error (n.err
or n.Error()) before calling n.SetError; if an original error exists, log the
schema validation failure but return the original error without overwriting
n.err or exitCode; if there was no prior error, call n.SetError with the schema
error and ensure n.exitCode is set to a non-zero value (e.g., 1) so
RetryPolicy.ShouldRetry can observe a failure.
- Preserve original command error on schema validation; set exit code to 1 so retry policy can observe the failure - Loosen dag.schema.json to accept full Draft-07 JSON Schema (boolean schemas, null type, union types) - Use filepath.ToSlash for Windows-safe YAML paths in tests Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
yottahmd
left a comment
There was a problem hiding this comment.
LGTM 🚀 Thank you very much for implementing the output schema!
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1867 +/- ##
==========================================
- Coverage 69.07% 69.05% -0.02%
==========================================
Files 445 445
Lines 54771 54816 +45
==========================================
+ Hits 37832 37855 +23
- Misses 13580 13594 +14
- Partials 3359 3367 +8
... and 16 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Summary
schemafield to step output configuration for JSON Schema validationCloses #1252 (output validation part)
Changes
internal/core/step.go— AddedOutputSchemafield toStepstructinternal/core/spec/step.go— Parse and compileschemafrom output config usingjsonschema-gointernal/runtime/node.go— Validate captured output against schema after executioninternal/cmn/schema/dag.schema.json— Addedschemaproperty to output definitionExample
Test plan
step_output_schema_test.go)node_output_validation_test.go)node_output_validation_edge_test.go)output_validation_e2e_test.go,params_validation_issue1252_test.go)🤖 Generated with Claude Code
Summary by CodeRabbit