feat: add inline JSON Schema validation for DAG params (closes #1182)#1887
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR introduces JSON Schema validation for DAG input parameters and step outputs. It extends the DAG schema to support optional schemas for parameter definitions and step outputs, implements inline schema detection and compilation, and adds runtime validation of JSON output against configured schemas. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Parser as DAG Parser
participant ParamVal as Parameter Validator
participant Executor as Step Executor
participant OutputVal as Output Validator
participant Node as Runtime Node
User->>Parser: Load DAG YAML with params schema
Parser->>ParamVal: Detect inline schema
ParamVal->>ParamVal: Resolve schema & extract properties
ParamVal->>ParamVal: Apply schema defaults
User->>Executor: Provide --params overrides
Executor->>ParamVal: Validate params against schema
alt Params Invalid
ParamVal-->>Executor: Validation Error
Executor-->>User: Fail (no attempt created)
else Params Valid
Executor->>Node: Execute step
Node->>Node: Capture output
alt Output Captured
Node->>OutputVal: Validate output JSON
OutputVal->>OutputVal: Parse as JSON
alt Output Invalid
OutputVal-->>Node: Validation Error
Node-->>User: Step Failed
else Output Valid
OutputVal-->>Node: Pass
Node-->>User: Step Succeeded
end
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/cmn/schema/dag.schema.json (1)
462-493:⚠️ Potential issue | 🟠 MajorTeach the DAG schema about inline
paramsschemas too.While this file now documents
step.output.schema, the top-levelparamscontract still only accepts legacy maps or{schema: ...}objects. A DAG that uses the newparams: { type: object, properties: ... }form will still fail schema-based validation/autocomplete even though the loader accepts it.Possible fix
"oneOf": [ { "type": "string", "description": "Space-separated positional parameters accessible as $1, $2, etc." }, { "type": "array", "items": { "oneOf": [ { "type": "string" }, { "$ref": "#/definitions/dagParamLegacyEntry" }, { "$ref": "#/definitions/dagParamRichEntry" } ] }, "description": "Ordered parameter entries. Supports legacy strings, legacy KEY: value pairs, and rich parameter definitions declared as objects with a required name field plus at least one additional definition field." }, { "$ref": "#/definitions/dagParamLegacyMap" }, + { + "$ref": "#/definitions/dagParamInlineSchema" + }, { "$ref": "#/definitions/dagParamExternalSchema" } ],Add a matching
dagParamInlineSchemadefinition and update theparamsdescription so editor/tooling behavior matches the new runtime feature.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/cmn/schema/dag.schema.json` around lines 462 - 493, The top-level "params" schema lacks support for inline JSON Schema objects, so editors/validation still reject DAGs using params: { type: "object", properties: ... }; add a new definition named dagParamInlineSchema (mirroring the structure accepted by step.output.schema inline schemas) and include a $ref to "#/definitions/dagParamInlineSchema" in the params.oneOf array alongside existing dagParamLegacyMap and dagParamExternalSchema entries; update the params description to mention the inline-schema format and ensure dagParamInlineSchema supports standard JSON Schema fields (type, properties, required, etc.) consistent with the runtime loader.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/core/spec/dag_params.go`:
- Around line 155-159: The current early return when
isInlineJSONSchema(d.Params) and ctx.opts.Has(BuildFlagSkipSchemaValidation)
drops all inline param metadata by returning &dagParamPlan{kind:
dagParamKindLegacy}; instead, preserve param metadata while skipping validation:
refactor buildInlineSchemaParamPlan (or add a boolean param) into validating and
non-validating paths (e.g., buildInlineSchemaParamPlan(d.Params, skipValidation
bool) or buildInlineSchemaParamPlanNoValidate) and call the non-validating
variant when ctx.opts.Has(BuildFlagSkipSchemaValidation) so defaults, required
flags, and named-only resolution are parsed and returned instead of an empty
legacy plan; replace the legacy-returning branch to invoke that new
non-validating builder.
In `@internal/core/spec/step.go`:
- Around line 880-906: buildStepOutputSchema currently always resolves the
output schema which ignores the build context's SkipSchemaValidation setting;
change the function to accept and use the StepBuildContext (remove the blank
identifier for the first param) and check ctx.SkipSchemaValidation() after
parsing cfg and confirming cfg.Schema != nil; if SkipSchemaValidation() is true,
skip calling schema.Resolve (and return nil, nil) so output.schema
resolution/validation is not performed; reference symbols:
buildStepOutputSchema, StepBuildContext, SkipSchemaValidation, cfg.Schema, and
schema.Resolve.
---
Outside diff comments:
In `@internal/cmn/schema/dag.schema.json`:
- Around line 462-493: The top-level "params" schema lacks support for inline
JSON Schema objects, so editors/validation still reject DAGs using params: {
type: "object", properties: ... }; add a new definition named
dagParamInlineSchema (mirroring the structure accepted by step.output.schema
inline schemas) and include a $ref to "#/definitions/dagParamInlineSchema" in
the params.oneOf array alongside existing dagParamLegacyMap and
dagParamExternalSchema entries; update the params description to mention the
inline-schema format and ensure dagParamInlineSchema supports standard JSON
Schema fields (type, properties, required, etc.) consistent with the runtime
loader.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2493c524-af9b-48e1-9790-61654b74e045
📒 Files selected for processing (15)
.gitignoreinternal/cmn/schema/dag.schema.jsoninternal/core/spec/dag_params.gointernal/core/spec/dag_params_inline_schema.gointernal/core/spec/dag_params_inline_schema_test.gointernal/core/spec/dag_params_schema.gointernal/core/spec/step.gointernal/core/spec/step_output_schema_test.gointernal/core/step.gointernal/intg/output_validation_e2e_test.gointernal/intg/params_validation_issue1182_test.gointernal/intg/params_validation_issue1252_test.gointernal/runtime/node.gointernal/runtime/node_output_validation_edge_test.gointernal/runtime/node_output_validation_test.go
| if isInlineJSONSchema(d.Params) { | ||
| if ctx.opts.Has(BuildFlagSkipSchemaValidation) { | ||
| return &dagParamPlan{kind: dagParamKindLegacy}, nil | ||
| } | ||
| return buildInlineSchemaParamPlan(d.Params) |
There was a problem hiding this comment.
Skip-schema mode currently erases inline param metadata.
At Lines 155-158, inline-schema DAGs fall back to an empty legacy plan. That drops inline defaults, required flags, and named-only resolution, so any load path using SkipSchemaValidation() will see the DAG as having no params at all.
Minimal fix
if isInlineJSONSchema(d.Params) {
- if ctx.opts.Has(BuildFlagSkipSchemaValidation) {
- return &dagParamPlan{kind: dagParamKindLegacy}, nil
- }
return buildInlineSchemaParamPlan(d.Params)
}If you still want a strict “no validation” mode later, split buildInlineSchemaParamPlan into validating and non-validating variants instead of returning an empty plan.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/core/spec/dag_params.go` around lines 155 - 159, The current early
return when isInlineJSONSchema(d.Params) and
ctx.opts.Has(BuildFlagSkipSchemaValidation) drops all inline param metadata by
returning &dagParamPlan{kind: dagParamKindLegacy}; instead, preserve param
metadata while skipping validation: refactor buildInlineSchemaParamPlan (or add
a boolean param) into validating and non-validating paths (e.g.,
buildInlineSchemaParamPlan(d.Params, skipValidation bool) or
buildInlineSchemaParamPlanNoValidate) and call the non-validating variant when
ctx.opts.Has(BuildFlagSkipSchemaValidation) so defaults, required flags, and
named-only resolution are parsed and returned instead of an empty legacy plan;
replace the legacy-returning branch to invoke that new non-validating builder.
| func buildStepOutputSchema(_ StepBuildContext, s *step) (*jsonschema.Resolved, error) { | ||
| cfg, err := parseOutputConfig(s.Output) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| if cfg == nil || cfg.Schema == nil { | ||
| return nil, nil | ||
| } | ||
|
|
||
| schemaBytes, err := json.Marshal(cfg.Schema) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to marshal output schema: %w", err) | ||
| } | ||
|
|
||
| var schema jsonschema.Schema | ||
| if err := json.Unmarshal(schemaBytes, &schema); err != nil { | ||
| return nil, fmt.Errorf("failed to parse output schema: %w", err) | ||
| } | ||
|
|
||
| resolved, err := schema.Resolve(&jsonschema.ResolveOptions{ | ||
| ValidateDefaults: true, | ||
| }) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to resolve output schema: %w", err) | ||
| } | ||
|
|
||
| return resolved, nil |
There was a problem hiding this comment.
SkipSchemaValidation() is ignored for output.schema.
buildStepOutputSchema always resolves the schema, so DAG loads that intentionally disable schema resolution/validation can still fail here. That is inconsistent with the params path and with the documented behavior of SkipSchemaValidation.
Possible fix
-func buildStepOutputSchema(_ StepBuildContext, s *step) (*jsonschema.Resolved, error) {
+func buildStepOutputSchema(ctx StepBuildContext, s *step) (*jsonschema.Resolved, error) {
+ if ctx.opts.Has(BuildFlagSkipSchemaValidation) {
+ return nil, nil
+ }
+
cfg, err := parseOutputConfig(s.Output)
if err != nil {
return nil, err
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func buildStepOutputSchema(_ StepBuildContext, s *step) (*jsonschema.Resolved, error) { | |
| cfg, err := parseOutputConfig(s.Output) | |
| if err != nil { | |
| return nil, err | |
| } | |
| if cfg == nil || cfg.Schema == nil { | |
| return nil, nil | |
| } | |
| schemaBytes, err := json.Marshal(cfg.Schema) | |
| if err != nil { | |
| return nil, fmt.Errorf("failed to marshal output schema: %w", err) | |
| } | |
| var schema jsonschema.Schema | |
| if err := json.Unmarshal(schemaBytes, &schema); err != nil { | |
| return nil, fmt.Errorf("failed to parse output schema: %w", err) | |
| } | |
| resolved, err := schema.Resolve(&jsonschema.ResolveOptions{ | |
| ValidateDefaults: true, | |
| }) | |
| if err != nil { | |
| return nil, fmt.Errorf("failed to resolve output schema: %w", err) | |
| } | |
| return resolved, nil | |
| func buildStepOutputSchema(ctx StepBuildContext, s *step) (*jsonschema.Resolved, error) { | |
| if ctx.opts.Has(BuildFlagSkipSchemaValidation) { | |
| return nil, nil | |
| } | |
| cfg, err := parseOutputConfig(s.Output) | |
| if err != nil { | |
| return nil, err | |
| } | |
| if cfg == nil || cfg.Schema == nil { | |
| return nil, nil | |
| } | |
| schemaBytes, err := json.Marshal(cfg.Schema) | |
| if err != nil { | |
| return nil, fmt.Errorf("failed to marshal output schema: %w", err) | |
| } | |
| var schema jsonschema.Schema | |
| if err := json.Unmarshal(schemaBytes, &schema); err != nil { | |
| return nil, fmt.Errorf("failed to parse output schema: %w", err) | |
| } | |
| resolved, err := schema.Resolve(&jsonschema.ResolveOptions{ | |
| ValidateDefaults: true, | |
| }) | |
| if err != nil { | |
| return nil, fmt.Errorf("failed to resolve output schema: %w", err) | |
| } | |
| return resolved, nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/core/spec/step.go` around lines 880 - 906, buildStepOutputSchema
currently always resolves the output schema which ignores the build context's
SkipSchemaValidation setting; change the function to accept and use the
StepBuildContext (remove the blank identifier for the first param) and check
ctx.SkipSchemaValidation() after parsing cfg and confirming cfg.Schema != nil;
if SkipSchemaValidation() is true, skip calling schema.Resolve (and return nil,
nil) so output.schema resolution/validation is not performed; reference symbols:
buildStepOutputSchema, StepBuildContext, SkipSchemaValidation, cfg.Schema, and
schema.Resolve.
Support defining JSON Schema directly in the `params` field for typed
parameter validation with defaults, required fields, and type coercion.
Previously only an external schema file (via `params.schema`) was
supported; this adds an inline alternative where the schema is written
directly in the YAML:
params:
type: object
properties:
batch_size:
type: integer
default: 10
start_date:
type: string
format: date
debug:
type: boolean
default: false
required: ["start_date"]
Parameters are detected as inline JSON Schema when the `params` value
is a map with a `properties` key. Defaults are applied at build time
and exported as environment variables; required fields and type
mismatches are rejected before the DAG run starts.
Closes dagucloud#1182
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
00510b6 to
7305454
Compare
|
LGTM, thank you very much! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1887 +/- ##
==========================================
+ Coverage 69.11% 69.13% +0.01%
==========================================
Files 447 448 +1
Lines 55361 55417 +56
==========================================
+ Hits 38263 38310 +47
- Misses 13695 13705 +10
+ Partials 3403 3402 -1
... and 9 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Summary
Closes #1182 — adds support for defining JSON Schema inline in the
paramsfield of a DAG, enabling typed parameter validation with defaults, required-field enforcement, and type coercion — all without an external schema file.This is part of the broader params/output validation work tracked in #1252.
What Changed
internal/core/spec/dag_params_inline_schema.go— new: detects inline schema format and compiles it into adagParamPlanwith defaults applied at build timeinternal/core/spec/dag_params.go— addsdagParamKindInlineSchemarouting so inline schemas go through the same resolution path as external schemasinternal/core/spec/dag_params_schema.go— minor: generalises an error message to cover both inline and external schema casesinternal/core/spec/dag_params_inline_schema_test.go— unit tests for inline schema detection, plan building, and error casesinternal/intg/params_validation_issue1182_test.go— 5 end-to-end integration tests covering the full lifecycleHow to Use
Write the JSON Schema directly under
paramsin your DAG YAML:$batch_sizeis10and$debugisfalseif not supplied.start_date) cause the run to fail before any step executes if omitted.batch_size=not-an-int) are rejected at startup.The inline format is detected automatically when
paramsis a map with apropertieskey. The existing external-schema format (params.schema: path/to/schema.json) is unchanged and takes precedence.QA
🤖 Generated with Claude Code
Summary by CodeRabbit