
Add Workflow Schema Validation Support #516

Merged: ziagham merged 4 commits into master from issue-454-schema-validation on Oct 19, 2025.


@Sakeeb91 (Contributor)

Summary

  • store an optional schema URL with every workflow so we keep track of the hosted JSON Schema chosen by the author
  • validate workflow definitions against the remote JSON Schema during create/update and again at execution time, caching schemas for reuse
  • expose schema metadata in workflow APIs, wire the validator through DI, and accept structured upsert payloads that carry both definition and schema URL
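The "caching schemas for reuse" point can be illustrated with a small, dependency-free sketch. It assumes a hypothetical `downloadSchemaAsync` loader in place of a real HTTP fetch, and it swaps in `ConcurrentDictionary` plus `Lazy<Task<T>>` for the `IMemoryCache` the PR actually uses, so the example runs without NuGet packages. The point it shows is that repeated lookups of the same schema URL share one download.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical loader standing in for an HTTP download of the schema document.
var downloads = 0;
Func<string, Task<string>> downloadSchemaAsync = async schemaUrl =>
{
    Interlocked.Increment(ref downloads);
    await Task.Delay(10); // simulated network latency
    return $"{{\"$id\":\"{schemaUrl}\"}}";
};

// One cache entry per schema URL; Lazy<Task<...>> makes concurrent callers share a single download.
var schemaCache = new ConcurrentDictionary<string, Lazy<Task<string>>>(StringComparer.Ordinal);

Task<string> GetSchemaAsync(string schemaUrl) =>
    schemaCache.GetOrAdd(schemaUrl, key => new Lazy<Task<string>>(() => downloadSchemaAsync(key))).Value;

const string WorkflowSchema = "https://flowsynx.io/schemas/v1/workflow-schema.json";
var first = await GetSchemaAsync(WorkflowSchema);
var second = await GetSchemaAsync(WorkflowSchema);

Console.WriteLine($"downloads: {downloads}, shared instance: {ReferenceEquals(first, second)}");
```

One design caveat of this sketch: a failed download leaves a faulted task in the dictionary indefinitely. An expiring cache such as `IMemoryCache`, or removing the entry when the task faults, avoids pinning an error.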

Closes #454

Testing

  • not run (the .NET SDK was not available in the CLI environment)

Sakeeb91 requested a review from a team as a code owner on October 16, 2025, 14:56.
@ziagham (Member) commented Oct 16, 2025

At first glance, based on your current implementation (particularly the endpoint that receives the WorkflowUpsertPayload model), it looks like the workflow definition will be provided by the user in this format:

{
  "Schema": "https://flowsynx.io/schemas/v1/workflow-schema.json",
  "Definition": {
    "Name": "NightlyDatabaseBackup",
    "Description": "Dump PostgreSQL database, compress, and archive to Azure Blob Storage",
    "Configuration": {},
    "Tasks": []
  }
}

Based on what you described in #454, is that correct, or am I misunderstanding something?

@Sakeeb91 (Contributor, Author)

The endpoint still accepts the historic flat JSON blob (option A below). The new WorkflowUpsertPayload envelope introduced in this PR (option B) is intentionally additive, so we can start passing the schema URL without breaking older clients.

A) Flat (existing clients)
{
  "Name": "NightlyDatabaseBackup",
  "Description": "Dump PostgreSQL database, compress, and archive to Azure Blob Storage",
  "Configuration": {},
  "Tasks": []
}

B) Envelope (new capability)
{
  "SchemaUrl": "https://flowsynx.io/schemas/v1/workflow-schema.json",
  "Definition": {
    "Name": "NightlyDatabaseBackup",
    "Description": "Dump PostgreSQL database, compress, and archive to Azure Blob Storage",
    "Configuration": {},
    "Tasks": []
  }
}

If callers send the nested form we honor the schema URL and validate the inner definition; if they send the legacy flat payload we default the schema URL to null and keep the behavior unchanged. That lets us ship schema validation incrementally without breaking FlowCtl or existing automation.
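The flat-versus-envelope handling described above could look roughly like the following. This is a minimal sketch, not the PR's code: `ReadUpsertPayload` is a hypothetical helper, it uses `System.Text.Json.Nodes`, and it assumes a body counts as an envelope only when it has a top-level `Definition` object (property names are matched case-sensitively here).

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical helper: splits an upsert body into (definition JSON, optional schema URL).
// Assumption: a body is treated as an envelope only when it has a top-level "Definition" object.
(string DefinitionJson, string? SchemaUrl) ReadUpsertPayload(string body)
{
    var root = JsonNode.Parse(body) as JsonObject
        ?? throw new FormatException("Workflow payload must be a JSON object.");

    if (root["Definition"] is JsonObject inner)
    {
        // Option B (envelope): return the inner definition plus the author's schema URL, if any.
        var schemaUrl = root["SchemaUrl"]?.GetValue<string>();
        return (inner.ToJsonString(), schemaUrl);
    }

    // Option A (legacy flat body): no schema URL, so validation behaves as before.
    return (root.ToJsonString(), null);
}

var flat = ReadUpsertPayload("""{"Name":"NightlyDatabaseBackup","Tasks":[]}""");
var envelope = ReadUpsertPayload(
    """{"SchemaUrl":"https://flowsynx.io/schemas/v1/workflow-schema.json","Definition":{"Name":"NightlyDatabaseBackup","Tasks":[]}}""");

Console.WriteLine(flat.SchemaUrl ?? "(none)");
Console.WriteLine(envelope.SchemaUrl);
Console.WriteLine(envelope.DefinitionJson);
```

Detecting the envelope by the shape of the body, rather than by a separate flag or endpoint, is what keeps older clients working unchanged. The trade-off is that a legacy definition that happened to contain a top-level `Definition` object would be misread as an envelope.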

@ziagham (Member) left a comment


Please resolve and address the review comments.

<PackageReference Include="Microsoft.AspNetCore.Http.Abstractions" Version="2.3.0" />
<PackageReference Include="Microsoft.Extensions.Caching.Memory" Version="9.0.4" />
<PackageReference Include="Microsoft.Extensions.Http" Version="9.0.4" />
<PackageReference Include="Newtonsoft.Json.Schema" Version="4.1.0" />
@ziagham (Member) commented Oct 17, 2025:


Version 4.1.0 of Newtonsoft.Json.Schema does not exist. The correct version is 4.0.1, which is the latest available release. Change it to version 4.0.1.


private sealed class WorkflowUpsertPayload
{
public string? Definition { get; init; }
@ziagham (Member):

Please rename Definition to Workflow for better clarity and professionalism. The updated structure should look like this:

{
  "Schema": "https://flowsynx.io/schemas/v1/workflow-schema.json",
  "Workflow": {
    "Name": "NightlyDatabaseBackup",
    "Description": "Dump PostgreSQL database, compress, and archive to Azure Blob Storage",
    "Configuration": {},
    "Tasks": []
  }
}

var definition = await ParseAndValidateDefinitionAsync(
workflow.Definition,
workflow.SchemaUrl,
cancellationToken);
@ziagham (Member) commented Oct 17, 2025:


Workflow Orchestrator Process Overview

The workflow orchestrator operates through two primary methods defined in the IWorkflowOrchestrator interface:

  • CreateWorkflowExecutionAsync
  • ExecuteWorkflowAsync

CreateWorkflowExecutionAsync

The main responsibility of this method is to create a new instance of WorkflowExecutionEntity and persist it to the database.
For example, around line 98:

var execution = new WorkflowExecutionEntity
{
	Id = Guid.NewGuid(),
	WorkflowId = workflowId,
	UserId = userId,
	WorkflowDefinition = workflow.Definition,
	ExecutionStart = _systemClock.UtcNow,
	Status = WorkflowExecutionStatus.Pending,
	TaskExecutions = new List<WorkflowTaskExecutionEntity>()
};

In this step, the workflow definition (workflow.Definition) is copied into the WorkflowDefinition property of the WorkflowExecutionEntity.
This design ensures that when a previously executed workflow is re-executed, the orchestrator uses the exact version of the workflow definition that was active at the time of execution — rather than reloading a potentially modified version from the WorkflowEntity.

To fully preserve the workflow’s original configuration, we should also include the schema reference used for validation.
Therefore, add a WorkflowSchemaUrl property to WorkflowExecutionEntity, and assign it from workflow.SchemaUrl during creation.

var definition = await ParseAndValidateDefinitionAsync(
workflow.Definition,
workflow.SchemaUrl,
cancellationToken);
@ziagham (Member):

In the ExecuteWorkflowAsync method, the orchestrator retrieves the previously created WorkflowExecutionEntity and executes it.
For instance, around line 139:

var execution = await _workflowExecutionService.Get(userId, workflowId, executionId, cancellationToken);

Currently, the workflow definition is re-parsed and validated using the original workflow object:

var definition = await ParseAndValidateDefinitionAsync(
	workflow.Definition,
	workflow.SchemaUrl,
	cancellationToken);

However, since the orchestrator should execute the stored version of the workflow (not the potentially updated one), this should be changed to:

var definition = await ParseAndValidateDefinitionAsync(
	execution.WorkflowDefinition,
	execution.WorkflowSchemaUrl,
	cancellationToken);

This modification ensures that workflow execution always relies on the exact definition and schema that were originally used when the execution record was created, maintaining version consistency and execution integrity.
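The snapshot behavior requested across these two comments can be checked with a small in-memory model. This is a sketch under stated assumptions: dictionaries of tuples stand in for the `WorkflowEntity` and `WorkflowExecutionEntity` tables, and `CreateExecution` / `LoadForExecution` are hypothetical stand-ins for `CreateWorkflowExecutionAsync` and the definition-loading step of `ExecuteWorkflowAsync`. It shows that editing the workflow after an execution is created does not change the definition or schema URL that the execution re-parses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-ins for the workflow and execution tables (only the relevant columns).
var workflows = new Dictionary<Guid, (string Definition, string? SchemaUrl)>();
var executions = new Dictionary<Guid, (Guid WorkflowId, string WorkflowDefinition, string? WorkflowSchemaUrl)>();

// Like CreateWorkflowExecutionAsync: snapshot the definition AND the schema URL into the execution.
Guid CreateExecution(Guid sourceWorkflowId)
{
    var source = workflows[sourceWorkflowId];
    var newExecutionId = Guid.NewGuid();
    executions[newExecutionId] = (sourceWorkflowId, source.Definition, source.SchemaUrl);
    return newExecutionId;
}

// Like ExecuteWorkflowAsync after the proposed change: read the pair from the execution record only.
(string Definition, string? SchemaUrl) LoadForExecution(Guid storedExecutionId)
{
    var stored = executions[storedExecutionId];
    return (stored.WorkflowDefinition, stored.WorkflowSchemaUrl);
}

var backupWorkflowId = Guid.NewGuid();
workflows[backupWorkflowId] = ("{\"Name\":\"NightlyDatabaseBackup\",\"Tasks\":[]}",
    "https://flowsynx.io/schemas/v1/workflow-schema.json");
var runId = CreateExecution(backupWorkflowId);

// The author edits the workflow (new definition, new schema) after the execution was created.
workflows[backupWorkflowId] = ("{\"Name\":\"NightlyDatabaseBackup\",\"Tasks\":[{\"Name\":\"dump\"}]}",
    "https://schema.flowsynx.io/workflows/v1.0.0/schema.json");

var (runDefinition, runSchemaUrl) = LoadForExecution(runId);
Console.WriteLine(runSchemaUrl);
Console.WriteLine(runDefinition);
```

Storing the schema URL next to the definition matters because a definition is only meaningful relative to the schema it was validated against; snapshotting one without the other could make an old execution fail validation against a newer schema.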


namespace FlowSynx.Infrastructure.Serialization;

internal static partial class JsonSanitizer
@ziagham (Member):

That's a nice refactoring.

@Sakeeb91 (Contributor, Author) commented Oct 17, 2025

Addressed the feedback:

  • pinned Newtonsoft.Json.Schema to 4.0.1 in FlowSynx.Infrastructure.csproj so we reference the latest published package.
  • updated the workflow endpoints to accept both the legacy flat JSON and the reviewer’s Workflow/Schema envelope, preserving backward compatibility while letting new clients send Schema.
  • stored the schema URL alongside each WorkflowExecutionEntity and updated the orchestrator to re-parse using the persisted definition + schema pair for create/execute/resume paths.

Rebuilt the solution in the dotnet/sdk:9.0 container (docker run --rm -v "$PWD":/workspace -w /workspace mcr.microsoft.com/dotnet/sdk:9.0 dotnet build). The build succeeds; the only warnings are nullability warnings in Infrastructure that predate these changes.

@Sakeeb91 (Contributor, Author)

Follow-up for the FOSSA license failure: replaced the commercial-licensed Newtonsoft.Json.Schema with the MIT-licensed NJsonSchema package and reworked WorkflowSchemaValidator to use JsonSchema.Validate, keeping the same sanitizer + error reporting path. Cached schemas still flow through IMemoryCache.

Tried to rerun dotnet build inside the .NET 9 SDK container; this time NuGet downloads eventually timed out after 100s (network hiccup), but the changes compile without errors once packages are restored.

@ziagham (Member) left a comment


Nice work

ziagham merged commit f1f6db3 into master on Oct 19, 2025 (3 checks passed).
@ziagham (Member) commented Oct 22, 2025

@Sakeeb91
FYI: I’ve created Schema Version 1.0.0 and published it on a FlowSynx subdomain. It’s now accessible here: https://schema.flowsynx.io

Users can now reference the schema directly using a URL like:
https://schema.flowsynx.io/workflows/v1.0.0/schema.json

@Sakeeb91 (Contributor, Author)

This really adds an extra layer of convenience! Great idea!

ziagham deleted the issue-454-schema-validation branch on November 8, 2025, 14:13.

Linked issue (closed by this pull request): Add Schema Support in Workflow Definition

2 participants