Failed to instantiate workflow on K8s with new operator's jobs #546

@bvandewe

Description

I tried this:

After upgrading to the latest alpha17, the operator fails to start a workflow instance and throws optimistic concurrency errors. The job doesn't start, but the operator still tries to delete it. Please see the logs below.
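
For readers unfamiliar with the 409 in the logs, here is a minimal Python sketch of how an optimistic concurrency check behaves in general. It is not Synapse or Neuroglia code: `VersionedStore`, `ConflictError`, and `patch_with_retry` are hypothetical names, and the integer version counter is an assumption made for illustration only.

```python
class ConflictError(Exception):
    """Raised when the caller's version does not match the stored one (think HTTP 409)."""


class VersionedStore:
    """Hypothetical in-memory store; every write bumps the resource version."""

    def __init__(self):
        self._items = {}  # name -> (status dict, version number)

    def put(self, name, status):
        _, version = self._items.get(name, ({}, 0))
        self._items[name] = (dict(status), version + 1)
        return version + 1

    def get(self, name):
        status, version = self._items[name]
        return dict(status), version

    def patch_status(self, name, changes, expected_version):
        status, actual = self.get(name)
        if expected_version != actual:
            raise ConflictError(
                f"target version '{expected_version}' differs from actual version '{actual}'"
            )
        status.update(changes)
        return self.put(name, status)


def patch_with_retry(store, name, changes, attempts=3):
    """Re-read the latest version before each attempt instead of reusing a stale one."""
    for _ in range(attempts):
        _, version = store.get(name)
        try:
            return store.patch_status(name, changes, version)
        except ConflictError:
            continue
    raise ConflictError(f"gave up patching '{name}' after {attempts} attempts")


store = VersionedStore()
stale = store.put("instance-a", {"phase": "pending"})
# Another writer updates the resource first, so `stale` no longer matches.
store.patch_status("instance-a", {"processId": "p1"}, stale)

try:
    store.patch_status("instance-a", {"phase": "running"}, stale)
    conflicted = False
except ConflictError:
    conflicted = True

# Reading the current version right before patching avoids the conflict.
patch_with_retry(store, "instance-a", {"phase": "running"})
final_status, _ = store.get("instance-a")
```

The second patch fails because it reuses the version captured before the first one was applied. Whether something similar happens inside the operator is only a guess based on the error message.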

BTW, I did the following:

  • cleaned the Redis DB,
  • created a new service account (it was named 'adapter.default'; I now needed to add 'default.default'),
  • added a new rule to the ClusterRole operator-role to manage Jobs (batch),
  • created a new workflow definition and mapped it to the only available operator (not sure if that is really required?),
  • tried to start a workflow instance from the UI; nothing happened, so I checked the operator logs...

This happened:

[21:59:06] info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
[21:59:06] info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
[21:59:06] info: Microsoft.Hosting.Lifetime[0]
      Content root path: /app
[21:59:45] fail: Synapse.Operator.Services.WorkflowInstanceController[0]
      An error occurred while handling the creation of workflow instance 'form-status-set-3354c346f663.default': Neuroglia.ProblemDetailsException: [409 - Conflict] Failed to update the resource 'synapse.io/v1/namespaces/default/workflow-instances/form-status-set-3354c346f663/status' due to an optimistic concurrency error: the resource's target version '62581DD3' differs from the actual version 'B87D1FFA'
         at Neuroglia.Data.Infrastructure.ResourceOriented.Services.RedisDatabase.PatchSubResourceAsync(Patch patch, String group, String version, String plural, String name, String subResource, String namespace, String resourceVersion, Boolean dryRun, CancellationToken cancellationToken) in /home/runner/work/framework/framework/src/Neuroglia.Data.Infrastructure.ResourceOriented.Redis/Services/RedisDatabase.cs:line 248
         at Neuroglia.Data.Infrastructure.ResourceOriented.Services.ResourceRepository.PatchSubResourceAsync(Patch patch, String group, String version, String plural, String name, String subResource, String namespace, String resourceVersion, Boolean dryRun, CancellationToken cancellationToken) in /home/runner/work/framework/framework/src/Neuroglia.Data.Infrastructure.ResourceOriented/Services/ResourceRepository.cs:line 318
         at Neuroglia.Data.Infrastructure.ResourceOriented.IResourceRepositoryExtensions.PatchStatusAsync[TResource](IResourceRepository repository, Patch patch, String name, String namespace, String resourceVersion, Boolean dryRun, CancellationToken cancellationToken) in /home/runner/work/framework/framework/src/Neuroglia.Data.Infrastructure.ResourceOriented.Abstractions/Extensions/IResourceRepositoryExtensions.cs:line 265
         at Synapse.Operator.Services.WorkflowInstanceHandler.UpdateWorkflowInstanceStatusAsync(Action`1 statusUpdate, CancellationToken cancellationToken) in /src/src/operator/Synapse.Operator/Services/WorkflowInstanceHandler.cs:line 232
         at Synapse.Operator.Services.WorkflowInstanceHandler.StartProcessAsync(CancellationToken cancellationToken) in /src/src/operator/Synapse.Operator/Services/WorkflowInstanceHandler.cs:line 138
         at Synapse.Operator.Services.WorkflowInstanceHandler.HandleAsync(CancellationToken cancellationToken) in /src/src/operator/Synapse.Operator/Services/WorkflowInstanceHandler.cs:line 123
         at Synapse.Operator.Services.WorkflowInstanceController.OnResourceCreatedAsync(WorkflowInstance workflowInstance, CancellationToken cancellationToken) in /src/src/operator/Synapse.Operator/Services/WorkflowInstanceController.cs:line 223
[22:00:59] fail: Synapse.Runtime.Kubernetes.Services.KubernetesRuntime[0]
      An error occurred while deleting the Kubernetes process with id 'form-status-set-f848363e9b0f.default-613605c320ca.synapse': k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'NotFound', response body {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch \"form-status-set-f848363e9b0f\" not found","reason":"NotFound","details":{"name":"form-status-set-f848363e9b0f","group":"batch","kind":"jobs"},"code":404}

         at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.IBatchV1Operations_DeleteNamespacedJobWithHttpMessagesAsync[T](String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.k8s.IBatchV1Operations.DeleteNamespacedJobWithHttpMessagesAsync(String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.BatchV1OperationsExtensions.DeleteNamespacedJobAsync(IBatchV1Operations operations, String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, CancellationToken cancellationToken)
         at Synapse.Runtime.Kubernetes.Services.KubernetesRuntime.DeleteProcessAsync(String processId, CancellationToken cancellationToken) in /src/src/runtime/Synapse.Runtime.Kubernetes/Services/KubernetesRuntime.cs:line 177
[22:00:59] warn: Synapse.Operator.Services.WorkflowInstanceController[0]
      Failed to delete process with id 'form-status-set-f848363e9b0f.default-613605c320ca.synapse' for workflow instance 'form-status-set-f848363e9b0f.default'
      k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'NotFound', response body {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch \"form-status-set-f848363e9b0f\" not found","reason":"NotFound","details":{"name":"form-status-set-f848363e9b0f","group":"batch","kind":"jobs"},"code":404}

         at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.IBatchV1Operations_DeleteNamespacedJobWithHttpMessagesAsync[T](String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.k8s.IBatchV1Operations.DeleteNamespacedJobWithHttpMessagesAsync(String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.BatchV1OperationsExtensions.DeleteNamespacedJobAsync(IBatchV1Operations operations, String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, CancellationToken cancellationToken)
         at Synapse.Runtime.Kubernetes.Services.KubernetesRuntime.DeleteProcessAsync(String processId, CancellationToken cancellationToken) in /src/src/runtime/Synapse.Runtime.Kubernetes/Services/KubernetesRuntime.cs:line 177
         at Synapse.Operator.Services.WorkflowInstanceController.OnResourceDeletedAsync(WorkflowInstance workflowInstance, CancellationToken cancellationToken) in /src/src/operator/Synapse.Operator/Services/WorkflowInstanceController.cs:line 282
[22:00:59] warn: Synapse.Runtime.Kubernetes.Services.KubernetesRuntime[0]
      Failed to gracefully stop process 'form-status-set-3354c346f663.default-565505594e74.synapse': k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'NotFound', response body {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch \"form-status-set-3354c346f663.default-565505594e74\" not found","reason":"NotFound","details":{"name":"form-status-set-3354c346f663.default-565505594e74","group":"batch","kind":"jobs"},"code":404}

         at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.IBatchV1Operations_DeleteNamespacedJobWithHttpMessagesAsync[T](String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.k8s.IBatchV1Operations.DeleteNamespacedJobWithHttpMessagesAsync(String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.BatchV1OperationsExtensions.DeleteNamespacedJobAsync(IBatchV1Operations operations, String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, CancellationToken cancellationToken)
         at Synapse.Runtime.Kubernetes.Services.KubernetesWorkflowProcess.StopAsync(CancellationToken cancellationToken) in /src/src/runtime/Synapse.Runtime.Kubernetes/Services/KubernetesWorkflowProcess.cs:line 164
         at Synapse.Runtime.Kubernetes.Services.KubernetesRuntime.DeleteProcessAsync(String processId, CancellationToken cancellationToken) in /src/src/runtime/Synapse.Runtime.Kubernetes/Services/KubernetesRuntime.cs:line 170
[22:00:59] fail: Synapse.Runtime.Kubernetes.Services.KubernetesRuntime[0]
      An error occurred while deleting the Kubernetes process with id 'form-status-set-3354c346f663.default-565505594e74.synapse': k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'NotFound', response body {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch \"form-status-set-3354c346f663\" not found","reason":"NotFound","details":{"name":"form-status-set-3354c346f663","group":"batch","kind":"jobs"},"code":404}

         at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.IBatchV1Operations_DeleteNamespacedJobWithHttpMessagesAsync[T](String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.k8s.IBatchV1Operations.DeleteNamespacedJobWithHttpMessagesAsync(String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.BatchV1OperationsExtensions.DeleteNamespacedJobAsync(IBatchV1Operations operations, String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, CancellationToken cancellationToken)
         at Synapse.Runtime.Kubernetes.Services.KubernetesRuntime.DeleteProcessAsync(String processId, CancellationToken cancellationToken) in /src/src/runtime/Synapse.Runtime.Kubernetes/Services/KubernetesRuntime.cs:line 177
[22:00:59] warn: Synapse.Operator.Services.WorkflowInstanceController[0]
      Failed to delete process with id 'form-status-set-3354c346f663.default-565505594e74.synapse' for workflow instance 'form-status-set-3354c346f663.default'
      k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'NotFound', response body {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch \"form-status-set-3354c346f663\" not found","reason":"NotFound","details":{"name":"form-status-set-3354c346f663","group":"batch","kind":"jobs"},"code":404}

         at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.IBatchV1Operations_DeleteNamespacedJobWithHttpMessagesAsync[T](String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.k8s.IBatchV1Operations.DeleteNamespacedJobWithHttpMessagesAsync(String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.BatchV1OperationsExtensions.DeleteNamespacedJobAsync(IBatchV1Operations operations, String name, String namespaceParameter, V1DeleteOptions body, String dryRun, Nullable`1 gracePeriodSeconds, Nullable`1 ignoreStoreReadErrorWithClusterBreakingPotential, Nullable`1 orphanDependents, String propagationPolicy, Nullable`1 pretty, CancellationToken cancellationToken)
         at Synapse.Runtime.Kubernetes.Services.KubernetesRuntime.DeleteProcessAsync(String processId, CancellationToken cancellationToken) in /src/src/runtime/Synapse.Runtime.Kubernetes/Services/KubernetesRuntime.cs:line 177
         at Synapse.Operator.Services.WorkflowInstanceController.OnResourceDeletedAsync(WorkflowInstance workflowInstance, CancellationToken cancellationToken) in /src/src/operator/Synapse.Operator/Services/WorkflowInstanceController.cs:line 282
[22:06:31] fail: Synapse.Operator.Services.WorkflowInstanceController[0]
      An error occurred while handling the creation of workflow instance 'form-status-set-2965fe738f0f.default': Neuroglia.ProblemDetailsException: [409 - Conflict] Failed to update the resource 'synapse.io/v1/namespaces/default/workflow-instances/form-status-set-2965fe738f0f/status' due to an optimistic concurrency error: the resource's target version '22839830' differs from the actual version '5D490D61'
         at Neuroglia.Data.Infrastructure.ResourceOriented.Services.RedisDatabase.PatchSubResourceAsync(Patch patch, String group, String version, String plural, String name, String subResource, String namespace, String resourceVersion, Boolean dryRun, CancellationToken cancellationToken) in /home/runner/work/framework/framework/src/Neuroglia.Data.Infrastructure.ResourceOriented.Redis/Services/RedisDatabase.cs:line 248
         at Neuroglia.Data.Infrastructure.ResourceOriented.Services.ResourceRepository.PatchSubResourceAsync(Patch patch, String group, String version, String plural, String name, String subResource, String namespace, String resourceVersion, Boolean dryRun, CancellationToken cancellationToken) in /home/runner/work/framework/framework/src/Neuroglia.Data.Infrastructure.ResourceOriented/Services/ResourceRepository.cs:line 318
         at Neuroglia.Data.Infrastructure.ResourceOriented.IResourceRepositoryExtensions.PatchStatusAsync[TResource](IResourceRepository repository, Patch patch, String name, String namespace, String resourceVersion, Boolean dryRun, CancellationToken cancellationToken) in /home/runner/work/framework/framework/src/Neuroglia.Data.Infrastructure.ResourceOriented.Abstractions/Extensions/IResourceRepositoryExtensions.cs:line 265
         at Synapse.Operator.Services.WorkflowInstanceHandler.UpdateWorkflowInstanceStatusAsync(Action`1 statusUpdate, CancellationToken cancellationToken) in /src/src/operator/Synapse.Operator/Services/WorkflowInstanceHandler.cs:line 232
         at Synapse.Operator.Services.WorkflowInstanceHandler.StartProcessAsync(CancellationToken cancellationToken) in /src/src/operator/Synapse.Operator/Services/WorkflowInstanceHandler.cs:line 138
         at Synapse.Operator.Services.WorkflowInstanceHandler.HandleAsync(CancellationToken cancellationToken) in /src/src/operator/Synapse.Operator/Services/WorkflowInstanceHandler.cs:line 123
         at Synapse.Operator.Services.WorkflowInstanceController.OnResourceCreatedAsync(WorkflowInstance workflowInstance, CancellationToken cancellationToken) in /src/src/operator/Synapse.Operator/Services/WorkflowInstanceController.cs:line 223
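
As a triage aid, the short Python sketch below pulls the process id and the missing Job name out of two of the 404 log lines above. The helper name `requested_job` and the regular expression are assumptions made for this example. It only shows which names the API server reported as not found and says nothing definitive about the cause.

```python
import json
import re

# Two lines copied from the operator log above (timestamps and stack frames omitted).
LOG_LINES = [
    r'''Failed to gracefully stop process 'form-status-set-3354c346f663.default-565505594e74.synapse': k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'NotFound', response body {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch \"form-status-set-3354c346f663.default-565505594e74\" not found","reason":"NotFound","details":{"name":"form-status-set-3354c346f663.default-565505594e74","group":"batch","kind":"jobs"},"code":404}''',
    r'''An error occurred while deleting the Kubernetes process with id 'form-status-set-3354c346f663.default-565505594e74.synapse': k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'NotFound', response body {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch \"form-status-set-3354c346f663\" not found","reason":"NotFound","details":{"name":"form-status-set-3354c346f663","group":"batch","kind":"jobs"},"code":404}''',
]

PROCESS_ID = re.compile(r"process (?:with id )?'([^']+)'")
BODY_MARKER = "response body "


def requested_job(line):
    """Return (process id, Job name the API server reported as missing)."""
    process_id = PROCESS_ID.search(line).group(1)
    # Everything after the marker is the JSON Status object returned by Kubernetes.
    body = json.loads(line[line.index(BODY_MARKER) + len(BODY_MARKER):])
    return process_id, body["details"]["name"]


pairs = [requested_job(line) for line in LOG_LINES]
for process_id, job_name in pairs:
    print(f"{process_id} -> {job_name}")
```

For the same process id, the two lines report different missing Job names: one with the '.default-565505594e74' suffix and one without. That difference may or may not be related to the failure.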

I expected this:

No response

Is there a workaround?

No response

Anything else?

No response

Platform(s)

No response

Community Notes

  • Please vote by adding a 👍 reaction to the issue to help us prioritize.
  • If you are interested in working on this issue, please leave a comment.

Labels: type: bug (Something isn't working)
