
Process instances with start execution listeners are sometimes not visible in the Operate UI and REST API #24744

@ThorbenLindhauer

Description


Describe the bug

If a start execution listener is defined on the process level (i.e. on the process element, not on the BPMN start event), the process instance may not show up in the Operate UI and may not be returned by the corresponding API endpoints (e.g. the process instance query). The cause is a race condition during import between the process instance record and the job record of the execution listener.

Note: This problem does not occur in the new exporter we plan to deliver with 8.8, because the race condition does not exist there.

  • Backport to 8.6
  • Also verify, test, and if necessary fix in the new exporter on the main branch
    • Note: On main we also need to fix it in the 8.6 importer code

To Reproduce

  1. Deploy a process model that has a start execution listener on the process level
  2. Start a process instance
  3. Let the job for the start listener be imported before the process instance
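The ordering problem in step 3 can be illustrated with a toy Java model. This is a hypothetical simplification, not Operate's importer: the class `ListViewRaceDemo`, its methods, the rule that whichever record creates a list-view document fixes its type, and the assumption that a process-level listener job carries the process instance key as its element instance key are all made up here only to show how import order alone can hide an instance.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the list-view import race (hypothetical, not Operate's importer).
public class ListViewRaceDemo {

    enum DocType { PROCESS_INSTANCE, FLOW_NODE_INSTANCE }

    static final class Doc {
        final DocType type; // assumption: fixed by whichever record creates the document
        String jobType;     // job details copied from a job record, if any

        Doc(DocType type) {
            this.type = type;
        }
    }

    // Toy list-view "index": document id -> document
    final Map<Long, Doc> listView = new HashMap<>();

    // Process instance record: creates the document only if none exists yet.
    void importProcessInstance(long processInstanceKey) {
        listView.computeIfAbsent(processInstanceKey, k -> new Doc(DocType.PROCESS_INSTANCE));
    }

    // Job record: updates (or creates) the document with the job's element instance key.
    // Assumption: for a process-level listener job, this key equals the process instance key.
    void importJob(long elementInstanceKey, String jobType) {
        listView.computeIfAbsent(elementInstanceKey, k -> new Doc(DocType.FLOW_NODE_INSTANCE))
                .jobType = jobType;
    }

    // In this toy model, queries only return documents of type PROCESS_INSTANCE.
    boolean isProcessInstanceVisible(long processInstanceKey) {
        Doc doc = listView.get(processInstanceKey);
        return doc != null && doc.type == DocType.PROCESS_INSTANCE;
    }

    public static void main(String[] args) {
        long processInstanceKey = 42L;

        ListViewRaceDemo processInstanceFirst = new ListViewRaceDemo();
        processInstanceFirst.importProcessInstance(processInstanceKey);
        processInstanceFirst.importJob(processInstanceKey, "start-listener");
        System.out.println("process instance first: visible="
                + processInstanceFirst.isProcessInstanceVisible(processInstanceKey));

        ListViewRaceDemo jobFirst = new ListViewRaceDemo();
        jobFirst.importJob(processInstanceKey, "start-listener");
        jobFirst.importProcessInstance(processInstanceKey);
        System.out.println("job first: visible="
                + jobFirst.isProcessInstanceVisible(processInstanceKey));
    }
}
```

Under these assumptions, `main` reports the instance as visible when the process instance record is imported first and as invisible when the listener job is imported first. Note that in the first ordering the job details still end up on the process instance document.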

Current behavior

  • The process instance does not become visible in the Operate UI

Expected behavior

  • The process instance is displayed correctly
  • It has no activity badges yet, because no activity is running at that stage

Root cause

Workaround

Declare the execution listener on the start event of the process instead of on the process element.

Proposed Solution:

  • Exclude ListViewZeebeRecordProcessor#updateFlowNodeInstanceFromJob from handling process-instance-level job records, so that it only updates true flow node instances. Before applying this change, we need to decide whether the data this method adds to the list view entity is relevant for process instances. If it is not, we can apply the change.
    • Although this problem doesn't apply to the new exporter, to be consistent we should apply the same change there (i.e. not add job details to process instance documents in the list view index)
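A minimal Java sketch of the guard described in the proposed solution. The names `ProcessLevelJobFilter`, `JobRecordKeys`, `isProcessLevelJob`, and `jobsForFlowNodeUpdate` are hypothetical, and the sketch assumes that a job created by a process-level execution listener carries an element instance key equal to its process instance key; the actual check in ListViewZeebeRecordProcessor may differ.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ProcessLevelJobFilter {

    /** Hypothetical, minimal view of the keys on a job record. */
    record JobRecordKeys(long processInstanceKey, long elementInstanceKey, String elementId) {}

    /**
     * Assumption: a process-level listener job belongs to the process element itself,
     * whose element instance key is the process instance key.
     */
    static boolean isProcessLevelJob(JobRecordKeys job) {
        return job.elementInstanceKey() == job.processInstanceKey();
    }

    /** Keeps only jobs of true flow node instances; process-level jobs are skipped. */
    static List<JobRecordKeys> jobsForFlowNodeUpdate(List<JobRecordKeys> jobs) {
        return jobs.stream()
                .filter(job -> !isProcessLevelJob(job))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<JobRecordKeys> jobs = List.of(
                new JobRecordKeys(100L, 100L, "my-process"),    // process-level start listener job
                new JobRecordKeys(100L, 105L, "service-task")); // job of a service task
        for (JobRecordKeys job : jobsForFlowNodeUpdate(jobs)) {
            System.out.println("update flow node instance from job of " + job.elementId());
        }
    }
}
```

Under these assumptions only the service task job reaches the flow node update, so no job details are written to the process instance document, which matches the consistency goal stated for the new exporter.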

This will introduce a new (but minor) bug: the "failed but retries left" filter does not match process instances whose process-level execution listener (EL) jobs have failed, because the corresponding field is no longer set on the process instance. In 8.7, the filter additionally does not work for these instances, because the search request of the internal API only filters on flow node instances (aka activities).
A new issue is created for this: #27318.

Other proposed solution:
Fix the job-based update to also handle process instances. This is the more complex fix and may have more unforeseen side effects (for example, process instances could start appearing before they are properly imported). Given our timeline and the minor impact of the bug introduced by the simpler fix, we decided not to pursue this option.

Links

### Pull Requests
- [x] main: https://github.com/camunda/camunda/pull/27294
- [x] 8.6: https://github.com/camunda/camunda/pull/27297
- [x] 8.7: https://github.com/camunda/camunda/pull/27299

Metadata

Labels

component/operate, kind/bug, support, version:8.6.11
