Closed
Labels
enhancement (New feature or request)
Description
Search before asking
- I have searched the issues and found no similar issues.
Enhancement Request
Description
The current implementation of eventmesh-operator has several critical issues regarding Kubernetes resource management and internal concurrency safety, which may lead to deployment failures
or unstable behavior in production environments.
Issues Identified
-
Missing Headless Service for StatefulSets
- Problem: The operator creates StatefulSet resources for both Runtime and Connectors but fails to create the corresponding Headless Service. It also does not set the serviceName field in the StatefulSet spec.
- Impact: Pods managed by the StatefulSet will not have stable network identities (DNS entries like pod-0.service-name.namespace.svc.cluster.local), which are a core feature of StatefulSets and essential for cluster communication.
-
Unsafe Global State Usage
- Problem: A global variable IsEventMeshRuntimeInitialized in share/share.go is used to track runtime readiness.
- Impact: This design is not thread-safe and breaks in multi-tenant or multi-cluster scenarios (e.g., managing multiple EventMesh clusters in different namespaces). It causes race
conditions and incorrect dependency checks.
-
Hardcoded Replica Logic
- Problem: The RuntimeReconciler hardcodes Replicas to 1 in some paths, potentially ignoring the replicaPerGroup configuration defined in the CRD.
-
Blocking Operations
- Problem: The controller uses time.Sleep() for retries or waiting.
- Impact: This blocks the reconciliation thread, reducing the operator's throughput and responsiveness. It should use reconcile.Result{RequeueAfter: ...} instead.
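The global-state and blocking-sleep issues above can be illustrated together with a minimal Go sketch. It is not the operator's actual code: `Result` is a local stand-in that mirrors only the `RequeueAfter` field of controller-runtime's `reconcile.Result`, while `StatusReader`, `RuntimeStatus`, `mapReader`, and the 5-second `retryInterval` are hypothetical names and values chosen for illustration. In a real operator the readiness lookup would go through the Kubernetes API client rather than an in-memory map.

```go
package main

import (
	"fmt"
	"time"
)

// Result is a local stand-in for controller-runtime's reconcile.Result;
// only the RequeueAfter field is modeled so the sketch needs no dependencies.
type Result struct {
	RequeueAfter time.Duration
}

// RuntimeStatus is a hypothetical, simplified view of a Runtime resource's status.
type RuntimeStatus struct {
	ReadyReplicas int32
}

// StatusReader is a hypothetical abstraction over the Kubernetes API client.
// Readiness is looked up per namespace and name instead of via a global flag.
type StatusReader interface {
	RuntimeStatus(namespace, name string) (RuntimeStatus, bool)
}

// mapReader is an in-memory StatusReader keyed by "namespace/name".
type mapReader map[string]RuntimeStatus

func (m mapReader) RuntimeStatus(namespace, name string) (RuntimeStatus, bool) {
	s, ok := m[namespace+"/"+name]
	return s, ok
}

// retryInterval is an assumed value, not taken from the operator.
const retryInterval = 5 * time.Second

// reconcileConnectors returns immediately in both cases. When the Runtime in
// the same namespace is not ready, it asks to be re-queued instead of sleeping.
func reconcileConnectors(r StatusReader, namespace, runtimeName string) Result {
	status, found := r.RuntimeStatus(namespace, runtimeName)
	if !found || status.ReadyReplicas == 0 {
		return Result{RequeueAfter: retryInterval}
	}
	return Result{}
}

func main() {
	reader := mapReader{"team-a/runtime": {ReadyReplicas: 1}}
	// team-a has a ready Runtime; team-b has none and is re-queued.
	fmt.Println(reconcileConnectors(reader, "team-a", "runtime").RequeueAfter)
	fmt.Println(reconcileConnectors(reader, "team-b", "runtime").RequeueAfter)
}
```

Because readiness is resolved per namespace on every reconcile, two EventMesh clusters in different namespaces cannot influence each other's dependency checks, and the worker is free to process other requests while waiting.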
Describe the solution you'd like
Proposed Fixes
-
Refactor Controllers:
- Implement logic to automatically create a Headless Service (ClusterIP: None) for each StatefulSet.
- Ensure StatefulSet.Spec.ServiceName matches the name of the created Service.
-
Remove Global State:
- Delete IsEventMeshRuntimeInitialized.
- Update ConnectorsReconciler to dynamically query the Kubernetes API for Runtime resource status to determine readiness.
-
Enhance Robustness:
- Use the replica values from the CR spec (e.g., replicaPerGroup) instead of hardcoded values.
- Replace blocking sleeps with non-blocking requeue mechanisms.
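As a rough sketch of the Headless Service and replica fixes proposed above, the following Go snippet shows how the Service and StatefulSet could be wired together. The `Service` and `StatefulSet` structs are local stand-ins for the small subset of `corev1.Service` and `appsv1.StatefulSet` fields involved, and the `-headless` name suffix, the helper names, and the label values are assumptions made for illustration, not the operator's actual conventions.

```go
package main

import "fmt"

// Service is a stand-in for the subset of corev1.Service used here.
type Service struct {
	Name, Namespace string
	ClusterIP       string // "None" marks a headless Service
	Selector        map[string]string
}

// StatefulSet is a stand-in for the subset of appsv1.StatefulSet used here.
type StatefulSet struct {
	Name, Namespace string
	ServiceName     string // corresponds to spec.serviceName
	Replicas        int32
	Selector        map[string]string
}

// headlessServiceFor builds the governing Service for a StatefulSet.
// The "-headless" suffix is an assumed naming convention.
func headlessServiceFor(name, namespace string, labels map[string]string) Service {
	return Service{
		Name:      name + "-headless",
		Namespace: namespace,
		ClusterIP: "None",
		Selector:  labels,
	}
}

// statefulSetFor takes replicas from the CR value (replicaPerGroup) rather
// than a hardcoded 1, and points ServiceName at the headless Service.
func statefulSetFor(name, namespace string, replicaPerGroup int32,
	labels map[string]string, svc Service) StatefulSet {
	return StatefulSet{
		Name:        name,
		Namespace:   namespace,
		ServiceName: svc.Name,
		Replicas:    replicaPerGroup,
		Selector:    labels,
	}
}

// podDNS returns the stable DNS name Kubernetes assigns to a given ordinal.
func podDNS(sts StatefulSet, ordinal int) string {
	return fmt.Sprintf("%s-%d.%s.%s.svc.cluster.local",
		sts.Name, ordinal, sts.ServiceName, sts.Namespace)
}

func main() {
	labels := map[string]string{"app": "eventmesh-runtime"}
	svc := headlessServiceFor("eventmesh-runtime", "default", labels)
	sts := statefulSetFor("eventmesh-runtime", "default", 3, labels, svc)
	fmt.Println(podDNS(sts, 0))
}
```

Deriving `ServiceName` from the Service object that was just built, instead of repeating a string literal, keeps the two resources from drifting apart if the naming scheme changes.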
Environment
- EventMesh Version: (Current Master)
- Kubernetes Version: Any
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct