helm: kvstoremesh with external etcd #34174
Conversation
|
Commit 4081260 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
|
Thanks for the PR! I'll defer the initial review to the @cilium/sig-clustermesh team, since I lack the necessary domain knowledge to judge correctness. Please also remove the merge commit from your branch, as we don't use merge commits in the Cilium project. Please use |
4081260 to
e1b150b
Compare
|
Thanks @calcium6 for PR I just have one question to make sure I understand it.
AFAIK kvstoremsh does not work with external kvstore right now / it's not officially supported. |
|
hey @marseel.. yes the ultimate goal is to propagate data across external kvstores ;) this is just one of the pieces, do you want us to send it as one PR? (with docs, tests etc..) :) |
nah, it's fine to do it in smaller parts. I was just lacking context. cc @hemanthmalla I know that you were looking into this as well. In this case, wouldn't it be better to have a single helm flag, something like |
|
yeah, let's do that.. @calcium6 please let's fix this |
|
@marseel Sorry, missed the first ping. Yes, we already run We brought this up in one of the community meetings as well, but we ended up deciding maybe we shouldn't document too many variations of clustermesh that might confuse users. But with enough users interested in this, maybe the timing is right :) |
|
Generally speaking, it makes sense to me to have the possibility of using kvstoremesh in combination with an external kvstore. I wonder whether an equally valid use case could be kvstoremesh+sidecar etcd, when Cilium operates in kvstore mode (i.e., just omitting the apiserver container, which is not necessary in that case). As mentioned by Marcel, the clustermesh conformance/upgrade tests would need to be extended to cover this configuration as well (that should be pretty trivial, as we are already testing cilium in kvstore mode in a few matrix entries there). Additionally, there's at least one known issue due to the fact that kvstoremesh does not update the Cilium's heartbeat key in etcd, which is used by the status checks performed by Cilium agents. Although this is not a problem when reusing the same etcd cluster of Cilium (as the hearbeat is updated by the operator), it would not work if using a dedicated etcd cluster for kvstoremesh. The fix may be as trivial as registering the Moreover, please keep in mind that kvstoremesh does not currently perform an explicit reconciliation on restart, as it works under the assumption that the etcd instance is ephemeral and recreated from scratch every time. This means that, when using a persistent etcd cluster, possible cached entries disappearing while kvstoremesh is down will not be deleted from the cache until the corresponding lease expires (15 minutes by default, but configurable).
I'm also personally in favor of an explicit flag, to prevent configuring unsupported/invalid combinations. If we decide that kvstoremesh+sidecar etcd is a valid use case, then we may have a |
There was a problem hiding this comment.
I'd suggest to add a note to the documentation on how to deploy kvstoremesh in this context, rather than adding an example here. Examples tend to be easily missed and get stale over time, as well as require a-priory knowledge about the specific goal, as lacking the accompanying description.
e1b150b to
21700f1
Compare
|
Commit 9d20080 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
9d20080 to
7d4d29e
Compare
|
Commit 7d4d29e does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
7d4d29e to
1f86f38
Compare
have the option to use external etcd with kvstoremesh Signed-off-by: Jiri Vanura <[email protected]>
1f86f38 to
a975482
Compare
| cilium install --set cluster.name=$CLUSTER1 --set cluster.id=1 --context $CLUSTER1 | ||
| cilium install --set cluster.name=$CLUSTER2 --set cluster.id=2 --context $CLUSTER2 | ||
|
|
||
| .. note:: |
There was a problem hiding this comment.
This note in "Specify the Cluster Name and ID" section seems out of place. How about moving it to "Enable Cluster Mesh" section?
There was a problem hiding this comment.
I believe this comment has already been resolved by the original submitter:
giorio94
left a comment
There was a problem hiding this comment.
Thanks for the changes.
I've left a few additional comments inline. Overall, the main concern I have about the current proposal is that it would work in a very specific use-case only (and with specific settings, which are also not documented), while leading to a most likely broken configuration otherwise. I personally think that it should be generalized a bit so that it works out of the box in most common scenarios, as well as the validation should be extended to provide safe-guards and prevent users from enabling unsupported configurations. My comments mostly go in that direction. WDYT?
| # -- Use external KVStore for KVStoreMesh. | ||
| kvstoremeshExternalKVStore: false |
There was a problem hiding this comment.
The dedicated clustermesh.kvstoremesh section seems a better fit to me. Additionally, I'd clarify in the comment that the target etcd instance is the one configured via etcd.endpoints, as not obvious otherwise.
Do you have any opinions on:
If we decide that kvstoremesh+sidecar etcd is a valid use case, then we may have a mode flag that allows selecting among the three possible alternatives.
Mostly to decide whether to generalize the setting to potentially enable this combination in the future (it doesn't need to be as part of this PR, considering that it requires a bit of extra work). Still, I'd rather prefer to pick a future-proof setting from the beginning if possible, to avoid having to deal with backward-compatible changes.
There was a problem hiding this comment.
Yes, I also think it would be better to add a mode flag to select from the three options, I'll do that.
There was a problem hiding this comment.
I believe this comment has already been resolved by the original submitter:
| {{/* validate clustermesh-apiserver */}} | ||
| {{- if .Values.clustermesh.useAPIServer }} | ||
| {{- if and (ne .Values.identityAllocationMode "crd") (ne .Values.identityAllocationMode "doublewrite-readkvstore") (ne .Values.identityAllocationMode "doublewrite-readcrd") }} | ||
| {{- if and (ne .Values.identityAllocationMode "crd") (ne .Values.identityAllocationMode "doublewrite-readkvstore") (ne .Values.identityAllocationMode "doublewrite-readcrd") (not .Values.clustermesh.kvstoremeshExternalKVStore) }} |
There was a problem hiding this comment.
I'd suggest adding a similar validation to ensure that the external kvstore mode can only be enabled if etcd.enabled is set to true, and etcd.endpoints is not empty, to avoid surprises (given the current implementation). Similarly, I'd suggest validating that kvstoremesh.enabled=true when useAPIServer=true and kvstoremeshExternalKVStore=true, otherwise the resulting deployment is invalid. Finally, it should be forbidden to enable externalWorkloads.enabled=true if kvstoremeshExternalKVStore=true.
There was a problem hiding this comment.
I believe this comment has already been resolved: https://github.com/cilium/cilium/pull/34174/files#diff-63d6e1fdb929a5058c43790c57f422863b269e45c2be10eb6564ca51fe1a0f4eR171
There was a problem hiding this comment.
The creation of the clustermesh-apiserver specific ClusterRole and ClusterRoleBinding should be disabled when running only KVStoreMesh, as not required. Similarly, the creation of the clustermesh-apiserver Service should be disabled, and the clustermesh-apiserver-metrics Service/ServiceMonitor should be adapted to disable all but KVStoreMesh-related entries. Finally, the creation of the clustermesh-remote-users ConfigMap and of the different TLS-related secrets should be most likely disabled as well.
There was a problem hiding this comment.
I believe this coment has already been resolved by the original submiter.
| {{- end }} | ||
| {{- end }} | ||
| containers: | ||
| {{- if not .Values.clustermesh.kvstoremeshExternalKVStore}} |
There was a problem hiding this comment.
There are a bunch of volumes configured for this deployment that should be skipped when running in KVStoreMesh-only mode, as not mounted in that case.
There was a problem hiding this comment.
I believe this comment has already beed resolved:
- https://github.com/cilium/cilium/pull/34174/files#diff-c85fb06a8dab414080c30e0cf3361b9baade1b357dd2d90e1a1797be2dbdcf31R358
- https://github.com/cilium/cilium/pull/34174/files#diff-c85fb06a8dab414080c30e0cf3361b9baade1b357dd2d90e1a1797be2dbdcf31R336
| data: | ||
| {{- $kvstoremesh := and .Values.clustermesh.useAPIServer .Values.clustermesh.apiserver.kvstoremesh.enabled }} | ||
| {{- $override := ternary (printf "https://clustermesh-apiserver.%s.svc:2379" .Release.Namespace) "" $kvstoremesh }} | ||
| {{- $override := ternary (join "\n- " $.Values.etcd.endpoints) (ternary (printf "https://clustermesh-apiserver.%s.svc:2379" .Release.Namespace) "" $kvstoremesh) .Values.clustermesh.kvstoremeshExternalKVStore }} |
There was a problem hiding this comment.
When external KVStore mode is enabled, the Cilium agents appear to default using the etcd endpoints configured via .Values.etcd.endpoints. However, this change does not seem to account for the handling of the TLS certificates, accounting for .Values.etcd.ssl. Similarly, neither the endpoints nor the certificates seem to be automatically configured in the kvstoremesh container, which would not work out of the box.
| {{- if and (ne .Values.identityAllocationMode "crd") (ne .Values.identityAllocationMode "doublewrite-readkvstore") (ne .Values.identityAllocationMode "doublewrite-readcrd") (not .Values.clustermesh.kvstoremeshExternalKVStore) }} | ||
| {{ fail (printf "The clustermesh-apiserver cannot be enabled in combination with .Values.identityAllocationMode=%s. To establish a Cluster Mesh, directly configure the parameters to access the remote kvstore through .Values.clustermesh.config" .Values.identityAllocationMode ) }} | ||
| {{- end }} | ||
| {{- if .Values.disableEndpointCRD }} |
There was a problem hiding this comment.
The disableEndpointCRD validation can be skipped if kvstoremeshExternalKVStore=true.
There was a problem hiding this comment.
I believe this comment has already been resolved: https://github.com/cilium/cilium/pull/34174/files#diff-63d6e1fdb929a5058c43790c57f422863b269e45c2be10eb6564ca51fe1a0f4eR155
giorio94
left a comment
There was a problem hiding this comment.
(Requesting changes as per above comments)
changed the flag name clustermesh.kvstoremeshExternalKVStore to clustermesh.apiserver.kvstoremesh.useExternalKVStore Signed-off-by: Jiri Vanura <[email protected]>
3430b21 to
6c796c7
Compare
changed the flag clustermesh.apiserver.kvstoremesh.useExternalKVStore to clustermesh.apiserver.kvstoremesh.kvstoreMode Signed-off-by: Jiri Vanura <[email protected]>
|
This pull request has been automatically marked as stale because it |
|
This pull request has not seen any activity since it was marked stale. |
|
I'd like to revive this PR. I checked all the PR comments and think that most of them were addressed by the original submitter. The only exception I see is the TLS certificate comment, I am going to address it myself. Should I post a new PR with my new commit(s)? |
Hey @balous, yes, I guess that's the easiest way given that this one got closed due to inactivity. |
Here it is: #36216 I believe all comments from this PR were addressed (especially the TLS config one) additional fix in |
When running clustermesh-apiserver, there should be an option to enable and disable containers that are not needed.
For example the etcd and apiserver containers are not necessary when running kvstoremesh with an external kvstore.
This commit adds the options to disable these containers.