Introduce managed SolrCloud update strategy #133
Conversation
LGTM +1 ... ran through various scenarios with a 5-node cluster and multiple collections. Also tried with no collection and a single-node cluster just to verify there were no weird edge cases.
Also tested some edge cases, like using an image that doesn't exist and then reverting back. Works even better than the StatefulSet logic! 😄
Need any help testing this out? Looks awesome. We would probably start using
It'd be great if y'all could test it out. I think I want to add some information in the status about whether a node is up to date with the spec (right now it only looks at the image tag, not other parts of the pod spec). Other than that, I think this is close to ready to merge. Obviously, every use case we test makes me even more confident! 😄
@sepulworld The status changes are in. Let me know when you get a chance to test this out!
Signed-off-by: Houston Putman <[email protected]>
thelabdude left a comment:
Great work as usual Houston! I kicked the tires on this PR in GKE over the past few days and it looks great to me.
- Issue with passing an array to a method and assigning a value to the array
- Deep dive into Ingress rules instead of an overall DeepEquals
Also expanding the documentation for the change to the ordering of live nodes, as well as the change for non-started, out-of-date pods (which are updated automatically).
Solr 8.6 introduced a new security feature, where the backup & restore features could not be used with arbitrary paths. Only the SOLR_HOME, the SOLR_DATA_HOME, or explicitly defined paths could be provided. Instead of using the "allowedPaths" input to specify additional paths, the Solr Operator now puts backup & restore data within SOLR_HOME (the data volume). This allows us to support both 8.5- and 8.6+. Additionally, the data PVCs will not retain the backup information, because volumes do not persist data that is mounted to other volumes within their directories.
Issue number of the reported bug or feature request: Resolves #66
Describe your changes
There has been much discussion around the way in which rolling upgrades should occur for Solr within Kubernetes.
Currently the operator lets the statefulset controller manage upgrades, in a rolling one-by-one strategy.
In the ticket linked above, there was discussion around adding a parameter to Solr's healthcheck handler to make sure that all Solr cores in a pod are active before issuing a restart command. This handler would make the default StatefulSet rolling upgrade strategy safer for SolrClouds, but it doesn't actually solve the issue of optimally ordering pods for restarts, or of batching all pods that can be restarted at the same time.
This PR gives the Solr Operator control over which pods are deleted and when. The operator sets the StatefulSet upgrade policy to OnDelete, and will choose a set of pods to be deleted at any given time when there are pods in a SolrCloud's StatefulSet that are not up to date. Currently the ordering for upgrading pods is (from first to upgrade -> last to upgrade): Already offline -> No replicas -> # of leaders -> # of replicas -> Not live -> Live -> Overseer node. There should likely be good documentation around this feature so that users are well aware of the expected behavior and how this mechanism works.
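The ordering above can be sketched as a lexicographic sort over a per-pod priority key. This is an illustrative Go sketch only, not the operator's actual code: `PodState`, its fields, and the helper names are hypothetical simplifications of the cluster-state information the operator would gather from ZooKeeper and the Solr APIs.

```go
package main

import (
	"fmt"
	"sort"
)

// PodState is a hypothetical, simplified view of the per-pod information
// needed to order pods for upgrade; the real operator's types differ.
type PodState struct {
	Name       string
	Offline    bool // pod already down, safe to update first
	Replicas   int  // total replicas hosted on the pod
	Leaders    int  // replicas on the pod that are shard leaders
	Live       bool // registered as a live node
	IsOverseer bool // the Overseer node is updated last
}

// upgradePriority builds a sortable key mirroring the ordering in the PR
// description: already offline -> no replicas -> fewest leaders ->
// fewest replicas -> not live -> live -> Overseer node.
func upgradePriority(p PodState) []int {
	b := func(v bool) int {
		if v {
			return 1
		}
		return 0
	}
	return []int{
		b(!p.Offline),     // offline pods sort first
		b(p.Replicas > 0), // then pods hosting no replicas
		p.Leaders,         // then fewest leaders
		p.Replicas,        // then fewest replicas
		b(p.Live),         // not-live before live
		b(p.IsOverseer),   // Overseer last
	}
}

// sortPodsForUpgrade orders pods from first-to-upgrade to last-to-upgrade.
func sortPodsForUpgrade(pods []PodState) {
	sort.SliceStable(pods, func(i, j int) bool {
		a, c := upgradePriority(pods[i]), upgradePriority(pods[j])
		for k := range a {
			if a[k] != c[k] {
				return a[k] < c[k]
			}
		}
		return false
	})
}

func main() {
	pods := []PodState{
		{Name: "solr-2", Replicas: 3, Leaders: 1, Live: true},
		{Name: "solr-0", Replicas: 4, Leaders: 2, Live: true, IsOverseer: true},
		{Name: "solr-1", Offline: true},
		{Name: "solr-3", Replicas: 2, Leaders: 0, Live: true},
	}
	sortPodsForUpgrade(pods)
	for _, p := range pods {
		fmt.Println(p.Name)
	}
	// prints: solr-1, solr-3, solr-2, solr-0 (Overseer last)
}
```

Using a stable sort means pods that tie on every criterion keep their original (ordinal) order, which keeps the selection deterministic across reconcile loops.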
The following options have been added to the SolrCloud CRD:
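The option list itself is not included in this scrape of the PR description. Purely as an illustration of how such knobs are typically surfaced on a CRD, a managed update strategy might be configured along these lines; every field name below is an assumption, so check the SolrCloud CRD documentation for the real schema:

```yaml
# Hypothetical sketch -- field names are assumptions, not the actual CRD schema.
apiVersion: solr.bloomberg.com/v1beta1
kind: SolrCloud
metadata:
  name: example
spec:
  replicas: 5
  updateStrategy:
    method: Managed        # let the operator, not the StatefulSet, delete pods
    managed:
      maxPodsUnavailable: 2  # cap on pods taken down in one batch
```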
Testing performed
There are unit tests included that try to ensure that the pod selection and ordering logic is correct, as well as the parsing of the CRD options.
I have tested this manually, but haven't exhausted all possible cases.
Before this is merged, this likely needs an integration test suite that ensures additional changes will not break this functionality.