Enable Spot VM testing for GKE with A3 mega GPUs #4951
simrankaurb merged 1 commit into GoogleCloudPlatform:develop
Summary of Changes
Hello @simrankaurb, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the GKE A3 mega GPU blueprint by integrating support for Spot Virtual Machines, which can lead to substantial cost reductions for users. To ensure the reliability and proper functioning of this new feature, a dedicated daily automated test has been established. This test will continuously validate the deployment and operational aspects of GKE clusters utilizing A3 mega GPUs on Spot VMs, thereby maintaining the integrity of the offering.
Code Review
This pull request introduces the capability to use Spot VMs for GKE clusters with A3 mega GPUs and adds a new daily test for this configuration. The changes look good overall, but I've identified a critical issue in the find_available_zone.sh script that will break existing and new tests. Additionally, I have some suggestions to improve the maintainability and clarity of the new test configuration files.
Merged commit aec7114 into GoogleCloudPlatform:develop
This PR introduces the capability to use Spot VMs for GKE clusters with A3 mega GPUs and adds a new test to validate this configuration.
The main changes are:
A new Cloud Build configuration (tools/cloud-build/daily-tests/builds/gke-a3-megagpu-onspot.yaml) is added to run a daily test for the Spot VM configuration.
A new test configuration file (tools/cloud-build/daily-tests/tests/gke-a3-megagpu-onspot.yml) enables the Spot VM feature by setting enable_spot_vm: true. It also defines the necessary variables and post-deployment tests to validate the GKE cluster with A3 mega GPUs on Spot VMs.
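For reviewers unfamiliar with the daily-test layout, the fragment below is a rough sketch of what such a test configuration might look like. Only the enable_spot_vm: true setting is taken from this description; every other key, value, and path is a hypothetical placeholder and may not match the merged file.

```yaml
# Hypothetical sketch, NOT the contents of the merged file.
# Only `enable_spot_vm: true` is stated in the PR description;
# all other keys and values below are illustrative assumptions.
test_name: gke-a3-megagpu-onspot      # assumed test name
cli_deployment_vars:                  # assumed key for blueprint variable overrides
  enable_spot_vm: true                # the setting named in the PR description
post_deploy_tests:                    # assumed key; the PR mentions post-deployment tests
- path/to/hypothetical-validation.yml # placeholder path, not a real file
```

Where exactly enable_spot_vm sits in the file is an assumption here; the diff itself is the authoritative source.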
Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.