chore: add cases for LWP env#287
Conversation
There was a problem hiding this comment.
Code Review
This pull request enhances the E2E testing framework by adding a Clientset to the Framework struct and implementing a GetPodLogs helper method. It also introduces new test cases for the leaderWorkerPattern, including environment variable injection verification and a distributed CPU torchrun job. Feedback was provided regarding the inefficiency and potential flakiness of installing heavy dependencies like PyTorch via pip during test execution, with a recommendation to use a pre-built container image instead.
f6c7ac9 to
9e3dea0
Compare
There was a problem hiding this comment.
Pull request overview
Adds new end-to-end coverage for LeaderWorkerPattern (LWP) environment variable injection and a functional multi-node torchrun job, plus a framework helper to fetch Pod logs for these assertions.
Changes:
- Add e2e test validating
RBG_LWP_*env vars are injected for leader/worker Pods in default RoleInstanceSet mode. - Add e2e test that runs a simple CPU
torchrundistributed all-reduce using injected LWP env vars. - Extend e2e framework with a
ClientsetandGetPodLogshelper for reading Pod logs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/e2e/testcase/v1alpha2/lws.go | Adds two new LWP-focused e2e test cases (env var validation + torchrun functional job). |
| test/e2e/framework/framework.go | Adds client-go Clientset wiring and a helper to stream Pod logs for assertions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: 玖宇 <[email protected]>
9e3dea0 to
a71e6e5
Compare
Ⅰ. Motivation
Ⅱ. Modifications
Ⅲ. Does this pull request fix one issue?
fixes #XXXX
Ⅳ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
Ⅴ. Describe how to verify it
VI. Special notes for reviews
Checklist
make fmt.