feat(ci): migrate Linux CI jobs to self-hosted runners on hanzo-k8s#31
Merged
Route all Linux CI jobs to self-hosted runners (hanzo-k8s label) on our DOKS cluster instead of GitHub-hosted ubuntu-latest. This eliminates queue wait times and runs jobs on dedicated infrastructure.

Migrated workflows: ci.yml (8 jobs), workflow-sanity.yml, install-smoke.yml, formal-conformance.yml, labeler.yml, auto-response.yml, stale.yml.

Kept on GitHub runners: Android (needs JDK/Gradle/SDK), Windows (needs Windows OS), macOS (needs Xcode/Swift), Docker release (needs Docker), npm release (needs registry).

Adjusted test parallelism: BOT_TEST_WORKERS=1 for 4GiB runner pods.

Runners: 3x github-runner + 2x github-runner-build pods in actions-runner-system namespace, org-scoped for hanzoai.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
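As a rough illustration of what the migration changes in each workflow file, the sketch below rewrites the `runs-on` value in a workflow's text. The helper name `retarget_runs_on` and the sample workflow are hypothetical; only the `ubuntu-latest` and `hanzo-k8s` labels come from the description above, and the actual edits in this PR may well have been made by hand.

```python
import re


def retarget_runs_on(workflow_text: str, old: str = "ubuntu-latest", new: str = "hanzo-k8s") -> str:
    """Replace `runs-on: <old>` with `runs-on: <new>`, leaving other runners untouched."""
    # Match only whole-line `runs-on:` entries whose value is exactly `old`.
    pattern = re.compile(
        rf"^([ \t]*runs-on:[ \t]*){re.escape(old)}[ \t]*$",
        re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + new, workflow_text)


# Hypothetical workflow fragment: one Linux job and one macOS job.
sample = """jobs:
  test:
    runs-on: ubuntu-latest
  macos:
    runs-on: macos-latest
"""

migrated = retarget_runs_on(sample)
print(migrated)
```

Only the Linux job is retargeted; the macOS job keeps its GitHub-hosted runner, mirroring the split described above.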
The myoung34/github-runner:ubuntu-jammy image only has python3, not the python symlink. Update the no-tabs check accordingly. Co-Authored-By: Claude Opus 4.6 <[email protected]>
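The no-tabs check itself is not shown in this PR. As a hedged sketch, a check of that kind could look like the following when invoked with `python3`; the function name `find_tab_lines` and the sample input are hypothetical.

```python
def find_tab_lines(text: str) -> list[int]:
    """Return the 1-based line numbers that contain a tab character."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if "\t" in line
    ]


# Hypothetical workflow text with a tab on the third line.
sample = "jobs:\n  build:\n\truns-on: hanzo-k8s\n"
offending = find_tab_lines(sample)
print(offending)  # prints [3]
```

Saved as a script, this would be run as `python3 <script>` on the runner image, since a bare `python` command is not available there.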
- Scale self-hosted runners from 4GiB to 8GiB (node test OOM'd)
- Restore BOT_TEST_WORKERS=2 (fits in 8GiB)
- Revert labeler.yml to ubuntu-latest (GH_APP_PRIVATE_KEY org secret not available to self-hosted runners)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
On self-hosted runners the workspace is /tmp/runner/work/... which falls under os.tmpdir() (/tmp) — an allowed media root. The test was using process.cwd()/package.json which passed the root check, letting the Discord token validation fire first with a different error. Use /usr/share/ instead, which is never under an allowed root. Co-Authored-By: Claude Opus 4.6 <[email protected]>
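The root check described above can be sketched in a few lines. This is not the project's implementation (which is JavaScript and not shown here); `is_under_any_root` is a hypothetical helper, and the workspace path below is an illustrative stand-in for the elided `/tmp/runner/work/...` path.

```python
from pathlib import PurePosixPath


def is_under_any_root(path: str, roots: list[str]) -> bool:
    """True if `path` equals, or is nested under, one of `roots`."""
    candidate = PurePosixPath(path)
    return any(
        candidate == PurePosixPath(root) or PurePosixPath(root) in candidate.parents
        for root in roots
    )


allowed_roots = ["/tmp"]  # stands in for os.tmpdir() on the self-hosted runner

# Hypothetical workspace file: it sits under /tmp, so it passes the root check.
workspace_file = "/tmp/runner/work/example/package.json"
print(is_under_any_root(workspace_file, allowed_roots))

# /usr/share/ is outside every allowed root, so the rejection path is exercised.
print(is_under_any_root("/usr/share/", allowed_roots))
```

Comparing path components rather than string prefixes avoids treating a sibling such as `/tmpfoo` as being inside `/tmp`.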
84502ab to 791367c
…runners

2 workers × 3072 MB = 6 GB V8 heap on 8 GB runners leaves no room for OS + Node RSS overhead, causing OOM kills during test cleanup. Reducing to 2048 MB per worker (4 GB total) leaves enough headroom.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
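The headroom arithmetic in this commit can be written out as a small calculation. This is a sketch under one assumption: the 8 GB runner is treated as 8192 MB. The function name `heap_headroom_mb` is hypothetical.

```python
def heap_headroom_mb(workers: int, heap_mb: int, pod_mb: int) -> int:
    """Memory left for the OS and non-heap Node overhead once every worker's heap is full."""
    return pod_mb - workers * heap_mb


POD_MB = 8 * 1024  # assumption: the 8 GB runner is counted as 8192 MB
WORKERS = 2

for heap_mb in (3072, 2048):
    total_heap = WORKERS * heap_mb
    headroom = heap_headroom_mb(WORKERS, heap_mb, POD_MB)
    print(f"heap {heap_mb} MB x {WORKERS} workers = {total_heap} MB, headroom {headroom} MB")
```

With 3072 MB per worker the nominal headroom is 2048 MB, which the commit message reports was not enough in practice; dropping to 2048 MB per worker doubles it to 4096 MB.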
d0c94c6 to 3ae601e
test-parallel.mjs runs 3 vitest groups in parallel, each spawning BOT_TEST_WORKERS processes. With 2 workers per group, total V8 heap was 3×2×3072MB = 18GB, causing OOMKilled on 16GB runner nodes. Reducing to 1 worker per group gives 3×3072MB = 9GB — fits in 12GB. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replace manual myoung34/github-runner deployments with GitHub's official ARC controller. Runner pods now scale on-demand (0→10) when jobs queue, each getting its own 14Gi pod on the runner-pool node pool. Workers stay at 2 per vitest group with 2048MB heap (3×2×2048 = 12GB, fits in 14Gi). Co-Authored-By: Claude Opus 4.6 <[email protected]>
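The sizing arithmetic in the last two commits follows one formula: groups × workers per group × heap per worker. The sketch below recomputes the three configurations mentioned, assuming 1 GB = 1024 MB; `total_heap_mb` is a hypothetical helper, and the result is a nominal heap cap rather than a measurement of real memory use.

```python
def total_heap_mb(groups: int, workers_per_group: int, heap_mb: int) -> int:
    """Nominal combined V8 heap cap across all parallel vitest workers."""
    return groups * workers_per_group * heap_mb


GROUPS = 3  # test-parallel.mjs runs 3 vitest groups in parallel

# (label, workers per group, heap per worker in MB, memory limit in MB)
configs = [
    ("2 workers, 3072 MB heap, 16 GB node", 2, 3072, 16 * 1024),
    ("1 worker, 3072 MB heap, 12 GB pod", 1, 3072, 12 * 1024),
    ("2 workers, 2048 MB heap, 14 Gi pod", 2, 2048, 14 * 1024),
]

for label, workers, heap_mb, limit_mb in configs:
    total = total_heap_mb(GROUPS, workers, heap_mb)
    verdict = "fits" if total <= limit_mb else "exceeds limit"
    print(f"{label}: {total} MB of {limit_mb} MB -> {verdict}")
```

The final ARC configuration totals 12288 MB against a 14336 MB pod, leaving 2048 MB for everything outside the V8 heaps.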
zooqueen added a commit that referenced this pull request Mar 6, 2026

* feat(ci): migrate Linux CI jobs to self-hosted runners on hanzo-k8s (#31)

  Route all Linux CI jobs to self-hosted runners (hanzo-k8s label) on our DOKS cluster instead of GitHub-hosted ubuntu-latest. This eliminates queue wait times and runs jobs on dedicated infrastructure.

  Migrated workflows: ci.yml (8 jobs), workflow-sanity.yml, install-smoke.yml, formal-conformance.yml, labeler.yml, auto-response.yml, stale.yml.

  Kept on GitHub runners: Android (needs JDK/Gradle/SDK), Windows (needs Windows OS), macOS (needs Xcode/Swift), Docker release (needs Docker), npm release (needs registry).

  Adjusted test parallelism: BOT_TEST_WORKERS=1 for 4GiB runner pods.

  Runners: 3x github-runner + 2x github-runner-build pods in actions-runner-system namespace, org-scoped for hanzoai.

* fix(ci): use python3 instead of python for self-hosted runner compat

  The myoung34/github-runner:ubuntu-jammy image only has python3, not the python symlink. Update the no-tabs check accordingly.

* fix(ci): bump runner memory to 8GiB and revert labeler to ubuntu-latest

  - Scale self-hosted runners from 4GiB to 8GiB (node test OOM'd)
  - Restore BOT_TEST_WORKERS=2 (fits in 8GiB)
  - Revert labeler.yml to ubuntu-latest (GH_APP_PRIVATE_KEY org secret not available to self-hosted runners)

* fix(test): use path outside tmpdir for media root rejection test

  On self-hosted runners the workspace is /tmp/runner/work/... which falls under os.tmpdir() (/tmp), an allowed media root. The test was using process.cwd()/package.json which passed the root check, letting the Discord token validation fire first with a different error. Use /usr/share/ instead, which is never under an allowed root.

* fix(ci): reduce Node test heap to 2048MB to avoid OOM on self-hosted runners

  2 workers × 3072 MB = 6 GB V8 heap on 8 GB runners leaves no room for OS + Node RSS overhead, causing OOM kills during test cleanup. Reducing to 2048 MB per worker (4 GB total) leaves enough headroom.

* fix(ci): reduce test workers to 1 per group to fit 12GB pod limit

  test-parallel.mjs runs 3 vitest groups in parallel, each spawning BOT_TEST_WORKERS processes. With 2 workers per group, total V8 heap was 3×2×3072MB = 18GB, causing OOMKilled on 16GB runner nodes. Reducing to 1 worker per group gives 3×3072MB = 9GB, which fits in 12GB.

* feat(ci): switch to ARC (Actions Runner Controller) for auto-scaling

  Replace manual myoung34/github-runner deployments with GitHub's official ARC controller. Runner pods now scale on-demand (0→10) when jobs queue, each getting its own 14Gi pod on the runner-pool node pool. Workers stay at 2 per vitest group with 2048MB heap (3×2×2048 = 12GB, fits in 14Gi).

---------
Summary
- Route all Linux CI jobs to self-hosted runners (hanzo-k8s label) on our DOKS cluster
- 3 github-runner pods + 2 github-runner-build pods in actions-runner-system namespace
- BOT_TEST_WORKERS=1 for 4GiB runner pod memory limits

What's migrated
What stays on GitHub runners
Test plan
🤖 Generated with Claude Code