NemoHermes

NemoHermes is a local NVIDIA capability registry and routing layer for AI services, with integration support for Hermes Agent and NemoClaw, especially on DGX Spark.

It discovers local AI services, normalizes them into one registry, persists that registry on disk, and routes work by capability instead of raw ports or model strings.

At a glance

  • discover local NVIDIA-backed AI services
  • cache a merged capability registry on disk
  • route by role such as chat, vision, stt, and tts
  • inspect what NemoHermes would choose before wiring it into a larger system

Audience and scope

NemoHermes is for:

  • Hermes Agent users running local NVIDIA-backed services
  • NemoClaw users on DGX Spark or mixed Mac + Spark setups
  • operators who want one shared view of text, vision, speech, voice, training, fine-tuning, and hosting surfaces

NemoHermes is optimized for NVIDIA and DGX Spark environments rather than general-purpose backend integration. The repo ships a NemoClaw-compatible manifest so that host can load it, but the project itself is the shared NVIDIA/Spark capability layer.

Current capabilities

  • standalone CLI via npx nemohermes
  • NemoClaw-compatible command registration via openclaw nemohermes ...
  • persisted registry cache at ~/.nemohermes/registry.json
  • local discovery probes for vLLM, SGLang, NIM, faster-whisper, and Piper
  • a static NVIDIA capability catalog covering text, vision, STT, TTS, V2V, training, fine-tuning, and hosting
  • role-based routing with modality and feature constraints such as streaming, tool calling, structured output, backend preference, and realtime preference
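To make the routing model above concrete, here is a minimal sketch of capability-constrained routing. The registry entry shape, field names, and sample entries are illustrative assumptions, not the real NemoHermes schema:

```typescript
// Hypothetical registry entry shape; the real NemoHermes schema may differ.
interface RegistryEntry {
  id: string;
  role: "chat" | "vision" | "stt" | "tts";
  inputs: string[];   // accepted modalities, e.g. "text", "image", "audio"
  outputs: string[];  // produced modalities
  backend: string;    // e.g. "vllm", "sglang", "nim"
  features: string[]; // e.g. "streaming", "tool-calling", "structured-output"
}

interface RouteQuery {
  role: RegistryEntry["role"];
  inputs?: string[];
  outputs?: string[];
  backend?: string;
  features?: string[];
}

// Pick the first entry that satisfies every constraint in the query.
function route(registry: RegistryEntry[], q: RouteQuery): RegistryEntry | undefined {
  return registry.find(
    (e) =>
      e.role === q.role &&
      (q.backend === undefined || e.backend === q.backend) &&
      (q.inputs ?? []).every((m) => e.inputs.includes(m)) &&
      (q.outputs ?? []).every((m) => e.outputs.includes(m)) &&
      (q.features ?? []).every((f) => e.features.includes(f))
  );
}

// Illustrative entries only.
const registry: RegistryEntry[] = [
  { id: "sglang-vl", role: "vision", inputs: ["text", "image"], outputs: ["text"], backend: "sglang", features: ["structured-output"] },
  { id: "vllm-chat", role: "chat", inputs: ["text"], outputs: ["text"], backend: "vllm", features: ["streaming", "tool-calling"] },
];

const picked = route(registry, {
  role: "vision",
  inputs: ["text", "image"],
  backend: "sglang",
  features: ["structured-output"],
});
console.log(picked?.id); // → sglang-vl
```

The point of routing by role plus constraints is that callers never hard-code a port or model string; they describe the work, and the registry decides which service fits.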

Alpha status

This is an early alpha aimed at two groups:

  • Hermes Agent users building local-first workflows on NVIDIA hardware
  • NemoClaw users on DGX Spark or Mac + Spark layouts

If you try it, the most useful feedback is concrete:

  • what services you had running
  • what NemoHermes discovered or missed
  • what route you expected
  • what route it actually picked
  • what friction remained in the DGX Spark or Mac + Spark workflow

Quick start

Install dependencies and verify the repo:

npm install
npm run verify

Then inspect the local environment:

npx nemohermes doctor
npx nemohermes discover
npx nemohermes models
npx nemohermes registry
npx nemohermes route --role vision --input text,image --output text --backend sglang --structured-output
npx nemohermes route --role stt --input audio --output text --realtime --json

If you are loading NemoHermes through a host that expects the compatibility manifest, use:

openclaw nemohermes discover
openclaw nemohermes models
openclaw nemohermes route --role tts --output audio --realtime

Registry cache

The registry cache exists so routing decisions stay useful even when you are not probing the machine on every call. By default NemoHermes:

  • writes the merged registry to ~/.nemohermes/registry.json
  • reuses the cache for 5 minutes
  • refreshes on discover
  • falls back to the cached registry if refresh fails and a cache exists
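The cache behavior above can be sketched as a load routine. This is a hedged approximation of the described defaults, not the actual NemoHermes source; the probe callback and error handling are assumptions:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const REGISTRY_PATH = path.join(os.homedir(), ".nemohermes", "registry.json");
const MAX_AGE_MS = 5 * 60 * 1000; // the 5-minute reuse window described above

// A cached registry written at `mtimeMs` is fresh while it is within maxAgeMs.
function isFresh(mtimeMs: number, nowMs: number, maxAgeMs = MAX_AGE_MS): boolean {
  return nowMs - mtimeMs <= maxAgeMs;
}

// Prefer a fresh cache; otherwise re-probe and rewrite the cache;
// fall back to a stale cache only if probing fails.
async function loadRegistry(probe: () => Promise<unknown>): Promise<unknown> {
  let cached: unknown | undefined;
  try {
    const stat = fs.statSync(REGISTRY_PATH);
    cached = JSON.parse(fs.readFileSync(REGISTRY_PATH, "utf8"));
    if (isFresh(stat.mtimeMs, Date.now())) return cached;
  } catch {
    // no usable cache on disk; fall through to probing
  }
  try {
    const fresh = await probe();
    fs.mkdirSync(path.dirname(REGISTRY_PATH), { recursive: true });
    fs.writeFileSync(REGISTRY_PATH, JSON.stringify(fresh, null, 2));
    return fresh;
  } catch (err) {
    if (cached !== undefined) return cached; // stale-but-present fallback
    throw err;
  }
}
```

Falling back to a stale cache keeps routing available when a probe target is briefly down, at the cost of possibly stale capability data.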

The standalone CLI also accepts environment overrides:

  • NEMOHERMES_PROFILE
  • NEMOHERMES_PREFER_LOCAL
  • NEMOHERMES_ALLOW_CLOUD_FALLBACK
  • NEMOHERMES_REGISTRY_PATH
  • NEMOHERMES_REGISTRY_MAX_AGE_MS
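A sketch of how these overrides could be read into a config object. Only the variable names come from the list above; the defaults, types, and boolean parsing are illustrative assumptions:

```typescript
type Env = Record<string, string | undefined>;

interface CliConfig {
  profile: string;
  preferLocal: boolean;
  allowCloudFallback: boolean;
  registryPath: string;
  registryMaxAgeMs: number;
}

// Accept common truthy spellings; keep the fallback when the variable is unset.
function parseBool(v: string | undefined, fallback: boolean): boolean {
  if (v === undefined) return fallback;
  return ["1", "true", "yes"].includes(v.toLowerCase());
}

function configFromEnv(env: Env): CliConfig {
  return {
    profile: env.NEMOHERMES_PROFILE ?? "default",
    preferLocal: parseBool(env.NEMOHERMES_PREFER_LOCAL, true),
    allowCloudFallback: parseBool(env.NEMOHERMES_ALLOW_CLOUD_FALLBACK, false),
    registryPath: env.NEMOHERMES_REGISTRY_PATH ?? "~/.nemohermes/registry.json",
    registryMaxAgeMs: Number(env.NEMOHERMES_REGISTRY_MAX_AGE_MS ?? 5 * 60 * 1000),
  };
}
```

Environment overrides are useful on split Mac + Spark setups, where the two machines may want different registry paths or cache lifetimes.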

DGX Spark and Mac + Spark ergonomics

DGX Spark is the primary profile. The secondary profile is the common setup where a Mac is the control surface and Spark is the inference host. NemoHermes treats that pairing as a first-class workflow: discovery, caching, routing, and operator handoff should all make the split-machine setup easier to manage.

Verification

npm ci
npm run verify

That runs tests, lint, formatting checks, type-checking, a clean build, and npm pack --dry-run.