Multi-agent text handoffs discard KV-cache, embeddings, and attention state the previous agent already computed. AVP transfers that state directly -- zero tokens between agents, 2-3x faster pipelines, same or better accuracy, across models and families.
Agent Vector Protocol (AVP) is a binary protocol for LLM agent communication via latent representations. When two agents run the same model, AVP lets them exchange hidden states and KV-cache directly, skipping autoregressive text generation entirely. When agents run different models -- same family or different families -- AVP uses vocabulary-mediated projection to bridge between their latent spaces with zero training. When no compatible projection path exists, agents fall back to JSON.
AVP is transport-agnostic -- it defines the binary format, handshake, and codec, not the transport. The reference implementation uses HTTP/2, but AVP messages can be carried over A2A, MCP, gRPC, WebSockets, or any channel that supports binary payloads. AVP handles the latent communication layer, not discovery or orchestration.
- Handshake -- Agents exchange model identity (architecture, dimensions, weight hash, tokenizer hash)
- Resolve -- Same model: latent mode. Same family: cross-model projection. Otherwise: JSON fallback.
- Communicate -- Latent mode: binary tensor payloads. Cross-model: projected hidden states. JSON mode: text messages.
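The three-step flow above can be sketched as a resolution function. This is an illustrative sketch, not the SDK's actual API: the `ModelIdentity` fields mirror what the handshake exchanges, but the class and function names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ModelIdentity:
    # Fields mirror the handshake: architecture, dimensions, hashes.
    architecture: str
    hidden_dim: int
    weight_hash: str
    tokenizer_hash: str
    family: str  # e.g. "qwen", "llama" (illustrative field)

def resolve_mode(a: ModelIdentity, b: ModelIdentity) -> str:
    """Pick a communication mode after the handshake (illustrative logic)."""
    if a.weight_hash == b.weight_hash:
        return "latent"       # same model: raw hidden states + KV-cache
    if a.family == b.family:
        return "cross_model"  # same family: vocabulary-mediated projection
    return "json"             # no compatible projection path: text fallback
```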
In a standard agent-to-agent exchange, each message requires full autoregressive generation (token-by-token decoding). For same-model agents, this is redundant -- the receiving agent already operates in the same representation space. AVP eliminates this step by transmitting intermediate hidden states and KV-cache directly.
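To illustrate why this saves work: the state can be shipped as raw bytes plus shape/dtype metadata and rebuilt on the receiving side with no decoding pass. This numpy sketch shows the idea only; it is not the AVP codec:

```python
import numpy as np

def pack_tensor(t: np.ndarray) -> tuple[dict, bytes]:
    """Split a tensor into serializable metadata and raw bytes."""
    meta = {"dtype": str(t.dtype), "shape": list(t.shape)}
    return meta, t.tobytes()

def unpack_tensor(meta: dict, raw: bytes) -> np.ndarray:
    """Rebuild the tensor on the receiving agent; no token decoding needed."""
    return np.frombuffer(raw, dtype=meta["dtype"]).reshape(meta["shape"])

# A toy KV-cache entry: (batch, heads, seq_len, head_dim)
kv = np.random.randn(1, 8, 128, 64).astype(np.float16)
meta, raw = pack_tensor(kv)
restored = unpack_tensor(meta, raw)
assert np.array_equal(kv, restored)
```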
AVP uses a compact 12-byte header followed by protobuf metadata and raw tensor bytes:
```
Bytes 0-1:   Magic (0x4156 = "AV")
Byte  2:     Version (0x01)
Byte  3:     Flags (compressed, has_map, kv_cache)
Bytes 4-7:   Payload length (uint32 LE)
Bytes 8-11:  Metadata length (uint32 LE)
Bytes 12..N: Protobuf metadata
Bytes N..:   Raw tensor bytes
```
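The layout above can be packed with Python's `struct` module. This is a minimal sketch of the wire format as described; the flag bit assignments are assumptions, and the reference codec may differ:

```python
import struct

MAGIC = b"AV"           # bytes 0-1 (0x41 0x56)
VERSION = 0x01
FLAG_COMPRESSED = 0x01  # flag bit positions are assumptions
FLAG_HAS_MAP    = 0x02
FLAG_KV_CACHE   = 0x04

def pack_message(flags: int, metadata: bytes, payload: bytes) -> bytes:
    # <2sBBII = magic, version, flags, payload len (LE), metadata len (LE)
    header = struct.pack("<2sBBII", MAGIC, VERSION, flags,
                         len(payload), len(metadata))
    return header + metadata + payload

def parse_message(buf: bytes) -> tuple[int, bytes, bytes]:
    magic, version, flags, payload_len, meta_len = struct.unpack_from("<2sBBII", buf)
    assert magic == MAGIC and version == VERSION
    metadata = buf[12:12 + meta_len]
    payload = buf[12 + meta_len:12 + meta_len + payload_len]
    return flags, metadata, payload
```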
Version: 0.4
Current scope: same-model latent communication and cross-model communication via vocabulary-mediated projection (Rosetta Stone v2). Same-family models project through shared vocabulary; cross-family models project through overlapping BPE tokens (~85% overlap for Qwen/Llama). The core SDK depends only on numpy -- torch and engine libraries are optional.
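The vocabulary-mediated idea can be sketched in numpy: map a source hidden state to logits over the shared token set via the source unembedding, then least-squares those logits back into the target latent space through the target's unembedding rows for the same tokens. This is a conceptual toy with random stand-in matrices, not the Rosetta Stone implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_src, d_tgt, n_shared = 64, 48, 512  # toy dimensions

# Unembedding rows for the tokens both vocabularies share (random stand-ins).
W_src = rng.standard_normal((n_shared, d_src))
W_tgt = rng.standard_normal((n_shared, d_tgt))

def project(h_src: np.ndarray) -> np.ndarray:
    """Map a source hidden state into the target latent space, zero training."""
    logits = W_src @ h_src  # source state -> logits over shared tokens
    # Least-squares inversion: find the target state whose logits match best.
    h_tgt, *_ = np.linalg.lstsq(W_tgt, logits, rcond=None)
    return h_tgt

h = rng.standard_normal(d_src)
out = project(h)
assert out.shape == (d_tgt,)
```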
- Python SDK -- `pip install avp` (v0.4.2). Easy API (`think()`/`generate()`), connector API (`HuggingFaceConnector`, `LlamaCppConnector`, `OllamaConnector`, `VLLMConnector`), cross-model via `source=` + `cross_model=True`, `ContextStore`, per-transfer quality gate, observability metrics, codec, handshake, session management, realignment, KV-cache serialization, Rosetta Stone cross-model projection, framework integrations (LangChain, CrewAI, AutoGen), HTTP/2 transport, 7 benchmark suites (541 tests). Core depends only on numpy; engine backends are optional extras (`[hf]`, `[llamacpp]`, `[ollama]`, `[vllm]`).
AVP is complementary to existing agent protocols and inference engines:
- A2A -- AVP provides a transport binding for A2A via `multipart/related` with binary payloads
- MCP -- MCP handles tools and context; AVP handles tensor transfer between agents
- HuggingFace Transformers -- Full hidden state and KV-cache access for development and benchmarking (`pip install avp[hf]`)
- vLLM -- Text generation via `VLLMConnector`; latent transfer via `KVConnectorBase_V1` plugin and model plugins for 4 architectures (`pip install avp[vllm]`)
- llama.cpp -- Full latent pipeline on GGUF-quantized models via embeddings API (`pip install avp[llamacpp]`)
- Ollama -- Auto-resolves Ollama model names to GGUF, auto-unloads to free VRAM, inherits full latent pipeline (`pip install avp[ollama]`)
- LangChain / CrewAI / AutoGen -- Framework integrations with latent think/generate roles
Built on LatentMAS: Latent Collaboration in Multi-Agent Systems -- same-model latent communication via hidden state transfer and KV-cache sharing, with realignment for untied-weight models. Extended with cross-model vocabulary-mediated projection (novel -- zero training, works across model families).
See CONTRIBUTING.md
Apache 2.0