ModelHawk

Intro

ModelHawk is a (proposed) standard way of connecting LLM-based tools (like Copilot, Claude, OpenCode) to tools that, for security purposes, monitor or control these LLM-based tools.

Problem to solve: We do not yet know how to gain assurance that LLM-based tools will not misbehave (due to, e.g., a prompt-injection attack). Of course, there are many ideas for this, and more to come in the future. However, if we wanted to test a particular method of securing an LLM-based tool, we don't have a standard way of hooking it up to such tools.

Solution: ModelHawk proposes a small protocol that LLM-based tools, and tools for monitoring them, can follow in order to connect to each other.

Architecture

Consider OpenCode running on a laptop, shown in the following diagram. It maintains a conversation between the user and the AI. It has code for talkikng to an AI service. The AI can call tools (via the conversation) which might read or write to/from disk or visit websites.

graph LR

AIClient --- AIService
Tools --- OS
Tools --- Website

AIClient ~~~ Tools
AIService ~~~ Website

subgraph OpenCode
    Sess --- AIClient
    Sess --- Tools
    Sess["Conversation (prompt + subsequent msgs)"]
    AIClient["AI Client"]
    Tools
end

OS

subgraph Internet
    AIService["AI Service"]
    Website["Some Website"]
end

Now consider how an attack might happen. An attacker could have a prompt-injection attack on a website, waiting for an AI tool to connect to it. When that happens, the poisoned prompt enters the conversation and thus makes the conversation poisoned, ready to do evil with the tools it has access to.

graph LR

style Attacker fill:red,stroke-dasharray:5 5
style Sess fill:red,stroke-dasharray:5 5

AIClient --- AIService
Tools --- OS
Tools --- Attacker

AIClient ~~~ Tools
AIService ~~~ Attacker

subgraph OpenCode
    Sess --- AIClient
    Sess --- Tools
    Sess["Conversation (prompt + subsequent msgs)"]
    AIClient["AI Client"]
    Tools
end

OS

subgraph Internet
    AIService["AI Service"]
    Attacker
end

ModelHawk lets you connect some kind of monitoring tool to the trusted parts of the AI harness (in this case, OpenCode), letting the monitoring tool snoop on the conversation's activities and even approve/deny tool usage.

graph LR

style Attacker fill:red,stroke-dasharray:5 5
style Sess fill:red,stroke-dasharray:5 5

AIClient --- AIService
Tools --- OS
Tools --- Attacker

AIClient ~~~ Tools
AIService ~~~ Attacker

subgraph OpenCode
    Sess --- AIClient
    Sess --- Tools
    Sess["Conversation (prompt + subsequent msgs)"]

    AIClient["AI Client"]
    MHClient["ModelHawk Client"]
end

OS

subgraph Internet
    AIService["AI Service"]
    Attacker
    Tools
end

MHClient ---- MHServer
Tools -.->|"will call tool, did call tool"| MHClient
AIClient -.->|"msgs"| MHClient

subgraph Mon["Monitoring Tool"]
    MHServer["ModelHawk Server"]
end

The Protocol

It's quite simple.

Messages are serialized with Protocol Buffers. They are defined in modelhawk/v1 (docs).

There are two roles: - AI app: This is the thing that uses AI and that we want to monitor for bad behavior. It implements a ModelHawk client. - Security app: This is the thing that monitors the AI app for bad behavior. It implements a ModelHawk server.

The security app provides three services: - NotifyService - PermissionService - InfoService

The AI app can use NotifyService to tell the security app about events --- e.g., the AI model used a tool. The AI app uses PermissionService to ask the security app for permission for the AI model to do something.

To avoid sending the same metadata multiple times, the AI app must use InfoService to tell the security app about tools before those tools are mentioned in other service calls.

That's it!

NOTE: At this time, ModelHawk focuses on tool usage. We will probably add more in the future.

Example

Suppose I run a team and I want to let my people use their favorite AI helpers (e.g. Claude CoWork) for their work. I want to prevent these AI helpers from exfiltrating confidential calendar data (perhaps due to a prompt-injection attack).

So, I make (local network) service that implements the server part of ModelHawk, and I require my people to configure their AI helpers to connect to this security service as ModelHawk clients and notify this service about all HTTP tool uses (that is, each time a helper goes to a webpage). My service can then do various things to check for exfiltration --- keyword search, asking another LLM, etc.

Now suppose that our security requirements get tighter, and I want all these AI helpers to ask permission before doing an HTTP request. This can be done by simply changing how they connect to my monitoring service as ModelHawk clients, and of course modifying my service to return responses as appropriate.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.github/workflows		.github/workflows
.opencode		.opencode
bin		bin
gen		gen
modelhawk/v1		modelhawk/v1
reference-impls		reference-impls
script		script
.gitignore		.gitignore
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
versions.mk		versions.mk

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ModelHawk

Intro

Architecture

The Protocol

Example

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

dshearer/modelhawk

Folders and files

Latest commit

History

Repository files navigation

ModelHawk

Intro

Architecture

The Protocol

Example

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages