MCP-compatible LLM G-Eval guardrails checker based on:
- OpenAI Cookbook "How to implement LLM guardrails";
- Promptfoo G-Eval implementation.
- G-Eval based evaluation;
- Customizable guardrails;
- Multiple providers and models based on the ai-sdk toolkit;
- MCP-compatible API;
- Both local and server mode.
- `guardrails({ server, provider, model, criteria, threshold = 0.5 })` - creates an instance for local usage, or one connected to a server if `server` is defined (see the usage sketch after this list). Options:
  - `server` - URL of the `guardrails` server;
  - `provider` - name of the provider;
  - `model` - name of the model;
  - `criteria` - guardrail criteria, with or without G-Eval `steps`; ignored if `server` is defined. If `steps` are not defined, they are generated on the fly with an additional LLM request, so in client-service usage it is better to define the steps explicitly (see examples). In server mode, criteria are loaded from the file at `process.env.CRITERIA_PATH`;
  - `threshold = 0.5` - threshold applied to the G-Eval score to decide whether the guardrail is valid: a lower score is valid, a higher score is not.
- `async listTools()` - returns the MCP definition of the available guardrails.
- `async callTool({ name, arguments })` - calls guardrail validation in the MCP manner. Returns JSON like:

  ```jsonc
  {
    "name": "harm",    // name of the called guardrail
    "valid": false,    // whether the guardrail is valid, comparing the score with the threshold
    "score": 0.8,      // G-Eval score
    "reason": "seems provided text is slightly harmful"  // LLM's reasoning for the G-Eval score
  }
  ```
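A minimal usage sketch of this API, assuming a locally created instance as in the examples below; the gating on `valid` and the logging are illustrative, not part of the package:

```js
import guardrails from 'guardrails';

const gd = guardrails({
  provider: 'openai',
  model: 'gpt-4o-mini',
  criteria: { harm: 'text is harmful' },
  threshold: 0.5,
});

// MCP definitions of the available guardrails
const tools = await gd.listTools();
console.log(tools);

// Validate a prompt against the "harm" guardrail
const result = await gd.callTool({ name: 'harm', arguments: { prompt: 'Who is John Galt?' } });

if (!result.valid) {
  // A score above the threshold means the guardrail was violated.
  console.warn(`Guardrail "${result.name}" failed (score ${result.score}): ${result.reason}`);
}
```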
Environment variables:

- `GUARDRAILS_PORT` - listening port of the `guardrails` server;
- `CRITERIA_PATH` - path to the criteria file; must be provided in server mode.
- openai,
- azure,
- anthropic,
- bedrock,
- google,
- mistral,
- deepseek,
- perplexity.
To add your own provider, `import { PROVIDERS } from 'guardrails/local';` and extend the dictionary with an ai-sdk compatible provider, as sketched below.
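A hypothetical sketch of registering a custom provider. It assumes `PROVIDERS` maps provider names to ai-sdk provider instances; the `myprovider` name, base URL, and environment variable are placeholders, and `@ai-sdk/openai-compatible` is just one way to build an ai-sdk compatible provider:

```js
// Hypothetical example: the 'myprovider' key, base URL and env variable are
// placeholders; this assumes PROVIDERS maps provider names to ai-sdk providers.
import { PROVIDERS } from 'guardrails/local';
import { createOpenAICompatible } from '@ai-sdk/openai-compatible';

PROVIDERS.myprovider = createOpenAICompatible({
  name: 'myprovider',
  baseURL: 'https://api.example.com/v1',      // placeholder endpoint
  apiKey: process.env.MYPROVIDER_API_KEY,  // placeholder credential
});

// The registered name can then be used as the provider option:
// guardrails({ provider: 'myprovider', model: 'my-model', criteria: { ... } });
```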
Local usage:

```js
import guardrails from 'guardrails';

const gd = guardrails({ provider: 'openai', model: 'gpt-4o-mini', criteria: { harm: 'text is harmful' } });
await gd.callTool({ name: 'harm', arguments: { prompt: 'Who is John Galt?' } });
```
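A sketch of local usage with predefined G-Eval steps, which avoids the extra LLM request that otherwise generates steps on the fly; it assumes the in-code criteria accept the same `description`/`steps` shape as the `criteria.json` file shown below:

```js
import guardrails from 'guardrails';

// Criteria with explicit G-Eval steps (same shape as criteria.json below),
// so no extra LLM request is needed to generate the steps.
const gd = guardrails({
  provider: 'openai',
  model: 'gpt-4o-mini',
  criteria: {
    harm: {
      description: 'Text is about deliberate injury or damage to someone or something.',
      steps: [
        'Identify content that depicts or encourages violence or self-harm.',
        'Check for derogatory or hateful language targeting individuals or groups.',
        'Determine the severity and potential impact of the harmful content.',
      ],
    },
  },
});

await gd.callTool({ name: 'harm', arguments: { prompt: 'Who is John Galt?' } });
```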
To run in server mode:

- Create a file with criteria, for example `criteria.json`:

  ```json
  {
    "harm": {
      "description": "Text is about deliberate injury or damage to someone or something.",
      "steps": [
        "Identify content that depicts or encourages violence or self-harm.",
        "Check for derogatory or hateful language targeting individuals or groups.",
        "Assess if the text contains misleading or false information that could cause real-world harm.",
        "Determine the severity and potential impact of the harmful content."
      ]
    }
  }
  ```

- Set the environment variable with the criteria path:
  ```sh
  export CRITERIA_PATH=./criteria.json
  ```

- Run the server:
  ```sh
  ./node_modules/.bin/guardrails
  ```
- Use the client:

  ```js
  import guardrails from 'guardrails';

  const gd = guardrails({ server: 'http://localhost:3000', provider: 'openai', model: 'gpt-4o-mini' });
  await gd.callTool({ name: 'harm', arguments: { prompt: 'Who is John Galt?' } });
  ```
Can be found here.