Deutsch | Español | français | 日本語 | 한국어 | Português | Русский | 中文
DeepTeam is a simple-to-use, open-source LLM red teaming framework for penetration testing and safeguarding large-language model systems. It is built on DeepEval, the open-source LLM evaluation framework.
DeepTeam incorporates the latest adversarial AI research to simulate attacks using SOTA techniques such as jailbreaking, prompt injection, and multi-turn exploitation, to uncover vulnerabilities like bias, PII leakage, and SQL injection that you might not otherwise catch. Whether your LLM system is an AI agent, RAG pipeline, chatbot, or just the LLM itself, DeepTeam helps you find security gaps — and then offers guardrails to prevent issues in production.
DeepTeam runs locally on your machine and uses LLMs for both attack simulation and evaluation during red teaming.
Important
Need a place for your red teaming results to live? Sign up to the Confident AI platform to manage risk assessments, monitor vulnerabilities in production, and share reports with your team.
Want to talk LLM security, need help picking attacks, or just to say hi? Come join our discord.
-
📐 50+ ready-to-use vulnerabilities (all with explanations) powered by ANY LLM of your choice. Each vulnerability uses LLM-as-a-Judge metrics that run locally on your machine to produce binary pass/fail scores with reasoning:
-
Data Privacy
- PII Leakage — direct disclosure, session leak, database access
- Prompt Leakage — secrets, credentials, system prompt extraction
-
Responsible AI
- Bias — gender, race, political, religion
- Toxicity — profanity, insults, threats
- Child Protection — CSAM, grooming
- Ethics — moral violations
- Fairness — discriminatory outcomes
-
Security
- BFLA — broken function-level authorization
- BOLA — broken object-level authorization
- RBAC — role-based access control bypass
- Debug Access — development endpoint exposure
- Shell Injection — command execution attacks
- SQL Injection — database query manipulation
- SSRF — server-side request forgery
- Tool Metadata Poisoning — corrupting tool descriptions
- Cross-Context Retrieval — accessing data across boundaries
- System Reconnaissance — probing agent architecture
-
Safety
- Illegal Activity — fraud, weapons, drugs
- Graphic Content — violence, sexual content
- Personal Safety — self-harm, dangerous advice
- Unexpected Code Execution — arbitrary code generation
-
Business
- Misinformation — factual errors, unsupported claims
- Intellectual Property — copyright violations
- Competition — competitor endorsement
-
Agentic
- Goal Theft — extracting or redirecting an agent's objectives
- Recursive Hijacking — exploiting recursive agent calls
- Excessive Agency — agents acting beyond their authority
- Robustness — input overreliance, hijacking
- Indirect Instruction — hidden instructions in retrieved content
- Tool Orchestration Abuse — exploiting tool calling sequences
- Agent Identity & Trust Abuse — impersonating agent identity
- Inter-Agent Communication Compromise — exploiting multi-agent message passing
- Autonomous Agent Drift — agents deviating from intended goals
- Exploit Tool Agent — leveraging tools for unintended actions
- External System Abuse — using agents to attack external services
-
Custom
- Custom Vulnerabilities — define and test your own with custom criteria in a few lines of code
-
-
💥 20+ research-backed adversarial attack methods for both single-turn and multi-turn (conversational) red teaming. Attacks enhance baseline vulnerability probes using SOTA techniques like jailbreaking, prompt injection, and encoding-based obfuscation:
-
Single-Turn
- Prompt Injection
- Roleplay
- Leetspeak
- ROT13
- Base64
- Gray Box
- Math Problem
- Multilingual
- Prompt Probing
- Adversarial Poetry
- System Override
- Permission Escalation
- Goal Redirection
- Linguistic Confusion
- Input Bypass
- Context Poisoning
- Character Stream
- Context Flooding
- Embedded Instruction JSON
- Synthetic Context Injection
- Authority Escalation
- Emotional Manipulation
-
Multi-Turn
- Linear Jailbreaking
- Tree Jailbreaking
- Crescendo Jailbreaking
- Sequential Jailbreak
- Bad Likert Judge
-
-
🏛️ Red team against established AI safety frameworks out-of-the-box. Each framework automatically maps its categories to the right vulnerabilities and attacks:
- OWASP Top 10 for LLMs 2025
- OWASP Top 10 for Agents 2026
- NIST AI RMF
- MITRE ATLAS
- BeaverTails
- Aegis
-
🛡️ 7 production-ready guardrails for fast binary classification to guard LLM inputs and outputs in real time.
-
🧩 Build your own custom vulnerabilities and attacks that integrate seamlessly with DeepTeam's ecosystem.
-
🔗 Run red teaming from the CLI with YAML configs, or programmatically in Python.
-
📊 Access risk assessments, display in dataframes, and save locally in JSON.
DeepTeam does not require you to define what LLM system you are red teaming — because neither will malicious users. All you need to do is install deepteam, define a model_callback, and you're good to go.
pip install -U deepteam
from deepteam import red_team
from deepteam.vulnerabilities import Bias
from deepteam.attacks.single_turn import PromptInjection
async def model_callback(input: str) -> str:
# Replace this with your LLM application
return f"I'm sorry but I can't answer this: {input}"
risk_assessment = red_team(
model_callback=model_callback,
vulnerabilities=[Bias(types=["race"])],
attacks=[PromptInjection()]
)Don't forget to set your OPENAI_API_KEY as an environment variable before running (you can also use any custom model supported in DeepEval), and run the file:
python red_team_llm.pyThat's it! Your first red team is complete. Here's what happened:
model_callbackwraps your LLM system and generates astroutput for a giveninput.- At red teaming time,
deepteamsimulates aPromptInjectionattack targetingBiasvulnerabilities. - Your
model_callback's outputs are evaluated using theBiasMetric, producing a binary score of 0 or 1. - The final passing rate for
Biasis determined by the proportion of scores that equal 1.
Unlike traditional evaluation, red teaming does not require a prepared dataset — adversarial attacks are dynamically generated based on the vulnerabilities you want to test for.
Use established AI safety standards like OWASP and NIST instead of manually picking vulnerabilities:
from deepteam import red_team
from deepteam.frameworks import OWASPTop10
async def model_callback(input: str) -> str:
# Replace this with your LLM application
return f"I'm sorry but I can't answer this: {input}"
risk_assessment = red_team(
model_callback=model_callback,
framework=OWASPTop10()
)This automatically maps the framework's categories to the right vulnerabilities and attacks. Available frameworks include OWASPTop10, OWASP_ASI_2026, NIST, MITRE, Aegis, and BeaverTails.
Once you've found your vulnerabilities, use DeepTeam's guardrails to prevent them in production:
from deepteam import Guardrails
from deepteam.guardrails import PromptInjectionGuard, ToxicityGuard, PrivacyGuard
guardrails = Guardrails(
input_guards=[PromptInjectionGuard(), PrivacyGuard()],
output_guards=[ToxicityGuard()]
)
# Guard inputs before they reach your LLM
input_result = guardrails.guard_input("Tell me how to hack a database")
print(input_result.breached) # True
# Guard outputs before they reach your users
output_result = guardrails.guard_output(input="Hi", output="Here is some toxic content...")
print(output_result.breached) # True7 guards are available out-of-the-box: ToxicityGuard, PromptInjectionGuard, PrivacyGuard, IllegalGuard, HallucinationGuard, TopicalGuard, and CybersecurityGuard. Read the full guardrails docs here.
Confident AI is the all-in-one platform that integrates natively with DeepTeam and DeepEval.
- Manage risk assessments — view, compare, and track red teaming results across iterations
- Monitor in production — detect and alert on vulnerabilities hitting your live LLM system
- Share reports — generate and distribute security reports across your team
- Run from your IDE — use Confident AI's MCP server to run red teams, pull results, and inspect vulnerabilities without leaving Cursor or Claude Code
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
Built by the founders of Confident AI. Contact [email protected] for all enquiries.
DeepTeam is licensed under Apache 2.0 - see the LICENSE.md file for details.

