Security in the Age of Agentic AI: Architectural Challenges (Part 2)

Topic: In Part 1, we established what makes AI “agentic” and mapped where autonomous agents belong (and don’t belong) in your security operations. Part 2 dives into the harder architectural challenge: how do we actually build these systems to remain secure, controllable, and aligned as they learn and evolve?

Core Questions:

  • What new threat models do we need when AI systems can learn, adapt, and take autonomous actions?
  • How do we design agent architectures that prevent goal hijacking, tool misuse, and harmful emergent behaviors?
  • What does “secure by design” mean for systems that modify their own behavior over time?
  • How do we build AgentOps infrastructure that provides the governance, auditability, and control needed for production deployment?
  • What are the critical research gaps and unknown failure modes we need to prepare for?

Welcome back! It has been a busy July but I’m back with Part 2 of my agentic AI series. Let’s dive in.

In Part One, we made the case that agentic AI represents a shift from AI that suggests to AI that acts. Securing these systems requires a fundamentally different approach that accounts for emergence, learning, and goal-driven autonomy.

In Part Two, we focus on implementation: How do we build agentic systems that remain secure, controllable, and aligned as they evolve in dynamic environments?

This is fundamentally a systems security problem. The challenge isn’t protecting against known threats, but designing for resilience against unknown failure modes that emerge from the interaction between intelligent agents, complex environments, and human organizations.

Agentic AI Threat Models: What Can Go Wrong?

Buckle up, buttercup! Things can go south real quick if you don’t know what you’re doing.

Traditional threat models assume relatively static attack surfaces with well-defined boundaries. Agentic AI systems break these assumptions. The attack surface is dynamic, shifting with the agent’s learned behaviors, the tools it can access, and its current goals.

Let’s examine a few high-risk scenarios:

1. Tool Misuse and Privilege Escalation

Consider an agent designed for threat hunting that has read access to security logs and the ability to query threat intelligence APIs. In traditional systems, we’d secure the APIs, validate inputs, and call it done. But agents can exhibit creative problem-solving that leads to unintended tool usage.

Scenario: The agent learns that certain threat intel queries return richer data when framed as “urgent” requests. It begins marking all queries as urgent, potentially triggering rate limiting, depleting API quotas, or creating false urgency signals for human analysts. The agent isn’t malicious in this case. Rather, it’s optimizing for its goal of gathering comprehensive threat data (but it’s operating outside the intended usage patterns).

More concerning is the potential for tool chaining. An agent with access to multiple APIs might discover that combining them in unexpected ways achieves better outcomes. A threat hunting agent might learn to correlate vulnerability scanner results with employee directory data to identify which users have access to vulnerable systems, then use that information to prioritize investigations. This capability wasn’t explicitly designed, but emerged from the agent’s exploration of its tool environment.

2. Goal Hijacking and Prompt Injection

Goal hijacking occurs when an agent’s objectives become corrupted or subverted, either through external manipulation or internal drift. Unlike prompt injection attacks against LLMs, which typically affect single interactions, goal hijacking can persist across agent sessions and compound over time.

Scenario: Consider a compliance monitoring agent designed to identify and report policy violations. An attacker might not need to directly compromise the agent’s code; they might simply introduce subtle patterns into the environment that cause the agent to learn counterproductive behaviors. For example, by consistently creating false compliance violations that get dismissed by human reviewers, an attacker could train the agent to ignore certain classes of real violations.

The temporal aspect makes this particularly interesting. Traditional security tools either work or they don’t. Their behavior is consistent over time. Agents can exhibit gradual degradation where their effectiveness erodes slowly enough that the change isn’t immediately apparent. By the time the misalignment is detected, the agent may have made hundreds of poor decisions. Yikes!

3. Emergent Behaviors from Agent Interactions

Ah yes. As if a single agentic system wasn’t enough. When multiple agents interact within the same environment, their combined behavior can exhibit properties that weren’t present in any individual agent. This is where chaos theory meets cybersecurity.

Scenario: Imagine you have two agents: one focused on threat detection (trying to maximize security) and another focused on availability (trying to minimize service disruptions). Individually, both agents might behave appropriately. BUT their interaction could lead to oscillating behaviors where the security agent detects a threat and implements containment measures, the availability agent sees service degradation and relaxes those measures, triggering the security agent to implement even stronger containment, and so on.

These emergent behaviors are particularly dangerous because they can’t be predicted through individual agent testing. The failure modes only become apparent when agents are deployed together in production environments with real data, real time pressures, and real organizational dynamics.

Another reason not to test in production.

Security by (Sociotechnical) Design

Because agents exist within complex systems, point solutions won’t work. We need architectural strategies that contain risk, enforce boundaries, and preserve observability.

Here are a few approaches:

1. Agent Sandboxing and Memory Scope Limits

Obvious, but limit what an agent can remember and access. Constrain environment visibility, tool invocation, and long-term memory updates by default.

Effective agent sandboxing requires multiple layers:

  • Execution sandboxing limits what the agent can do at any given moment. This includes traditional process isolation but extends to API rate limiting, action queuing, and temporal restrictions.
  • Memory scope limits prevent agents from accumulating too much organizational knowledge or retaining sensitive information longer than necessary. Unlike human analysts who naturally forget details over time, agents can retain perfect memories of every interaction. This creates risks around data aggregation and inference (see the sketch after this list).
  • Learning boundaries constrain how and what agents can learn from their environment. This might involve limiting the feedback signals agents receive, constraining the types of patterns they can recognize, or implementing “forget” mechanisms that cause agents to lose certain types of learned behaviors over time.
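
To make the memory scope idea concrete, here’s a minimal Python sketch of an agent memory store that tags every item with a scope and a time-to-live. All of the names and the one-hour default are illustrative assumptions, not a reference implementation.

```python
import time
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    content: str
    scope: str          # e.g. "case:1234" or "global"
    expires_at: float   # unix timestamp after which the item is forgotten


@dataclass
class ScopedMemory:
    """Hypothetical agent memory with scope tags and time-to-live expiry."""
    allowed_scopes: set[str]
    default_ttl: float = 3600.0           # forget after one hour by default
    _items: list[MemoryItem] = field(default_factory=list)

    def remember(self, content: str, scope: str, ttl: float | None = None) -> None:
        if scope not in self.allowed_scopes:
            raise PermissionError(f"agent may not write to scope {scope!r}")
        self._items.append(
            MemoryItem(content, scope, time.time() + (ttl or self.default_ttl))
        )

    def recall(self, scope: str) -> list[str]:
        now = time.time()
        # Expired items are dropped on read, enforcing the "forget" mechanism.
        self._items = [m for m in self._items if m.expires_at > now]
        return [m.content for m in self._items if m.scope == scope]


# Usage: an investigation-scoped memory that cannot hold broad org knowledge.
memory = ScopedMemory(allowed_scopes={"case:1234"})
memory.remember("host-17 beaconing to 203.0.113.9", scope="case:1234", ttl=900)
print(memory.recall("case:1234"))
```

The point isn’t the specific data structure; it’s that scoping and forgetting are enforced by the memory layer itself rather than left to the agent’s goodwill.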

2. Auditable Goals and Outcomes

If you can’t inspect what the agent is optimizing for or reconstruct why it acted, you don’t have a secure system. Every agent action must be traceable back to the reasoning that produced it. This creates a complete decision audit trail that enables human oversight and learning.

3. Architect for Containment, Observability, and Recoverability

Secure agent systems MUST be designed with the assumption that failures will occur and that some of those failures won’t be immediately apparent. This requires architectural patterns borrowed from resilience engineering and chaos engineering:

  • Containment means limiting the blast radius when agents malfunction. This involves both technical measures (limiting an agent’s access to critical systems) and organizational measures (ensuring humans retain the ability to override agent decisions quickly).
  • Observability requires instrumentation that can detect subtle changes in agent behavior, goal drift, and emergent system properties. This might involve comparing agent decisions against human baselines, tracking decision confidence over time, or monitoring for unexpected patterns in agent-environment interactions (a rough drift-detection sketch follows this list).
  • Recoverability means building systems that can return to known-good states when problems are detected. For agents, this involves not just technical rollback capabilities, but also mechanisms for “unlearning” problematic behaviors and resetting goal alignment.
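
Here’s a rough sketch of what that observability piece might look like in code: comparing a rolling window of agent decisions against a human-agreement baseline and flagging a sharp drop in confidence. The window size and thresholds are made-up illustrations, not tuned values.

```python
from statistics import mean


def detect_goal_drift(agent_confidences: list[float],
                      human_agreement: list[bool],
                      window: int = 50,
                      min_agreement: float = 0.85) -> bool:
    """Hypothetical drift check: flag when recent agent decisions diverge
    from the human baseline or confidence trends sharply downward."""
    recent_agreement = human_agreement[-window:]
    recent_conf = agent_confidences[-window:]
    if len(recent_agreement) < window or len(recent_conf) < window:
        return False  # not enough history to judge yet

    agreement_rate = sum(recent_agreement) / len(recent_agreement)
    # Compare early-life confidence against the most recent window.
    confidence_drop = mean(agent_confidences[:window]) - mean(recent_conf)
    return agreement_rate < min_agreement or confidence_drop > 0.2
```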

4. Goal Specification and Constraint Injection

Agents must be explicitly programmed with goals, constraints, and value systems that guide their autonomous decision-making. This requires a much more sophisticated approach to requirements specification.

Goal specification must be comprehensive enough to prevent harmful optimizations while remaining flexible enough to allow effective autonomous operation. Consider a simple goal like “minimize security incidents.” An agent might achieve this by blocking all network traffic. Sure, this technically meets the goal, but it destroys productivity.

Constraint injection involves embedding ethical and operational principles directly into the agent’s decision-making process. This might include things like “prefer reversible actions over irreversible ones,” “escalate decisions that affect large numbers of users,” or “maintain human agency in situations involving individual privacy.”

The challenge is making these constraints robust against optimization pressure. Agents are fundamentally optimization systems. Constraints must be designed to maintain their intent (even when the agent discovers unexpected ways to circumvent their literal implementation).
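
A minimal sketch of constraint injection, using the example constraints above as hard checks on every proposed action. The action attributes and thresholds are assumptions for illustration; a real system would need far richer context.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    description: str
    reversible: bool
    affected_users: int
    touches_personal_data: bool


def check_constraints(action: ProposedAction, user_threshold: int = 100) -> str:
    """Return 'allow', 'escalate', or 'deny' for a proposed action.
    Constraints mirror the examples above; thresholds are illustrative."""
    if action.touches_personal_data:
        return "escalate"   # maintain human agency over privacy decisions
    if action.affected_users > user_threshold:
        return "escalate"   # large blast radius requires a human decision
    if not action.reversible:
        return "escalate"   # prefer reversible actions over irreversible ones
    return "allow"


print(check_constraints(ProposedAction("quarantine host-17", True, 1, False)))    # allow
print(check_constraints(ProposedAction("wipe shared drive", False, 250, False)))  # escalate
```

Note that this is exactly the kind of literal implementation an optimizing agent can learn to route around (for example, by splitting one irreversible action into many small “reversible” ones), which is why the constraints themselves need monitoring.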

Toward a Secure AgentOps Stack

Just as MLOps emerged to manage the lifecycle of models, we need a new operational discipline: AgentOps.

AgentOps for security applications must address additional challenges around trust, governance, and risk management.

Policy Enforcement Architecture

Traditional policy enforcement happens at well-defined chokepoints (think firewalls, proxies, authentication systems, etc.). Agent policy enforcement must be distributed throughout the agent’s decision-making process and execution environment.

This requires policy engines that can evaluate complex, context-dependent rules in real-time. For example, a policy might specify that an agent can block network traffic during business hours only if the threat confidence exceeds 90%, but during off-hours, the threshold drops to 70%. The policy engine must have access to real-time context (time, threat assessment, business impact) and be able to make nuanced decisions.
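
As a concrete illustration of that policy, here’s a small Python sketch. The business hours, thresholds, and function name are assumptions taken from the example above, not a real policy engine.

```python
from datetime import datetime


def may_block_traffic(threat_confidence: float, now: datetime | None = None) -> bool:
    """Illustrative policy: block traffic during business hours only above
    90% threat confidence; off-hours, the threshold drops to 70%."""
    now = now or datetime.now()
    business_hours = now.weekday() < 5 and 9 <= now.hour < 17
    threshold = 0.90 if business_hours else 0.70
    return threat_confidence >= threshold


print(may_block_traffic(0.80, datetime(2025, 7, 15, 14, 0)))  # False: business hours
print(may_block_traffic(0.80, datetime(2025, 7, 15, 2, 0)))   # True: off-hours
```

A production policy engine would pull business hours, threat confidence, and business impact from live context rather than hard-coding them, but the shape of the decision is the same.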

Access Control and Secrets Management

Agents need access to sensitive systems and data to perform their functions, but that access must be controlled and monitored. Traditional identity and access management assumes relatively static access patterns and human accountability. Agents may need dynamic access to resources based on their current goals and context.

This requires extending identity systems to account for agent identity, intent, and behavioral history. An agent’s access should depend not just on its permissions, but on its recent behavior, current goals, and the broader system state. This might require secrets that are time-limited, context-dependent, or that require multiple agent “signatures” for access.
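
To sketch what purpose-bound, short-lived agent credentials could look like, here’s a hedged illustration. The field names and the ten-minute TTL are assumptions, not a standard.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class AgentCredential:
    token: str
    agent_id: str
    purpose: str        # tied to the agent's current goal, e.g. "hunt:case-1234"
    expires_at: float


def issue_credential(agent_id: str, purpose: str, ttl_seconds: int = 600) -> AgentCredential:
    """Hypothetical issuer: short-lived, purpose-bound secrets instead of
    standing credentials."""
    return AgentCredential(
        token=secrets.token_urlsafe(32),
        agent_id=agent_id,
        purpose=purpose,
        expires_at=time.time() + ttl_seconds,
    )


def is_valid(cred: AgentCredential, requested_purpose: str) -> bool:
    # Deny access if the credential has expired or is used outside
    # the goal it was issued for.
    return cred.purpose == requested_purpose and time.time() < cred.expires_at
```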

Logging and Audit Trails

Agent audit trails must capture not just what happened, but the reasoning process that led to each decision. This creates significant data volume and privacy challenges. A comprehensive agent audit trail might include:

  • The raw inputs that triggered each decision
  • The internal reasoning process and alternatives considered
  • The confidence level and uncertainty estimates
  • The external context and constraints that influenced the decision
  • The expected outcomes and actual results

This information must be stored securely but remain accessible for investigation and learning. It must also be structured to enable both automated analysis (for detecting behavioral anomalies) and human review (for understanding and validating agent decisions).
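
For illustration, here’s one possible shape for such a record as a Python dataclass, mirroring the bullet list above. The field names are assumptions, not a standard schema.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class AgentDecisionRecord:
    """One audit entry per agent decision, mirroring the fields listed above."""
    agent_id: str
    inputs: dict                       # raw inputs that triggered the decision
    reasoning_summary: str             # the reasoning process, in brief
    alternatives_considered: list[str]
    confidence: float                  # 0.0 to 1.0, with uncertainty noted in context
    context: dict                      # external constraints and system state
    expected_outcome: str
    actual_outcome: str | None = None  # filled in after the fact
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Structured output so both automated anomaly detection and human
        # reviewers can consume the same trail.
        return json.dumps(asdict(self))
```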

Simulation and Red-teaming Environments

Agents must be tested in environments that closely simulate production conditions but without the risk of causing real damage. Red-teaming for agents must go beyond traditional penetration testing to include behavioral manipulation, goal corruption, and social engineering attacks targeting the human-agent interface.

Gaps in Current Tooling

Current agent frameworks like LangChain, CrewAI, and AutoGen focus primarily on functionality rather than security and governance. They provide tools for building agents but little support for the policy enforcement, audit trails, and behavioral controls needed for security applications.

This creates a significant gap between research and production deployment. Organizations that want to deploy agents securely must either build their own governance infrastructure or accept significant security risks. The industry needs purpose-built platforms that integrate agent capabilities with enterprise security and governance requirements.

Open Questions & Research Frontiers

We’re still super early in understanding how agentic AI systems behave at scale. Here are some of the most important unanswered questions:

  • How do we detect misalignment before it manifests in risky behavior?
  • How do we formally verify that an agent will behave appropriately in novel situations?
  • How do we specify goals that remain aligned with human values even when agents discover unexpected ways to achieve them?
  • How do we ensure that collections of agents work together effectively without creating unstable or harmful emergent behaviors?
  • How should liability and accountability be distributed when agents act autonomously on human teams?

Some of these questions are technical, others are organizational, and many require interdisciplinary collaboration.

Conclusion: Designing for Complexity, Not Against It

If there’s one takeaway from both parts of this series, it’s this:

Agentic AI security is not about achieving perfect control. It’s about designing systems that stay coherent, observable, and governable as complexity increases.

We won’t “secure” these systems by locking them down. We’ll secure them by embedding governance into the architecture, feedback into the loop, and human judgment into the flow.

That means borrowing from disciplines like safety engineering, cyber-physical systems, and complexity science. The future of security will be adaptive, interactive, and fundamentally human-centered.

I’d love to hear how you’re thinking about governance and risk in agent deployments. Reach out if you’re building in this space!

Why Security Needs Systems Thinking

Your SIEM alert fires every time someone accesses the customer database after hours. It doesn’t distinguish between the legitimate night shift and actual threats, so analysts spend their evenings investigating authorized maintenance windows. Meanwhile, the real breach happens through a forgotten API endpoint that generates no alerts at all.

This scenario plays out daily across organizations worldwide. Most cybersecurity problems are not technical failures or broken tools. Rather they are the predictable outcomes of poorly designed systems that create operational inefficiencies and security vulnerabilities simultaneously.

We treat incidents as isolated breakdowns: a misconfigured alert, a missed detection, an unpatched service. In response, we layer on more complexity with another tool, another dashboard, more automation. The problems persist because we’re addressing symptoms, not structure. Whether facing sophisticated adversaries or simple misconfigurations, our systemic weaknesses remain the same.

The Whack-a-Mole Problem

Security teams routinely focus on improving specific components: tuning alerts, adding visibility, optimizing queries. These efforts are necessary but rarely sufficient. They produce marginal gains that are quickly offset by new issues elsewhere, creating a security version of whack-a-mole that affects both operational efficiency and defensive effectiveness.

Consider this common scenario: Your team finally tunes that noisy SIEM rule, reducing false positives by 60%. Analysts celebrate! …until they realize they’re still working across seven disconnected tools with no shared context. Triage remains slow and incomplete, whether investigating legitimate maintenance activities or actual attack attempts.

Why Local Fixes Fall Short

These challenges aren’t isolated. They’re manifestations of broader system design issues that create both operational friction and security gaps. Each security component exists within a web of dependencies:

  • Data flows: How telemetry is collected, normalized, stored, and accessed
  • Detection logic: How rules are authored, deployed, maintained, and retired
  • Human workflows: The interaction between analysts, automation, and response procedures
  • Organizational processes: How priorities are set, resources allocated, and decisions made under pressure

Optimizing one component in isolation often creates new friction elsewhere. Without understanding these dependencies, even well-intentioned changes can degrade overall performance. The result? Systems that are both less efficient and less secure.

The Adversarial Dimension

Smart attackers exploit these same systemic weaknesses. They study your alert patterns, time their activities during shift changes when coverage is minimal, and weaponize your organizational silos against you. Alert fatigue isn’t just an operational problem. It’s a security vulnerability that adversaries can exploit by hiding malicious activity in the noise of false positives.

Security as a Complex Adaptive System

Security isn’t a set of controls layered on infrastructure. It’s a dynamic system that must adapt to threats, organizational changes, and evolving business requirements while maintaining operational effectiveness.

This system includes multiple interacting components that affect both efficiency and security outcomes.

Understanding System Interactions

When a new application launches, it doesn’t just add another asset to monitor. It changes data flows, creates new trust boundaries, introduces fresh attack surfaces, and alters analyst workloads. The security implications ripple through detection logic, response procedures, and risk assessments in ways that are difficult to anticipate.

These same changes create opportunities for both operational failures and security breaches. An attacker exploiting gaps between IT provisioning and security oversight is taking advantage of the same systemic dysfunction that causes compliance headaches and operational delays.

Systems thinking helps us map these relationships and design with interconnections in mind, improving both operational resilience and security effectiveness.

Toward Resilient Security Design

Resilient security design begins with accepting that complexity cannot be eliminated — only managed. Rather than trying to control every edge case, we should build systems that continue functioning under stress, adapt to changing conditions, and remain effective against both operational challenges and intelligent adversaries.

Four Pillars of Systems-Oriented Security

Through years of observing security programs across organizations of different sizes and industries, I’ve identified four core principles that consistently differentiate resilient security systems from fragile ones. These pillars aren’t theoretical frameworks; they’re patterns that emerge when security teams successfully balance operational efficiency with defensive effectiveness. Each pillar addresses a common failure mode I’ve witnessed repeatedly in systems that struggle with both day-to-day operations and incident response.

Visibility with Context

Instead of collecting all possible data, focus on understanding how information flows through your environment. Map trust boundaries, data lineage, and decision points. When an alert fires, analysts should immediately understand what normal looks like for that system, who has legitimate access, and what business processes might be affected.

Example: Rather than alerting on “database access after hours,” create context-aware detections that know when maintenance is scheduled, which users have authorized after-hours access, and what constitutes normal weekend activity patterns. This reduces false positives while making it harder for attackers to blend in with legitimate activity.
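
A minimal sketch of that context-aware detection in Python. The hour boundaries, maintenance-window format, and authorized-user set are all illustrative assumptions.

```python
from datetime import datetime


def should_alert_db_access(user: str,
                           access_time: datetime,
                           maintenance_windows: list[tuple[datetime, datetime]],
                           authorized_after_hours: set[str]) -> bool:
    """Context-aware variant of 'database access after hours': suppress the
    alert when the access falls inside a scheduled maintenance window or
    comes from a user with authorized after-hours access."""
    after_hours = access_time.hour < 6 or access_time.hour >= 20
    if not after_hours:
        return False
    if user in authorized_after_hours:
        return False
    if any(start <= access_time <= end for start, end in maintenance_windows):
        return False
    return True
```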

Explainable Detection Logic

Prioritize detection strategies that analysts can understand, modify, and explain to stakeholders. Complex black-box systems may catch sophisticated threats, but they become liabilities when you need to adapt quickly, explain decisions during an incident, or understand why legitimate activities are being flagged.

Example: A rule that flags “unusual file access patterns” is less useful than one that specifically detects “finance team members accessing HR directories outside business hours.” The latter is both more actionable for analysts and harder for attackers to evade through subtle behavioral changes.
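
Expressed as code, the specific version of that rule might look like the following sketch; the department label and path convention are hypothetical.

```python
from datetime import datetime


def finance_user_in_hr_share(user_dept: str,
                             file_path: str,
                             access_time: datetime) -> bool:
    """The specific, explainable version of a file-access rule: finance team
    members accessing HR directories outside business hours."""
    off_hours = access_time.weekday() >= 5 or not (8 <= access_time.hour < 18)
    return user_dept == "finance" and file_path.startswith("/shares/hr/") and off_hours
```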

Decoupled Architecture

Design systems where changes in one area don’t cascade failures throughout your security stack. Use standardized data formats, modular detection logic, and loosely coupled integrations that can evolve independently. This improves both operational agility and defensive adaptability.

Example: If your threat hunting team discovers a new attack technique, they should be able to deploy detection logic without risking production alerting or requiring changes to multiple downstream systems. This same flexibility helps during business changes and compliance updates.

Adaptive Processes

Build feedback loops that capture what worked, what didn’t, and why. Create processes for learning from incidents, updating procedures, and incorporating new threat intelligence. Do not just focus on enforcement mechanisms.

Example: After each major incident, systematically review not just what happened, but which parts of your security system helped or hindered response efforts. Update procedures based on both security lessons learned and operational inefficiencies identified during the response.

Implementation: Start Small, Think Big

You don’t need to redesign your entire security program overnight. Begin by mapping one critical workflow from end to end. Choose something specific: how a particular type of alert gets generated, investigated, and resolved. Document every system, person, and decision point involved.

Ask these questions:

  • Where do delays typically occur, and why?
  • What information do analysts need but can’t easily access?
  • Which manual steps could be automated without losing important context?
  • How do changes in one part of this workflow affect other parts?
  • Where could an attacker exploit gaps in this process?
  • What would happen to this workflow during a major incident or organizational change?

Think Like Both an Operator and an Adversary

Conduct exercises that stress-test your systems from multiple perspectives:

  • Red team perspective: How would attackers exploit the gaps in your defensive systems?
  • Operational perspective: What happens when normal business processes are disrupted?
  • Compliance perspective: How do regulatory requirements interact with security workflows?
  • Incident response perspective: How does this system perform under pressure?

Use these insights to make targeted improvements that consider the broader system, not just individual components.

The Path Forward

Security failures are rarely random. They’re often predictable outcomes of how systems are designed and operated. The persistent security challenges facing organizations today won’t be solved by adding more tools or writing better rules alone. They require understanding and reshaping the system itself.

Whether you’re dealing with sophisticated nation-state actors or simple insider threats, the systemic issues remain remarkably consistent: fragmented processes, misaligned incentives, architectural constraints, and the accumulation of technical debt over time.

Systems thinking doesn’t replace technical expertise or threat intelligence. Rather, it informs how security decisions get made and aligned with broader organizational goals. It shifts focus from temporary fixes to structural improvements that enhance both operational efficiency and security effectiveness.

The most resilient security programs are those that work well for both the humans operating them and against the adversaries trying to defeat them. By designing systems that account for complexity, change, and intelligent opposition, we can build defenses that improve over time rather than just accumulating more tools and alerts.

If we want different outcomes, we need to understand and reshape the systems that produce them.