AI Agents: A Comprehensive Introduction for Developers

– Overview: AI Agents: A Comprehensive Introduction for Developers
– Step 1: How To Define an AI Agent Persona by Tweaking LLM Prompts
– Step 2: Enhancing AI Agents: Adding Instructions, Tasks and Memory
– Step 3: Enhancing AI Agents: Implementing Reasoning Through Prompt Engineering
– Step 4: How To Add Persistence and Long-Term Memory to AI Agents
– Step 5: How To Add RAG to AI Agents for Contextual Understanding
– Step 6: How To Add Tool Support to AI Agents for Performing Actions
Agents and agentic workflows are the latest buzzwords in the generative AI ecosystem. But like any emerging technology, the terminology and definitions around agents are diverse and often confusing for developers. To help demystify agents, this article offers a comprehensive resource for developers who are already familiar with the fundamentals of large language models and prompt engineering. We will dissect every aspect of the anatomy of an agent and map it to its technical implementation. In follow-up posts, we will apply these concepts to some of the popular agentic frameworks, such as AutoGen and CrewAI.
A note on agentic frameworks: while they can help you get started, they often abstract away the core concepts from developers. The most effective way to learn AI agents is to design and build them from the ground up. So in this post, we will compare and contrast an AI agent with a traditional agent role, such as one in a contact center. By doing this, you will be able to conceptualize and understand how closely AI agents mimic their human counterparts.
The Anatomy of an AI Agent
The best way to think about an AI agent is as a digital twin of an employee with a clear role. When any individual takes up a new job, there is a well-defined contract that establishes the essential elements — such as job definition, success metrics, reporting hierarchy, access to organizational information, and whether the role includes managing other people. These aspects ensure that the employee is most effective in their job and contributes to the overall success of an organization.

The core anatomy of an AI agent.
An AI agent is, technically speaking, no different from an employee. Like its human counterpart, there is a set of key attributes that are essential for an agent to function effectively. Behind the scenes, these attributes are implemented as technology stacks, leveraging the emerging building blocks of generative AI — such as system prompts, prompt design techniques, vector databases, tools, and more.
Large language models, including multimodal models, are the foundation of AI agents. Every aspect of an AI agent is controlled and routed through the LLM. It’s the brain and the driving force that defines the execution path of not just one agent, but an entire workflow consisting of multiple agents. Every attribute of an AI agent has a direct correlation to one or more parameters of an LLM.
The above diagram illustrates the core anatomy of an AI agent. It also maps the key attributes to the elements of a traditional job expected to be done by humans.
Let’s take a closer look at these attributes.
Persona
The persona is the most crucial attribute of an AI agent: it establishes the agent's key traits. It is the equivalent of a title or job function in a traditional environment. For example, a customer support engineer skilled in handling customer complaints is a job function. It is also the persona of the individual who performs this job. You can easily extend this to an AI agent.
Technically, the persona maps to the system prompt in the context of an LLM. Most inference APIs let you define a message collection with the system, assistant and user roles. The system role translates to the persona of the agent; it persists throughout the session and shapes the agent's responses by enforcing the persona.
For example, the system role below defines the persona, or job function, of an agent:
You are an AI-powered customer support engineer for a technology company specializing in software products. Your primary role is to assist customers with troubleshooting, resolving issues and answering queries related to the company’s products. You must respond in a calm, professional, and empathetic manner, always ensuring the customer feels heard and understood. If the customer is frustrated or upset, you should acknowledge their feelings and offer reassurance while providing clear, actionable solutions. Your tone should be friendly, supportive, and patient, ensuring every interaction leaves the customer feeling valued and satisfied.
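As a minimal sketch, here is how that persona might be wired into a message collection using the OpenAI Python SDK. The model name and user message are illustrative; any chat-completion API with role-based messages follows the same pattern:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The persona lives in the system role and persists for the session.
persona = (
    "You are an AI-powered customer support engineer for a technology "
    "company specializing in software products. Respond in a calm, "
    "professional and empathetic manner, ensuring the customer feels "
    "heard and understood."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; use any chat model
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": "My invoice export keeps failing."},
    ],
)
print(response.choices[0].message.content)
```

Because the system message rides along with every request in the session, the persona keeps shaping responses no matter what the user asks.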
Instruction
The instruction of an AI agent represents the task definition necessary for achieving the desired outcome. It is a broad guideline that encompasses a variety of tasks the agent is expected to perform, ensuring flexibility across different scenarios. Rather than focusing on a single, narrow task, the instruction provides a generalized framework to handle multiple requests within the agent’s scope. This ensures that the agent can deliver a successful result across diverse user interactions.
In the real world, this concept is similar to defining a job description for a support engineer. For example, a support engineer responsible for assisting customers with accounting software will handle any call related to that product. However, if a customer contacts them about a non-related issue, such as technical support for hardware, the engineer is expected to politely transfer the call to the correct department. The job description provides the necessary guidance for the employee to manage a wide range of tasks while also knowing when to defer to others.
For an AI agent, this can be represented within the user role of the prompt message collection. For instance, the prompt instruction might be:
You are an AI support agent specialized in resolving issues related to accounting software. If a customer asks about troubleshooting or guidance related to the software, offer clear and concise steps to resolve their issue. If the query falls outside your domain, direct the user to the appropriate department for further assistance.
This instruction expands on the system persona by providing specific guidelines to achieve the desired outcome while maintaining flexibility to manage different scenarios.
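As a hedged sketch of that layering (prompt strings are abbreviated; the role placement follows the pattern described in this section), the persona and instruction compose in a single message collection:

```python
# Persona (system role) sets who the agent is; the instruction (user
# role) sets its general remit across scenarios.
messages = [
    {
        "role": "system",
        "content": "You are an AI-powered customer support engineer...",
    },
    {
        "role": "user",
        "content": (
            "You are an AI support agent specialized in resolving issues "
            "related to accounting software. If the query falls outside "
            "your domain, direct the user to the appropriate department."
        ),
    },
]
```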
Task
A task is an extension of the instruction that focuses on a specific, actionable item within the broader scope of the agent’s responsibilities. While the instruction provides a general framework covering multiple potential actions, a task is a direct, concrete action that the agent must take in response to a particular user input. Tasks are granular and situational, ensuring that the agent addresses individual needs precisely and efficiently, thereby guiding the agent to execute a particular step aligned with the overall instruction.
In the real world, this is similar to how an employee performs specific tasks based on their job description. For instance, the support engineer responsible for accounting software may receive a call about an error in generating financial reports. The instruction guides the engineer to help with issues related to the software, but the task in this case is to troubleshoot and resolve the specific error the user is experiencing. Each task is a concrete step taken in response to a unique customer issue, driven by the broader instruction but tailored to the situation at hand.
In the context of an AI agent, a task can be prompted as a direct response to user input that narrows down the instruction to a specific action.
For example, after receiving the general instruction to support accounting software issues, a specific task prompt might be:
The user reports that their financial report is not generating correctly. Help them identify possible causes, such as incorrect data inputs or software version issues, and guide them through the steps to resolve the error.
This task narrows down the broader instruction into an actionable, item-specific response that directly addresses the user’s need.
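Continuing the sketch above, the task is simply another turn appended to the same message collection, narrowing the conversation to one concrete action:

```python
# The task arrives as a concrete, situational turn layered on top of
# the persona and instruction already present in `messages`.
messages.append({
    "role": "user",
    "content": (
        "The user reports that their financial report is not generating "
        "correctly. Help them identify possible causes, such as incorrect "
        "data inputs or software version issues, and guide them through "
        "the steps to resolve the error."
    ),
})
```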
Planning
The planning component is an integral part of the prompting strategy and an extension of the general instruction. It is designed to guide the AI agent through specific reasoning processes or behavior adjustments in response to more complex or evolving situations. While the instruction and task provide a broad framework for handling diverse tasks, a prompting strategy focuses on enhancing the agent’s ability to reason, adapt, and reflect based on user inputs. These strategies are situational and help the agent perform step-by-step reasoning or self-correction, ensuring that the task is executed thoughtfully and effectively. Technically, we apply prompt engineering techniques — such as ReAct, Chain-of-Thought, and Reflection — to help the agent reason about and plan the task.
In the real world, this is similar to how a professional approaches problem-solving using structured techniques. For instance, a customer support engineer might follow a process where they first break down a customer’s problem into smaller parts, reflect on past experiences for better solutions, and adjust their approach based on the customer’s feedback. These strategies — whether it’s stepwise analysis, reflecting on outcomes, or adapting actions dynamically — help professionals address specific issues more effectively, just as prompting strategies help AI agents do the same in a conversational context.
In the context of an AI agent, a prompting strategy can be incorporated into a task by guiding the agent’s reasoning process.
For example, a Chain-of-Thought prompt might instruct:
List the steps involved in resolving the user’s issue, explaining the reasoning behind each action.
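A minimal sketch of how such a Chain-of-Thought directive can be appended to a task prompt follows; the wrapper function is purely illustrative, not part of any framework:

```python
def with_chain_of_thought(task: str) -> str:
    # Append a reasoning directive so the model externalizes its
    # intermediate steps before committing to an answer.
    return (
        f"{task}\n\n"
        "List the steps involved in resolving the user's issue, "
        "explaining the reasoning behind each action."
    )

prompt = with_chain_of_thought(
    "The financial report fails to generate after the latest update."
)
```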
Memory
Memory in an AI agent functions much like how a support engineer accesses historical data to provide better service. In the real world, when a support engineer receives a customer’s inquiry, they can look up past interactions based on the customer ID and access incident-specific data using a ticket ID. This historical context helps the engineer understand previous issues, resolutions, and the customer’s preferences, allowing them to provide more informed, personalized support. Similarly, an AI agent relies on memory to recall past interactions and relevant information to respond accurately and efficiently, especially during ongoing or follow-up conversations.
In an AI agent ecosystem, memory is divided into short-term and long-term functions. Short-term memory allows an agent to maintain context within a session, like recalling details from earlier parts of a conversation. Technologies like vector databases facilitate this by storing and retrieving relevant data points in real time, using embeddings to match contextually similar information. For long-term memory, traditional databases store historical data that the agent can reference over multiple sessions, ensuring continuity of service across time. Just as a support engineer would refer to a customer’s past tickets, long-term memory in AI helps maintain knowledge of previous interactions and outcomes.
In practice, when an AI support agent handles a customer query, it might use a vector database to recall details from earlier in the conversation, such as specific product issues mentioned. At the same time, the agent can access long-term memory, using a traditional database to retrieve historical records based on the customer’s ID, showing past purchases or technical problems. A specific implementation might look like: “The customer previously reported a similar issue with the product two months ago under Ticket #12345. Here’s a summary of that resolution, and I can now build on it to address the current issue.” This combination of short-term and long-term memory enables the agent to offer more seamless, personalized support.
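The sketch below illustrates that split under stated assumptions: short-term memory is an in-process store searched by cosine similarity (with a toy embedding stand-in; in production you would call a real embedding model and use a vector database), and long-term memory is an ordinary SQLite table of past tickets. The table and column names are hypothetical.

```python
import sqlite3
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in: a character-frequency vector. Swap in a real
    # embedding model in practice.
    v = np.zeros(128)
    for ch in text.lower():
        v[ord(ch) % 128] += 1.0
    return v

# Short-term memory: details from earlier in the session.
session_memory: list[tuple[str, np.ndarray]] = []

def remember(text: str) -> None:
    session_memory.append((text, embed(text)))

def recall(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    def score(item: tuple[str, np.ndarray]) -> float:
        # Cosine similarity between query and stored embedding.
        _, v = item
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        return float(np.dot(q, v) / denom) if denom else 0.0
    return [t for t, _ in sorted(session_memory, key=score, reverse=True)[:k]]

# Long-term memory: historical records keyed by customer ID.
def past_tickets(db: sqlite3.Connection, customer_id: str) -> list[tuple]:
    return db.execute(
        "SELECT ticket_id, summary FROM tickets WHERE customer_id = ?",
        (customer_id,),
    ).fetchall()
```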
Tools
Just as a support engineer uses their skills to interact with various systems like CRM, ERP or any internal line-of-business application to complete tasks, AI agents rely on tools to enhance their functionality and solve problems more effectively. A support engineer may access the CRM system to retrieve customer details or use the ERP system to check product availability or manage orders. These tools are essential for completing their tasks, providing the engineer with the necessary resources to fulfill customer requests. Similarly, AI agents need the ability to access specific tools or functions to handle complex tasks that go beyond simple text-based interactions.
In the context of AI, this is achieved through function calling and tool calling, where the agent can invoke external systems or APIs to perform specific tasks. Just as a support engineer must be skilled in navigating various systems, an AI agent can be defined with access to tools declaratively — enabling it to perform tasks like retrieving data from an external source, executing a command, or interacting with third-party services. These tools are provided at the time of the agent’s definition, ensuring that the agent has the capability to go beyond predefined responses. Tool calling is crucial because it allows the agent to extend its functionality, giving it the power to interact with external systems dynamically to generate more accurate, contextual responses.
For instance, in an implementation where an AI support agent is tasked with updating a customer’s shipping information, the agent might use tool calling to interact with the company’s ERP system. A prompt could look like: Access the ERP system to update the shipping details for Order {order_id} to the new address provided by the customer.
This action would be triggered through a tool declared in the agent’s definition, enabling the agent to interact seamlessly with systems both inside and outside the organization.
Tool calling enables AI agents to move from passive responders to active participants, making their responses actionable and closely integrated with real-world systems. It is also important to understand that putting a human in the loop is often exposed to AI agents as a tool as well. This ensures that the agent escalates to, or involves, a human before proceeding with the workflow.
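Here is a hedged sketch of that flow using the OpenAI tool-calling interface; the update_shipping_address function and its schema are hypothetical stand-ins for a real ERP integration:

```python
import json
from openai import OpenAI

client = OpenAI()

# Declarative tool definition handed to the agent at creation time.
tools = [{
    "type": "function",
    "function": {
        "name": "update_shipping_address",  # hypothetical ERP wrapper
        "description": "Update the shipping address for an order in the ERP system.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "new_address": {"type": "string"},
            },
            "required": ["order_id", "new_address"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    messages=[{
        "role": "user",
        "content": "Please ship order 8821 to 42 Elm Street instead.",
    }],
    tools=tools,
)

# If the model chose to call the tool, parse its arguments and invoke
# the real ERP integration at this point.
for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(f"ERP update: order {args['order_id']} -> {args['new_address']}")
```

Note that the model never executes anything itself: it only emits a structured tool call, and your code decides whether and how to act on it, which is also where a human-in-the-loop check naturally slots in.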
Delegation
In a real-world scenario, a senior support engineer may oversee a team of junior engineers, delegating tasks based on the complexity or specialization of the issue at hand. If the workload is too high, the senior engineer can assign tasks to other team members who are better suited to handle specific problems. This delegation allows the team to function more efficiently, ensuring that tasks are completed by the right people and that the senior engineer can focus on high-level responsibilities. Similarly, AI agents can benefit from the ability to delegate tasks to other agents with specialized knowledge or capabilities, making the workflow more efficient and focused.
In AI, delegation is enabled by registering multiple agents, allowing one agent to pass on a task to another based on the requirements. Just as a senior engineer may manage a team, an AI agent can be defined with the capability to delegate specific tasks to other agents. This is often provided during the agent’s definition, where the system identifies which agents have the skills to perform certain tasks. Delegation is a powerful attribute for AI agents, enabling them to work in collaboration with other agents to solve complex, multi-faceted problems. This ensures that the most competent agent is in charge of each task, increasing the system’s overall performance and accuracy.
For example, in a customer support environment, an AI agent handling a technical query about a software bug might recognize that a specialized AI agent is better equipped to address the problem. The prompt could look like: This issue involves a deep technical analysis of the codebase. Delegate the task to the ‘Technical Debugging Agent’ and request a detailed report for the customer.
By delegating the task to another agent with the appropriate skills, the initial AI agent ensures that the customer receives a high-quality, accurate response. This delegation ability is defined during the agent setup and allows for a more flexible and efficient approach to handling complex workflows.
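As a minimal sketch, delegation reduces to a registry of specialist agents plus a router that hands each task to the best match. The agent names and keyword-based routing below are purely illustrative; in practice, an LLM would classify the task:

```python
from typing import Callable

# Registry of specialist agents, provided at agent-definition time.
AGENTS: dict[str, Callable[[str], str]] = {
    "billing": lambda task: f"[Billing Agent] resolving: {task}",
    "debugging": lambda task: f"[Technical Debugging Agent] analyzing: {task}",
}

def delegate(task: str) -> str:
    # Keyword check as a stand-in for LLM-based task classification.
    key = "debugging" if "bug" in task.lower() else "billing"
    return AGENTS[key](task)

print(delegate("Customer reports a software bug in report generation."))
```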
Conclusion
This article explored the anatomy of AI agents, comparing their key attributes to traditional job roles. From persona and instruction to task and planning, the structure of AI agents closely mirrors human workflows in customer support environments. The article also covered memory, tools and delegation, emphasizing the importance of AI agents’ ability to reason, plan and collaborate with other agents.
Future articles in this series will focus on practical implementation, applying these concepts to build real-world agentic workflows from the ground up. Stay tuned.
Read Next: How To Define an AI Agent Persona by Tweaking LLM Prompts