Build an AI Agent from Scratch with Python (No Frameworks)

Alejandro AO

Software Engineer and Educator. Developer Advocate at Hugging Face 🤗

I help you build AI Apps that just work.

In this tutorial, we’re going to build an AI agent from scratch using only calls to the language model. No LangChain. No LlamaIndex. No smolagents. Just pure Python and API calls.

The goal here is not to build something production-ready. Instead, we want to understand what’s actually going on behind the scenes when you use an agent framework. Think of this as an introduction to how agents actually think and how they work.

By the end of this tutorial, you’ll have a working agent that can:

  • Receive a user query
  • Decide which tools to call
  • Execute those tools
  • Use the results to give you a complete answer

All of this with just a few lines of code. Let’s get started.

What is an Agent?

Before agents arrived, LLM applications used to look something like this:

You → Application → LLM → Application → You

You would send a query to your application. The application was essentially just an interface to talk to the language model. Your query went directly to the LLM, and the response came directly back to you. That was it.

This worked amazingly well. It revolutionized the world. This is how ChatGPT worked back in 2022.

However, language models can now do much more than answer questions from their pre-trained knowledge. They can use tools. And this is basically the definition of an agent:

An agent is an application where the LLM is responsible for directing the flow of the program.

Or, another common definition:

An agent is an LLM application that runs in a loop and can execute tools.

Agent’s Sequence Diagram

In this new paradigm, the flow looks different. Let’s say you ask your agent: “What is the temperature in San Francisco?”

  1. You send your query to the application (agent)
  2. The agent forwards it to the language model along with a list of available tools
  3. The LLM decides it needs to use a tool (e.g., get_temperature)
  4. The LLM returns a tool call request (not the actual result!)
  5. Your application executes the tool
  6. Your application sends the tool result back to the LLM
  7. The LLM returns the final answer to the user

This is crucial to understand: the language model doesn’t execute the tools itself. It only decides which tools to use and with what parameters. The actual execution happens in your application. This is exactly what we’re going to implement.
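
In terms of raw chat messages, that round trip looks roughly like this. This is a simplified sketch: the field names follow the OpenAI-style chat format we use later in this tutorial, and the tool call id is made up.

# Simplified sketch of the messages exchanged in one tool-calling round trip
messages = [
    {"role": "user", "content": "What is the temperature in San Francisco?"},
    # Step 4: the LLM answers with a tool call request, not the actual result
    {"role": "assistant", "tool_calls": [
        {"id": "call_0", "type": "function", "function": {
            "name": "get_temperature",
            "arguments": '{"city": "San Francisco"}'
        }}
    ]},
    # Steps 5-6: your application runs the tool and reports the result back
    {"role": "tool", "tool_call_id": "call_0", "name": "get_temperature", "content": "72"},
    # Step 7: the LLM produces the final answer for the user
    {"role": "assistant", "content": "It is currently 72°F in San Francisco."},
]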

Setup

We’ll use open-source models through Hugging Face’s Inference Providers. In this tutorial, I’m using Kimi-K2-Thinking, a powerful model from Moonshot AI that supports function calling.

First, install the required package:

pip install huggingface_hub

Then, get your Hugging Face token from huggingface.co/settings/tokens and set it up:

import os
import getpass

os.environ["HF_TOKEN"] = getpass.getpass("Hugging Face Token: ")

Now initialize the client:

from huggingface_hub import InferenceClient

client = InferenceClient(model="moonshotai/Kimi-K2-Thinking")

Note: When choosing a model, make sure it supports function calling. You can filter models on Hugging Face by “Inference Available” and check if they have function calling capabilities.

Basic LLM Call (No Tools)

Before we add tools, let’s make sure our basic setup works:

response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "are you sentient?"}
    ]
)

print(response.choices[0].message.content)

Output:

No, I am not sentient. I am an AI language model. I process and generate
text based on patterns in my training data, without any personal awareness,
beliefs, or emotions.

If you look at the full response object, you’ll see it has no tool calls:

response.choices[0].message.model_dump()
{
    'role': 'assistant',
    'content': 'No, I am not sentient...',
    'reasoning_content': '...',  # thinking tokens (for thinking models)
    'tool_calls': None  # no tools were called
}

At this point, we’re still in the old paradigm. The LLM just reasons and sends back a response. No tool execution. Let’s change that.

Creating Tools

Tools in agents are essentially just functions. In our case, we’ll create a simple function that gets the temperature of any city:

def get_temperature(city: str):
    """Get the current weather in a given city."""
    if city.lower() == "san francisco":
        return "72"
    if city.lower() == "paris":
        return "75"
    if city.lower() == "tokyo":
        return "73"
    return "70"

This is a mock function, but you could easily replace it with a real weather API call. That’s the beauty of tools: anything you can do in a function, your agent can do.
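
If you want a real implementation, here is a sketch using the free Open-Meteo service, which needs no API key. The endpoints and response fields below are assumptions about that service, so check them against its documentation before relying on this:

import requests  # pip install requests

def get_temperature(city: str):
    """Get the current temperature (Fahrenheit) for a given city via Open-Meteo."""
    # Geocode the city name to coordinates (assumed Open-Meteo geocoding endpoint)
    geo = requests.get(
        "https://geocoding-api.open-meteo.com/v1/search",
        params={"name": city, "count": 1},
        timeout=10,
    ).json()
    if not geo.get("results"):
        return f"Unknown city: {city}"
    place = geo["results"][0]

    # Fetch the current temperature at those coordinates
    weather = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": place["latitude"],
            "longitude": place["longitude"],
            "current_weather": "true",
            "temperature_unit": "fahrenheit",
        },
        timeout=10,
    ).json()
    return str(weather["current_weather"]["temperature"])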

Tool Schemas

Here’s where it gets interesting. To let the LLM know which tools are available, we need to define a schema that describes each tool. The schema tells the LLM:

  • What the function is called
  • What it does (description)
  • What parameters it takes
  • What types those parameters are

Here’s what the schema looks like:

get_temperature_tool_schema = {
    "type": "function",
    "function": {
        "name": "get_temperature",
        "description": "Get the current temperature in a given city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to get the temperature for.",
                }
            },
            "required": ["city"]
        }
    }
}

The description is very important. The better your descriptions, the more accurate tool calls your agent will have. Be specific about what each parameter expects.

Easier Schema Creation with Pydantic

Writing schemas by hand is tedious. Pydantic makes this much easier:

from pydantic import BaseModel, Field

class GetTemperatureArgs(BaseModel):
    city: str = Field(..., description="The city to get the temperature for.")

schema = {
    "type": "function",
    "function": {
        "name": "get_temperature",
        "description": "Get the current temperature in a given city.",
        "parameters": GetTemperatureArgs.model_json_schema()
    }
}

This generates essentially the same schema with much less clutter (Pydantic adds a few extra keys, such as title, which are harmless here).
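
If you're curious, this is roughly what GetTemperatureArgs.model_json_schema() returns with Pydantic v2 (the exact output may vary slightly between versions):

GetTemperatureArgs.model_json_schema()
{
    'properties': {
        'city': {
            'description': 'The city to get the temperature for.',
            'title': 'City',
            'type': 'string'
        }
    },
    'required': ['city'],
    'title': 'GetTemperatureArgs',
    'type': 'object'
}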

Tool Calling with LLMs

Now let’s make a call to the LLM and give it access to our tool:

response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "what is the temperature in San Francisco today?"}
    ],
    tools=[schema],
    tool_choice="auto"  # Let the model decide when to call functions
)

The key additions here are:

  • tools: A list of tool schemas
  • tool_choice: Set to “auto” to let the model decide when to use tools

Let’s look at what the model returns:

response.choices[0].message.tool_calls[0].function.__dict__
{
    'arguments': '{"city":"San Francisco"}',
    'name': 'get_temperature',
    'description': None
}

The LLM didn’t return an answer. It returned a tool call request. It’s telling us: “I need you to call get_temperature with the argument city='San Francisco'.”

Now it’s our job to:

  1. Execute the function
  2. Send the result back to the LLM
  3. Get the final answer
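
Done by hand, those three steps look roughly like this (a minimal sketch that reuses the response and schema objects from above):

import json

tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
result = get_temperature(**args)  # 1. Execute the function -> "72"

# 2. Send the result back to the LLM along with the original exchange
followup = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "what is the temperature in San Francisco today?"},
        response.choices[0].message,  # the assistant's tool call request
        {"role": "tool", "tool_call_id": tool_call.id,
         "name": tool_call.function.name, "content": result},
    ],
    tools=[schema],
)

# 3. Get the final answer
print(followup.choices[0].message.content)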

This is exactly what an agent does. Let’s build one.

Building the Agent Class

Our agent needs three methods:

  • __init__: Initialize the agent with a client, system prompt, and tools
  • __call__: Allow calling the agent directly with a message
  • execute: The main loop that handles tool calls

import json

class Agent:
    def __init__(self, client: InferenceClient, system: str = "", tools: list = None) -> None:
        self.client = client
        self.system = system
        self.messages: list = []
        self.tools = tools if tools is not None else []
        if self.system:
            self.messages.append({"role": "system", "content": system})

    def __call__(self, message=""):
        if message:
            self.messages.append({"role": "user", "content": message})

        final_assistant_content = self.execute()

        if final_assistant_content:
            self.messages.append({"role": "assistant", "content": final_assistant_content})

        return final_assistant_content

    def execute(self):
        while True:
            completion = self.client.chat.completions.create(
                messages=self.messages,
                tools=self.tools,
                tool_choice="auto"
            )

            response_message = completion.choices[0].message

            if response_message.tool_calls:
                # Add the assistant's tool call message to history
                self.messages.append(response_message)

                tool_outputs = []
                for tool_call in response_message.tool_calls:
                    function_name = tool_call.function.name
                    function_args = json.loads(tool_call.function.arguments)

                    # Execute the tool (look it up among this module's global functions)
                    if function_name in globals() and callable(globals()[function_name]):
                        function_to_call = globals()[function_name]
                        executed_output = function_to_call(**function_args)
                        tool_output_content = str(executed_output)
                        print(f"Executing tool: {function_name} with args {function_args}, Output: {tool_output_content[:500]}...")
                    else:
                        # Report unknown tools back to the model instead of failing silently
                        tool_output_content = f"Error: tool '{function_name}' is not available."

                    tool_outputs.append({
                        "tool_call_id": tool_call.id,
                        "role": "tool",
                        "name": function_name,
                        "content": tool_output_content,
                    })

                # Add all tool results to message history
                self.messages.extend(tool_outputs)

            else:
                # No more tool calls, return the final response
                return response_message.content

Let’s break down what’s happening in execute():

  1. Loop until we get a final answer: The while True loop keeps running until the LLM returns a response without tool calls.

  2. Call the LLM: We send all messages in history plus the available tools.

  3. Check for tool calls: If the response contains tool calls, we need to execute them.

  4. Execute each tool: We loop through all tool calls (there can be multiple!), execute each one, and collect the outputs.

  5. Add tool results to history: This is crucial. Each tool response must include the tool_call_id that matches the original request. If the IDs don’t match, the API will reject the request with an error.

  6. Repeat or return: If there were tool calls, we loop again with the updated history. If not, we return the final content.

About Message History

Notice how we’re maintaining a messages list. This list contains more than what the user sees:

  • System message
  • User messages
  • Assistant messages (including tool call requests)
  • Tool response messages

In a typical chat interface, users only see their messages and the final assistant responses. But internally, the agent maintains the full history including tool calls and responses. This context is essential for the LLM to understand what happened.
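
For example, a chat UI built on top of this agent could filter the internal history down to the turns the user should actually see. A small sketch, assuming the agent instance we create in the next section:

# Keep only plain user/assistant messages with text content; tool call
# requests and tool results stay in the internal history.
visible_messages = [
    m for m in agent.messages
    if isinstance(m, dict)
    and m.get("role") in ("user", "assistant")
    and m.get("content")
]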

Running the Agent

Let’s put it all together:

agent = Agent(
    client=client,
    system="You are a helpful assistant that can answer questions using the provided tools.",
    tools=[get_temperature_tool_schema]
)

response = agent("what is the weather in san francisco?")
print(response)

Output:

Executing tool: get_temperature with args {'city': 'San Francisco'}, Output: 72...
The current temperature in San Francisco is 72 degrees Fahrenheit.

It works! The agent:

  1. Received our question
  2. Decided to use the get_temperature tool
  3. Executed the tool and got “72”
  4. Sent the result back to the LLM
  5. Returned a natural language answer

Inspecting Agent State

You can inspect the agent’s internal state:

# View message history
agent.messages
[
    {'role': 'system', 'content': 'You are a helpful assistant...'},
    {'role': 'user', 'content': 'what is the weather in san francisco?'},
    ChatCompletionOutputMessage(role='assistant', tool_calls=[...]),
    {'tool_call_id': 'get_temperature:0', 'role': 'tool', 'name': 'get_temperature', 'content': '72'},
    {'role': 'assistant', 'content': 'The current temperature in San Francisco is 72 degrees Fahrenheit.'}
]

You can see the full conversation flow, including the tool call and its response.

What’s Next?

This is a minimal implementation to help you understand what’s happening behind the scenes. In production, you would want:

  • Error handling: What if a tool fails? What if the API times out? (see the sketch after this list)
  • Retry logic: Implement retries for failed tool calls
  • Multiple tools: Our agent supports multiple tools, but we only used one
  • Streaming: Stream responses for better UX
  • Memory management: Truncate history to avoid context limits
  • Logging: Track what the agent is doing for debugging
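
As one example, error handling around tool execution can be as simple as catching exceptions and reporting them back to the model as the tool's output, so it gets a chance to recover. A hedged sketch of how the execution step inside execute() might change:

# Instead of letting a failing tool crash the loop, return the error message
# as the tool's output so the LLM can react to it.
try:
    executed_output = function_to_call(**function_args)
    tool_output_content = str(executed_output)
except Exception as exc:
    tool_output_content = f"Error while running {function_name}: {exc}"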

Agent frameworks like smolagents, LangChain, and LlamaIndex handle all of this for you. But now you understand what they’re doing under the hood.

In future tutorials, we’ll explore how to use these frameworks to build more robust agents, this time with a much clearer picture of what’s actually happening behind the scenes.

Full Code

Here’s the complete code for reference:

import os
import json
import getpass
from huggingface_hub import InferenceClient

# Setup
os.environ["HF_TOKEN"] = getpass.getpass("Hugging Face Token: ")
client = InferenceClient(model="moonshotai/Kimi-K2-Thinking")

# Tool definition
def get_temperature(city: str):
    """Get the current weather in a given city."""
    if city.lower() == "san francisco":
        return "72"
    if city.lower() == "paris":
        return "75"
    if city.lower() == "tokyo":
        return "73"
    return "70"

# Tool schema
get_temperature_tool_schema = {
    "type": "function",
    "function": {
        "name": "get_temperature",
        "description": "Get the current temperature in a given city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to get the temperature for.",
                }
            },
            "required": ["city"]
        }
    }
}

# Agent class
class Agent:
    def __init__(self, client: InferenceClient, system: str = "", tools: list = None) -> None:
        self.client = client
        self.system = system
        self.messages: list = []
        self.tools = tools if tools is not None else []
        if self.system:
            self.messages.append({"role": "system", "content": system})

    def __call__(self, message=""):
        if message:
            self.messages.append({"role": "user", "content": message})
        final_assistant_content = self.execute()
        if final_assistant_content:
            self.messages.append({"role": "assistant", "content": final_assistant_content})
        return final_assistant_content

    def execute(self):
        while True:
            completion = self.client.chat.completions.create(
                messages=self.messages,
                tools=self.tools,
                tool_choice="auto"
            )
            response_message = completion.choices[0].message

            if response_message.tool_calls:
                self.messages.append(response_message)
                tool_outputs = []

                for tool_call in response_message.tool_calls:
                    function_name = tool_call.function.name
                    function_args = json.loads(tool_call.function.arguments)

                    if function_name in globals() and callable(globals()[function_name]):
                        function_to_call = globals()[function_name]
                        executed_output = function_to_call(**function_args)
                        tool_output_content = str(executed_output)
                        print(f"Executing tool: {function_name} with args {function_args}, Output: {tool_output_content[:500]}...")
                    else:
                        # Report unknown tools back to the model instead of failing silently
                        tool_output_content = f"Error: tool '{function_name}' is not available."

                    tool_outputs.append({
                        "tool_call_id": tool_call.id,
                        "role": "tool",
                        "name": function_name,
                        "content": tool_output_content,
                    })

                self.messages.extend(tool_outputs)
            else:
                return response_message.content

# Run the agent
agent = Agent(
    client=client,
    system="You are a helpful assistant that can answer questions using the provided tools.",
    tools=[get_temperature_tool_schema]
)

response = agent("what is the temperature in San Francisco?")
print(response)
