In this tutorial, we’re going to build an AI agent from scratch using only calls to the language model. No LangChain. No LlamaIndex. No smolagents. Just pure Python and API calls.
The goal here is not to build something production-ready. Instead, we want to understand what’s actually going on behind the scenes when you use an agent framework. Think of this as an introduction to how agents actually think and work.
By the end of this tutorial, you’ll have a working agent that can:
- Receive a user query
- Decide which tools to call
- Execute those tools
- Use the results to give you a complete answer
All of this with just a few lines of code. Let’s get started.
What is an Agent?#
Before agents arrived, LLM applications used to look something like this:
You → Application → LLM → Application → You
You would send a query to your application. The application was essentially just an interface to talk to the language model. Your query went directly to the LLM, and the response came directly back to you. That was it.
This worked amazingly well. It revolutionized the world. This is how ChatGPT worked back in 2022.
However, language models are now capable of doing much more than just answering questions with their pre-trained material. They can now execute tools. And this is basically the definition of an agent:
An agent is an application where the LLM is responsible for controlling the flow of the process.
Or, another common definition:
An agent is an LLM application that runs in a loop and can execute tools.

In this new paradigm, the flow looks different. Let’s say you ask your agent: “What is the temperature in San Francisco?”
- You send your query to the application (agent)
- The agent forwards it to the language model along with a list of available tools
- The LLM decides it needs to use a tool (e.g., get_temperature)
- The LLM returns a tool call request (not the actual result!)
- Your application executes the tool
- Your application sends the tool result back to the LLM
- The LLM returns the final answer to the user
This is crucial to understand: the language model doesn’t execute the tools itself. It only decides which tools to use and with what parameters. The actual execution happens in your application. This is exactly what we’re going to implement.
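In chat-API terms, that round trip maps onto a message list that grows with each step. Here’s a rough sketch of the roles involved (field names simplified; the real objects are shown later in this tutorial):

# Rough shape of the conversation as the agent sees it (simplified)
conversation = [
    {"role": "user", "content": "What is the temperature in San Francisco?"},
    # The LLM answers with a tool call request instead of text
    {"role": "assistant", "tool_calls": [{"name": "get_temperature", "arguments": {"city": "San Francisco"}}]},
    # Our application runs the tool and reports the result back
    {"role": "tool", "name": "get_temperature", "content": "72"},
    # Now the LLM can produce the final answer
    {"role": "assistant", "content": "It's 72°F in San Francisco right now."},
]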
Setup#
We’ll use open-source models through Hugging Face’s Inference Providers. In this tutorial, I’m using Kimi-K2-Thinking, a powerful model from Moonshot AI that supports function calling.
First, install the required packages (pydantic is only needed for the schema helper we’ll use later):
pip install huggingface_hub pydantic
Then, get your Hugging Face token from huggingface.co/settings/tokens and set it up:
import os
import getpass
os.environ["HF_TOKEN"] = getpass.getpass("Hugging Face Token: ")
Now initialize the client:
from huggingface_hub import InferenceClient
client = InferenceClient(model="moonshotai/Kimi-K2-Thinking")
Note: When choosing a model, make sure it supports function calling. You can filter models on Hugging Face by “Inference Available” and check if they have function calling capabilities.
Basic LLM Call (No Tools)#
Before we add tools, let’s make sure our basic setup works:
response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "are you sentient?"}
    ]
)

print(response.choices[0].message.content)
Output:
No, I am not sentient. I am an AI language model. I process and generate
text based on patterns in my training data, without any personal awareness,
beliefs, or emotions.
If you look at the full response object, you’ll see it has no tool calls:
response.choices[0].message.model_dump()
{
    'role': 'assistant',
    'content': 'No, I am not sentient...',
    'reasoning_content': '...',  # thinking tokens (for thinking models)
    'tool_calls': None  # no tools were called
}
At this point, we’re still in the old paradigm. The LLM just reasons and sends back a response. No tool execution. Let’s change that.
Creating Tools#
Tools in agents are essentially just functions. In our case, we’ll create a simple function that gets the temperature of any city:
def get_temperature(city: str):
    """Get the current temperature in a given city."""
    if city.lower() == "san francisco":
        return "72"
    if city.lower() == "paris":
        return "75"
    if city.lower() == "tokyo":
        return "73"
    return "70"
This is a mock function, but you could easily replace it with a real weather API call. That’s the beauty of tools: anything you can do in a function, your agent can do.
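For example, here is a hedged sketch of what a real version might look like, using the requests library against a hypothetical weather endpoint (the URL and response fields are made up; swap in whichever weather API you actually use):

import requests

def get_temperature(city: str):
    """Get the current temperature in a given city via a weather API."""
    # Hypothetical endpoint and response shape -- replace with a real weather API
    resp = requests.get(
        "https://api.example-weather.com/v1/current",
        params={"city": city},
        timeout=10,
    )
    resp.raise_for_status()
    return str(resp.json()["temperature"])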
Tool Schemas#
Here’s where it gets interesting. To let the LLM know which tools are available, we need to define a schema that describes each tool. The schema tells the LLM:
- What the function is called
- What it does (description)
- What parameters it takes
- What types those parameters are
Here’s what the schema looks like:
get_temperature_tool_schema = {
    "type": "function",
    "function": {
        "name": "get_temperature",
        "description": "Get the current temperature in a given city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to get the temperature for.",
                }
            },
            "required": ["city"]
        }
    }
}
The description is very important. The better your descriptions, the more accurate tool calls your agent will have. Be specific about what each parameter expects.
Easier Schema Creation with Pydantic#
Writing schemas by hand is tedious. Pydantic makes this much easier:
from pydantic import BaseModel, Field

class GetTemperatureArgs(BaseModel):
    city: str = Field(..., description="The city to get the temperature for.")

schema = {
    "type": "function",
    "function": {
        "name": "get_temperature",
        "description": "Get the current temperature in a given city.",
        "parameters": GetTemperatureArgs.model_json_schema()
    }
}
This generates an equivalent schema with much less clutter (Pydantic adds a few extra keys such as title, which models simply ignore).
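You can print the generated parameters block to compare it against the hand-written version:

import json

# Inspect the auto-generated parameters schema
print(json.dumps(GetTemperatureArgs.model_json_schema(), indent=2))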
Tool Calling with LLMs#
Now let’s make a call to the LLM and give it access to our tool:
response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "what is the temperature in San Francisco today?"}
    ],
    tools=[schema],
    tool_choice="auto"  # Let the model decide when to call functions
)
The key additions here are:
- tools: A list of tool schemas
- tool_choice: Set to "auto" to let the model decide when to use tools
Let’s look at what the model returns:
response.choices[0].message.tool_calls[0].function.__dict__
{
    'arguments': '{"city":"San Francisco"}',
    'name': 'get_temperature',
    'description': None
}
The LLM didn’t return an answer. It returned a tool call request. It’s telling us: “I need you to call get_temperature with the argument city='San Francisco'.”
Now it’s our job to:
- Execute the function
- Send the result back to the LLM
- Get the final answer
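Done by hand with the response we already have, those steps look roughly like this (a minimal sketch that reuses the schema and get_temperature defined above):

import json

tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# 1. Execute the function ourselves
result = get_temperature(**args)

# 2. Send the result back to the LLM along with the conversation so far
followup = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "what is the temperature in San Francisco today?"},
        response.choices[0].message,  # the assistant's tool call request
        {
            "role": "tool",
            "tool_call_id": tool_call.id,  # must match the request's id
            "name": tool_call.function.name,
            "content": str(result),
        },
    ],
    tools=[schema],
)

# 3. Read the final answer
print(followup.choices[0].message.content)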
This is exactly what an agent does. Let’s build one.
Building the Agent Class#
Our agent needs three methods:
- __init__: Initialize the agent with a client, system prompt, and tools
- __call__: Allow calling the agent directly with a message
- execute: The main loop that handles tool calls
import json

class Agent:
    def __init__(self, client: InferenceClient, system: str = "", tools: list = None) -> None:
        self.client = client
        self.system = system
        self.messages: list = []
        self.tools = tools if tools is not None else []
        if self.system:
            self.messages.append({"role": "system", "content": system})

    def __call__(self, message=""):
        if message:
            self.messages.append({"role": "user", "content": message})
        final_assistant_content = self.execute()
        if final_assistant_content:
            self.messages.append({"role": "assistant", "content": final_assistant_content})
        return final_assistant_content

    def execute(self):
        while True:
            completion = self.client.chat.completions.create(
                messages=self.messages,
                tools=self.tools,
                tool_choice="auto"
            )
            response_message = completion.choices[0].message

            if response_message.tool_calls:
                # Add the assistant's tool call message to history
                self.messages.append(response_message)

                tool_outputs = []
                for tool_call in response_message.tool_calls:
                    function_name = tool_call.function.name
                    function_args = json.loads(tool_call.function.arguments)

                    # Execute the tool
                    if function_name in globals() and callable(globals()[function_name]):
                        function_to_call = globals()[function_name]
                        executed_output = function_to_call(**function_args)
                        tool_output_content = str(executed_output)
                        print(f"Executing tool: {function_name} with args {function_args}, Output: {tool_output_content[:500]}...")
                    else:
                        # Every tool call needs a matching tool response,
                        # even if the requested function doesn't exist
                        tool_output_content = f"Error: tool '{function_name}' not found."

                    tool_outputs.append({
                        "tool_call_id": tool_call.id,
                        "role": "tool",
                        "name": function_name,
                        "content": tool_output_content,
                    })

                # Add all tool results to message history
                self.messages.extend(tool_outputs)
            else:
                # No more tool calls, return the final response
                return response_message.content
Let’s break down what’s happening in execute():
- Loop until we get a final answer: The while True loop keeps running until the LLM returns a response without tool calls.
- Call the LLM: We send all messages in history plus the available tools.
- Check for tool calls: If the response contains tool calls, we need to execute them.
- Execute each tool: We loop through all tool calls (there can be multiple!), execute each one, and collect the outputs.
- Add tool results to history: This is crucial. The tool responses must include the tool_call_id that matches the original request. If the IDs don't match, the API will return an error.
- Repeat or return: If there were tool calls, we loop again with the updated history. If not, we return the final content.
About Message History#
Notice how we’re maintaining a messages list. This list contains more than what the user sees:
- System message
- User messages
- Assistant messages (including tool call requests)
- Tool response messages
In a typical chat interface, users only see their messages and the final assistant responses. But internally, the agent maintains the full history including tool calls and responses. This context is essential for the LLM to understand what happened.
Running the Agent#
Let’s put it all together:
agent = Agent(
    client=client,
    system="You are a helpful assistant that can answer questions using the provided tools.",
    tools=[get_temperature_tool_schema]
)
response = agent("what is the weather in san francisco?")
print(response)
Output:
Executing tool: get_temperature with args {'city': 'San Francisco'}, Output: 72...
The current temperature in San Francisco is 72 degrees Fahrenheit.
It works! The agent:
- Received our question
- Decided to use the get_temperature tool
- Executed the tool and got “72”
- Sent the result back to the LLM
- Returned a natural language answer
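Because the agent keeps its message history between calls, you can also ask a follow-up question without restating the context (the exact wording of the reply will vary):

# The previous exchange is still in agent.messages, so the model has context
print(agent("and what about Tokyo?"))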
Inspecting Agent State#
You can inspect the agent’s internal state:
# View message history
agent.messages
[
    {'role': 'system', 'content': 'You are a helpful assistant...'},
    {'role': 'user', 'content': 'what is the weather in san francisco?'},
    ChatCompletionOutputMessage(role='assistant', tool_calls=[...]),
    {'tool_call_id': 'get_temperature:0', 'role': 'tool', 'name': 'get_temperature', 'content': '72'},
    {'role': 'assistant', 'content': 'The current temperature in San Francisco is 72 degrees Fahrenheit.'}
]
You can see the full conversation flow, including the tool call and its response.
What’s Next?#
This is a minimal implementation to help you understand what’s happening behind the scenes. In production, you would want:
- Error handling: What if a tool fails? What if the API times out? (see the sketch after this list)
- Retry logic: Implement retries for failed tool calls
- Multiple tools: Our agent supports multiple tools, but we only used one
- Streaming: Stream responses for better UX
- Memory management: Truncate history to avoid context limits
- Logging: Track what the agent is doing for debugging
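As a taste of the first two points, here is a hedged sketch of what guarding a single tool call might look like (the helper name and retry policy are illustrative, not part of the Agent class above):

import time

def run_tool_safely(fn, args, retries: int = 2):
    """Illustrative helper: run a tool, retrying on failure, and always return a string."""
    for attempt in range(retries + 1):
        try:
            return str(fn(**args))
        except Exception as exc:  # in production, catch narrower exceptions
            if attempt == retries:
                # Report the failure back to the LLM instead of crashing the loop
                return f"Error: {fn.__name__} failed after {retries + 1} attempts ({exc})."
            time.sleep(2 ** attempt)  # simple exponential backoff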
Agent frameworks like smolagents, LangChain, and LlamaIndex handle all of this for you. But now you understand what they’re doing under the hood.
In future tutorials, we’ll explore how to use these frameworks to build more robust agents. But you’ll now have a much better understanding of what’s actually happening behind the scenes.
Full Code#
Here’s the complete code for reference:
import os
import json
import getpass
from huggingface_hub import InferenceClient
# Setup
os.environ["HF_TOKEN"] = getpass.getpass("Hugging Face Token: ")
client = InferenceClient(model="moonshotai/Kimi-K2-Thinking")
# Tool definition
def get_temperature(city: str):
    """Get the current temperature in a given city."""
    if city.lower() == "san francisco":
        return "72"
    if city.lower() == "paris":
        return "75"
    if city.lower() == "tokyo":
        return "73"
    return "70"
# Tool schema
get_temperature_tool_schema = {
    "type": "function",
    "function": {
        "name": "get_temperature",
        "description": "Get the current temperature in a given city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to get the temperature for.",
                }
            },
            "required": ["city"]
        }
    }
}
# Agent class
class Agent:
    def __init__(self, client: InferenceClient, system: str = "", tools: list = None) -> None:
        self.client = client
        self.system = system
        self.messages: list = []
        self.tools = tools if tools is not None else []
        if self.system:
            self.messages.append({"role": "system", "content": system})

    def __call__(self, message=""):
        if message:
            self.messages.append({"role": "user", "content": message})
        final_assistant_content = self.execute()
        if final_assistant_content:
            self.messages.append({"role": "assistant", "content": final_assistant_content})
        return final_assistant_content

    def execute(self):
        while True:
            completion = self.client.chat.completions.create(
                messages=self.messages,
                tools=self.tools,
                tool_choice="auto"
            )
            response_message = completion.choices[0].message

            if response_message.tool_calls:
                self.messages.append(response_message)

                tool_outputs = []
                for tool_call in response_message.tool_calls:
                    function_name = tool_call.function.name
                    function_args = json.loads(tool_call.function.arguments)

                    if function_name in globals() and callable(globals()[function_name]):
                        function_to_call = globals()[function_name]
                        executed_output = function_to_call(**function_args)
                        tool_output_content = str(executed_output)
                        print(f"Executing tool: {function_name} with args {function_args}, Output: {tool_output_content[:500]}...")
                    else:
                        # Every tool call needs a matching tool response
                        tool_output_content = f"Error: tool '{function_name}' not found."

                    tool_outputs.append({
                        "tool_call_id": tool_call.id,
                        "role": "tool",
                        "name": function_name,
                        "content": tool_output_content,
                    })

                self.messages.extend(tool_outputs)
            else:
                return response_message.content
# Run the agent
agent = Agent(
    client=client,
    system="You are a helpful assistant that can answer questions using the provided tools.",
    tools=[get_temperature_tool_schema]
)
response = agent("what is the temperature in San Francisco?")
print(response)
References#
- Hugging Face Inference Providers
- Kimi-K2-Thinking Model
- smolagents - A minimalist agent library from Hugging Face


