Implementing RAG via Custom Functions in OpenAI Assistants

Hi everyone,

I’m exploring the possibility of implementing Retrieval-Augmented Generation (RAG) directly using the “custom functions” feature in OpenAI Assistants.

Specifically, given a vector storage database (such as Pinecone or a similar service), I’m wondering if it’s feasible to integrate the RAG context search functionality directly within the Assistant through custom functions.

For example, if I provide the database URL, API key, index name, and top_k, I would expect the Assistant to retrieve the relevant context from the database before generating a response.

On top of other benefits, this approach could potentially improve latency by reducing the overhead of external API calls.
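
To make the idea concrete, here is roughly how I imagine declaring such a retrieval function when creating the Assistant (just a sketch with the current Python SDK; the function name query_vector_db and its parameter schema are placeholders I made up, not an existing API):

    from openai import OpenAI

    client = OpenAI()

    # Sketch only: the function name and parameters below are placeholders.
    assistant = client.beta.assistants.create(
        model="gpt-4o",
        instructions="Answer user questions using the context returned by query_vector_db.",
        tools=[{
            "type": "function",
            "function": {
                "name": "query_vector_db",
                "description": "Retrieve the most relevant text chunks for a query "
                               "from an external vector database (e.g. Pinecone).",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query_text": {"type": "string"},
                        "top_k": {"type": "integer"},
                    },
                    "required": ["query_text"],
                },
            },
        }],
    )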

Has anyone tried this, or do you think this approach is feasible?

Thank you in advance for any suggestions!

Hi @abc01,

Yes, totally doable.

Depending on the complexity of the data, a good starting point would be a combo of PostgreSQL for the relational database + Weaviate for vector management and vector search + Directus to manage both and provide a CRUD API, along with custom webhooks for extra features, all behind a Traefik proxy.
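
To give an idea of the Weaviate side, a vector search is only a few lines from Python. A rough sketch with the v3 client, assuming a hypothetical Article class with a text property and that you already have the query embedding:

    import weaviate

    # Sketch only: the URL, class name and property name are assumptions.
    client = weaviate.Client("http://localhost:8080")

    def search_context(query_embedding, top_k=5):
        result = (
            client.query
            .get("Article", ["text"])
            .with_near_vector({"vector": query_embedding})
            .with_limit(top_k)
            .do()
        )
        return [item["text"] for item in result["data"]["Get"]["Article"]]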

What would be the app you’re building?


Thank you @sergeliatko,

Assuming I make use of Pinecone, wouldn’t it ‘simply’ be a matter of writing a custom function that essentially combines the following three?

  1. def query_pinecone(query_text, top_k=5):
         # Convert the query text to an embedding (get_embedding is assumed to be defined elsewhere)
         query_embedding = get_embedding(query_text)

         # Query Pinecone (index is an initialized pinecone.Index)
         result = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
         return result

  2. def get_relevant_context(query):
         result = query_pinecone(query)
         relevant_text = [match["metadata"]["text"] for match in result["matches"]]
         return " ".join(relevant_text)

  3. def generate_answer_with_context(query):
         context = get_relevant_context(query)
         prompt = f"Context: {context}\n\nUser Query: {query}\nAnswer:"
         # gpt-3.5-turbo is a chat model, so it goes through the chat completions endpoint
         response = openai.ChatCompletion.create(
             model="gpt-3.5-turbo",
             messages=[{"role": "user", "content": prompt}],
         )
         return response.choices[0].message.content.strip()


Thank you

Sure, you could start like this as well. Personally I prefer something ready to scale if your project takes off. But if your goal is just to test a concept, choose whatever is easier and faster.

Thank you for the feedback. I realize my initial question might not have been clear. What I’m specifically asking is not about the choice of vector management solutions (e.g., Weaviate, Pinecone, Elasticsearch) or the external systems involved. My focus is whether it’s possible to implement the search and retrieval of context directly within OpenAI’s Agent using the ‘Functions’ or ‘Code Interpreter’ features. The goal is to minimize external API calls and reduce latency. Or do you believe this process (the search and retrieval of context) must necessarily be initiated from outside the AI Assistant?

If you believe this integration is possible within the Agent, should it be done via the ‘Code Interpreter’ or ‘Functions’? Or do you think the search and retrieval process must necessarily be initiated externally and then the retrieved context fed into the AI Assistant?

Personally, I don’t use assistants, as I don’t really see the benefit of giving the AI control over what context is used to form the responses my apps need. But that’s a personal choice, plus the specifics of what I usually do.

Assistants are cool when you don’t want to deal with thread and context management. The price for skipping that bootstrap is:

It’s the assistant that picks the most relevant messages from the current thread to answer the user…

I prefer using the tools I mentioned (relatively easy to work with) and the chat endpoints, so I have stateless tools and full control over the context.

It’s totally doable to use function calling to let assistants access the context they judge necessary. But have you evaluated their judgement and the trade-offs of such an approach? If it’s you who is going to handle the retrieval and context, are you sure you need Assistants?
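
Just to be clear about the mechanics: even with function calling, the retrieval code still runs on your side; the assistant only decides when to call it and with what arguments. A rough sketch of that round-trip with the current Python SDK, assuming an assistant that already has a get_relevant_context function tool registered and the get_relevant_context helper from your post defined client-side:

    import json
    from openai import OpenAI

    client = OpenAI()
    ASSISTANT_ID = "asst_..."  # assistant with a "get_relevant_context" function tool

    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content="What do the docs say about refunds?"
    )

    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=ASSISTANT_ID
    )

    while run.status == "requires_action":
        outputs = []
        for call in run.required_action.submit_tool_outputs.tool_calls:
            args = json.loads(call.function.arguments)  # argument names match the tool schema you register
            # The actual retrieval happens here, in your code, not inside the assistant.
            outputs.append({"tool_call_id": call.id,
                            "output": get_relevant_context(args["query"])})
        run = client.beta.threads.runs.submit_tool_outputs_and_poll(
            thread_id=thread.id, run_id=run.id, tool_outputs=outputs
        )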


As for the screenshot: when I need to add features to assistants (like custom GPTs for my wife to handle her website sales and reservation stats plus attendee seat info, with pre-processing in Google Sheets), I prefer exposing custom API definitions (basically the same thing as function calling, but the GPT bot handles the API request for you) with all the features they might need, and a good set of instructions/workflow descriptions inside the knowledge files when they don’t fit into the bot instructions limit. Works cool. Here is an example:

It’s great reading you here, @sergeliatko. I am collecting threads to follow later.

A client wants to do a POC for an LLM-based bot that also knows its data, which consists of both structured data (business listings) and unstructured data (specialized articles).
As a POC, I was thinking of creating an assistant and uploading the material to its vector store. However, the business listings get updated automatically from another platform, so the vector store would need to be updated as well.

What do you think? What would be the least “painful” way to achieve this?


Hi @TomNextApp ,

Nice project. Looks interesting.

I would approach it by designing my relational DB to store all the data on my end, then designing the object structures/entities that power the vectors in RAG (for the articles, and maybe some structured data to allow loose search). Then add procedures to update the vectors on entity updates in the relational DB. Then wrap everything in a REST API and put MCP in place. Then connect pretty much anything you need via the API endpoints or MCP.
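
The “update the vectors on entity updates” part is just a small piece of glue: a hook that re-embeds the changed record and upserts it into the vector store. A rough sketch (the class name, field names and embedding model are assumptions, e.g. called from a Directus flow/webhook), using the Weaviate v3 Python client:

    import weaviate
    from weaviate.util import generate_uuid5
    from openai import OpenAI

    wv = weaviate.Client("http://localhost:8080")
    oai = OpenAI()

    def sync_entity_to_vector_store(record: dict) -> None:
        """Hypothetical hook fired when a row changes in the relational DB:
        re-embed the text and upsert it into Weaviate ("Article" class assumed)."""
        vector = oai.embeddings.create(
            model="text-embedding-3-small",
            input=record["text"],
        ).data[0].embedding

        uuid = generate_uuid5(record["source_id"])  # stable id derived from the DB row
        obj = {"text": record["text"], "source_id": record["source_id"]}

        if wv.data_object.exists(uuid, class_name="Article"):
            wv.data_object.replace(data_object=obj, class_name="Article", uuid=uuid, vector=vector)
        else:
            wv.data_object.create(data_object=obj, class_name="Article", uuid=uuid, vector=vector)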

Stack:

  1. Docker container with:
     1. Load balancer (Traefik)
     2. PostgreSQL DB (spatial?)
     3. Redis (cache for #4)
     4. Directus (headless CMS, DB API, flows = n8n, auth, email, etc.)
     5. Weaviate

Not sure how to do MCP (I haven’t tried it yet), but I’m sure there are a lot of tools to build it automatically from the OpenAPI specs provided by Directus, or you can build it on top of your own endpoints (again in Directus).

That would probably be the easiest way to have something up and running with a huge margin to scale if ever needed, or you can simply use GitHub/Directus templates to clone apps as needed for small clients.

@TomNextApp BTW, welcome to the forum and feel free to reach out to me if you need more details

Hello, I have looked around the docs but couldn’t find anything about this. Is this some custom feature you implemented? Or is there a way to actually set up an assistant to call an external API for you, without client code to handle function calls?
