I read the OpenAI documentation and browsed the web, researching the topic of embeddings.
It’s definitely a very interesting topic and a very new way of using your own knowledge base with GPT, but in all the research I’ve done I haven’t found anything that clearly shows how to update those embeddings.
In a company, data changes constantly: sometimes over months, sometimes within seconds. I know OpenAI isn’t ready for real-time data updates yet, but would it be possible and maintainable to do this with embeddings? A user might upload a change only to fix a typo or to swap one paragraph for another, so imagine having to create new embeddings every time a change is uploaded.
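One possible way to keep this maintainable, as a minimal sketch rather than an established recipe: store a hash of each chunk's text next to its embedding and re-embed only the chunks whose hash changed. The names `sync_embeddings`, `store`, and the `embed` callable are hypothetical; `embed` stands in for whatever embedding call you actually use, and `store` is a plain dict here instead of a real vector database.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def chunk_hash(text: str) -> str:
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_embeddings(
    chunks: Dict[str, str],
    store: Dict[str, Tuple[str, Vector]],
    embed: Callable[[str], Vector],
) -> List[str]:
    """Re-embed only new or changed chunks; return the ids that were embedded.

    chunks: chunk id -> current text (hypothetical layout)
    store:  chunk id -> (hash of embedded text, embedding vector)
    embed:  assumed wrapper around your embedding provider
    """
    updated = []
    for chunk_id, text in chunks.items():
        digest = chunk_hash(text)
        cached = store.get(chunk_id)
        if cached is not None and cached[0] == digest:
            continue  # text unchanged, keep the existing embedding
        store[chunk_id] = (digest, embed(text))
        updated.append(chunk_id)
    # Remove embeddings for chunks that no longer exist in the source.
    for stale_id in set(store) - set(chunks):
        del store[stale_id]
    return updated
```

With this layout, a typo fix in one paragraph costs one embedding call for the affected chunk, not a rebuild of the whole knowledge base.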
Another question that is not clear to me: don’t you lose the context of your knowledge base when you split the data into chunks?
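A common way to limit that loss is to let neighbouring chunks overlap, so a sentence cut at one boundary still appears whole in the next chunk. The sketch below is only an illustration under assumed values: the function name `split_with_overlap` is hypothetical, and the sizes are measured in characters and are not tuned for any particular model.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into character windows that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

Overlap does not remove the problem entirely: information that depends on a distant part of the document can still be missing from a retrieved chunk.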
Currently my use case is to query my knowledge base and also to generate responses on any other topic. The problem is how to handle the first part. I have tried different tools and am currently using Cognitive Search for indexed search, but the texts can run to more than 13,000 characters. On top of that, I format the text and insert markers such as {{IMG}} into it. In my context, the flow would be something like this:
Query: How to collect cash with the application?
Cognitive Search: Searches the index, which is built from the files in a container, and returns a JSON document containing the text “Collection expires… {{IMG}}” and an array of images [“img1”, “img2”]
GPT: Receives that text as context and generates a response such as “In the application, you can configure charging as follows…{{IMG1}}”
Text processing: In the GPT response, each {{IMG}} is replaced by the image at the corresponding position in the array
Send response: The processed response is returned to the user
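The text-processing step could look roughly like the sketch below. It is an assumption-laden illustration, not the actual implementation: the function name `fill_images` is hypothetical, numbered markers such as {{IMG1}} are assumed to be 1-based indexes into the image array, unnumbered {{IMG}} markers are assumed to consume images in order, and markers with no matching image are simply removed.

```python
import re
from typing import List

# Matches {{IMG}} as well as numbered variants such as {{IMG1}}.
PLACEHOLDER = re.compile(r"\{\{IMG(\d*)\}\}")


def fill_images(response: str, images: List[str]) -> str:
    """Replace image markers in a model response with entries from `images`."""
    position = 0  # next image to use for an unnumbered marker

    def replace(match: "re.Match[str]") -> str:
        nonlocal position
        if match.group(1):
            index = int(match.group(1)) - 1  # assumed 1-based numbering
        else:
            index = position
            position += 1
        if 0 <= index < len(images):
            return images[index]
        return ""  # no matching image: drop the marker

    return PLACEHOLDER.sub(replace, response)
```

Because the model may drop or renumber markers, it can help to compare how many markers survive in the response with the length of the image array before sending the answer.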
Of course, I can’t control what GPT responds with: even if the text contains {{IMG}}, there are times when it ignores the marker. The token count also becomes too large, which increases the latency of the OpenAI response. I started using Cognitive Search because I loved Microsoft’s proposal to combine both tools, but unfortunately I have not been able to use them the way they do.
I found some threads on this forum that discuss embeddings, but none of them settle these questions. It would be good to have them all answered, because in the end these are “new” technologies that we need to learn to apply correctly in our use cases. If we don’t know how to use them, how can we get the most out of them? And if they are open source, it is so that anyone can access the information. Thank you very much for your answers.