{"id":4252,"date":"2025-02-13T09:34:15","date_gmt":"2025-02-13T17:34:15","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/azure-sql\/?p=4252"},"modified":"2025-04-01T13:22:32","modified_gmt":"2025-04-01T20:22:32","slug":"database-and-ai-solutions-for-keeping-embeddings-updated","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/azure-sql\/database-and-ai-solutions-for-keeping-embeddings-updated\/","title":{"rendered":"Database and AI: solutions for keeping embeddings updated"},"content":{"rendered":"<p>In the <a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/storing-querying-and-keeping-embeddings-updated-options-and-best-practices\/\">previous article of this series<\/a>, it was discussed how embeddings can be quickly created from data already in Azure SQL. This is a useful starting point, but since data in a database changes frequently, a common question arises: \u201cHow can the vectors be kept updated whenever there is a change to the content from which they have been generated?\u201d This article aims to address that question.\nSolutions to keep embeddings updated\nSeveral methods can be employed to ensure that embeddings are updated to reflect any modifications made to the content from which they are generated:<\/p>\n<ul>\n<li>Using a Database Trigger<\/li>\n<li>Using Change Tracking<\/li>\n<li>Using an Azure Function Sql Trigger binding<\/li>\n<li>Using Azure Logic Apps<\/li>\n<li>Using Change Data Capture<\/li>\n<li>Using the new Change Event Stream<\/li>\n<\/ul>\n<p>This is by far the simplest solution as it only requires T-SQL code and it take advantage of database trigger to call an embedding model for vectorizing the text.\nA after insert, update trigger is created on the table that contains the data that needs to be vectorized and internally the trigger uses <code>sp_invoke_external_rest_endpoint<\/code> to call to the embedding model and get the embedding vector back. To make the code cleaner and favor reuse, it is recommended to create a stored procedure to retrieve the embeddings, for example:<\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">create database scoped credential [https:\/\/&lt;your-azure-openai-endpoint&gt;.openai.azure.com]\r\nwith identity = 'HTTPEndpointHeaders', secret = '{\"api-key\": \"\"}'; -- Add your Azure OpenAI Key\r\ngo\r\ncreate or alter procedure [dbo].[get_embedding]\r\n@inputText nvarchar(max),\r\n@embedding vector(1536) output\r\nas\r\ndeclare @retval int;\r\ndeclare @payload nvarchar(max) = json_object('input': @inputText);\r\ndeclare @response nvarchar(max)\r\n\r\nexec @retval = sp_invoke_external_rest_endpoint\r\n@url = 'https:\/\/&lt;your-azure-openai-endpoint&gt;.openai.azure.com\/openai\/deployments\/&lt;deployment-name&gt;\/embeddings?api-version=2023-03-15-preview',\r\n@method = 'POST',\r\n@credential = [https:\/\/&lt;your-azure-openai-endpoint&gt;.openai.azure.com],\r\n@payload = @payload,\r\n@response = @response output;\r\n\r\nset @embedding = json_query(@response, '$.result.data[0].embedding');\r\n\r\nreturn @retval\r\n<\/code><\/pre>\n<p>And then use it in the trigger:<\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">declare @content_vector vector(1536); \r\nexec @retval = [dbo].[get_embedding] @content, @content_vector output with result sets none<\/code><\/pre>\n<p>It is important to understand that the trigger is part of the transaction that modifies the data in the table, which means that the performance of the transaction will be proportional to the performance of the embedding generation. If the embedding generation takes considerable time (for instance, if requests are being throttled by the called model), all the locks and resources used by the transaction will be held for a longer period, potentially resulting in reduced concurrency.<\/p>\n<p>Therefore, this approach is recommended primarily for proof-of-concepts or scenarios where concurrent access to the table is very limited. In SQL Server, it is recommended to use read committed snapshot to minimize read-write contention. On Azure SQL, this setting is enabled by default, so no additional action is required.<\/p>\n<p>You can find a sample showing trigger usage here: <a href=\"https:\/\/github.com\/Azure-Samples\/azure-sql-db-vector-search\/tree\/main\/Embeddings\/T-SQL\">Vector Search Samples with T-SQL<\/a><\/p>\n<h2>Using Change Tracking<\/h2>\n<p>Change tracking is a powerful yet underutilized technology. It allows querying a table to retrieve all changes since the last query or from a specific version, making it ideal for detecting inserted or updated rows and updating embeddings accordingly. Lightweight and efficient, change tracking adds minimal overhead, perfect for such scenarios. Learn more about change tracking here:<\/p>\n<p><a href=\"https:\/\/learn.microsoft.com\/en-us\/sql\/relational-databases\/track-changes\/about-change-tracking-sql-server?view=sql-server-ver16\">SQL Server Change Tracking<\/a><\/p>\n<p>With change tracking, you can create a micro-batching solution. This could be a scheduled application or a simple T-SQL scheduled batch that will update the desired tables of changes every second. It will connect to the chosen embedding model only when necessary, generate the embedding, and then update the table with the generated values.<\/p>\n<h2>Using an Azure Function Sql Trigger binding<\/h2>\n<p>Change tracking is a viable option to keep embeddings updated, but it requires some coding to establish a functional end-to-end solution. However, coding to manage change tracking is not necessary if Azure Functions are used. Azure Functions offer the option, through the SQL Trigger binding, to respond to any changes in monitored tables, allowing the execution of code \u2013 whether in C#, Python, JavaScript, or Java \u2013 whenever rows are inserted, updated, or deleted.<\/p>\n<p><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/azure-functions\/functions-bindings-azure-sql-trigger?tabs=isolated-process%2Cpython-v2%2Cportal&amp;pivots=programming-language-csharp\">Azure Function Bindings &#8211; Azure SQL Trigger<\/a><\/p>\n<p>This Azure Function binding utilizes Change Tracking to provide benefits without needing additional infrastructure or plumbing code. You only need to write the logic to connect to the desired embedding model to generate the embedding and update the inserted or changed rows.<\/p>\n<p>A working example of this approach is available here: <a href=\"https:\/\/github.com\/Azure-Samples\/azure-sql-db-session-recommender-v2\">https:\/\/github.com\/Azure-Samples\/azure-sql-db-session-recommender-v2<\/a><\/p>\n<h2>Using Azure Logic Apps<\/h2>\n<p>Azure Logic Apps simplify the process by eliminating the need for coding. The tasks that can be performed using Change Tracking and Azure Function SQL Trigger binding can also be done in Azure Logic Apps through the workflow editor, where you can design the process of monitoring a table for changes, generating the embedding, and storing them back into the database.\nMore information is available here:<\/p>\n<p><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/connectors\/connectors-create-api-sqlazure?tabs=consumption#add-a-sql-server-trigger\">Azure Logic Apps Connectors &#8211; Add a SQL Server Trigger<\/a><\/p>\n<p>A fully working example is available here:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/sql-vector-store-data-ingestion-and-retrieval-using-azure-logic-apps\/\">RAG with SQL Vector Store: A Low-Code\/No-Code Approach using Azure Logic Apps<\/a><\/p>\n<h2>Using Change Data Capture<\/h2>\n<p>Change Data Capture is a technology available in the MSSQL Engine that allows developers to monitor a table for changes. It is more comprehensive and powerful than Change Tracking as it automatically stores all the changes made to the monitored table into a dedicated table, preserving the entire history.<\/p>\n<p>When used with tools like Debezium, it enables the creation of Change Stream solutions where data can be sent to Apache Kafka or Azure Event Hubs for further processing: <a href=\"https:\/\/learn.microsoft.com\/en-us\/samples\/azure-samples\/azure-sql-db-change-stream-debezium\/azure-sql--sql-server-change-stream-with-debezium\/\">Azure SQL and SQL Server Change Stream with Debezium<\/a>.<\/p>\n<p>Once the changed data is available in Apache Kafka or Azure Event Hubs, it is straightforward to use a custom application or an Azure Function to consume the change feed, generate the related embeddings, and then save them back to the database.<\/p>\n<h2>Using the new Change Event Stream<\/h2>\n<p>The newly announced <a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/introducing-change-event-streaming-join-the-azure-sql-database-private-preview-for-change-data-streaming\/\">Change Event Stream<\/a> automates the process based on the concept of Change Data Capture. Once the data is available in Azure Event Hubs, executing an Azure Function to generate embeddings for the changed data and then saving it back to the database requires only a few lines of code.<\/p>\n<h2>Conclusion<\/h2>\n<p>To ensure embeddings in Azure SQL remain up-to-date, various methods such as database triggers, change tracking, and Azure Functions can be employed. Each approach has its own benefits, from the simplicity of database triggers for proof-of-concept scenarios to the more robust capabilities of Change Data Capture and Azure Logic Apps for handling extensive data changes. By selecting the most suitable method based on specific requirements, organizations can maintain the accuracy and relevance of their embeddings, enhancing their AI-driven applications.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the previous article of this series, it was discussed how embeddings can be quickly created from data already in Azure SQL. This is a useful starting point, but since data in a database changes frequently, a common question arises: \u201cHow can the vectors be kept updated whenever there is a change to the content [&hellip;]<\/p>\n","protected":false},"author":24720,"featured_media":4267,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[601,1],"tags":[640,647],"class_list":["post-4252","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-azure-sql","tag-embeddings","tag-vectors"],"acf":[],"blog_post_summary":"<p>In the previous article of this series, it was discussed how embeddings can be quickly created from data already in Azure SQL. This is a useful starting point, but since data in a database changes frequently, a common question arises: \u201cHow can the vectors be kept updated whenever there is a change to the content [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts\/4252","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/users\/24720"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/comments?post=4252"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts\/4252\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/media\/4267"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/media?parent=4252"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/categories?post=4252"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/tags?post=4252"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}