
*wip* Add experimental AI functions aiClassify/Extract/GenerateSQL/Translate/GenerateContent/GenerateEmbedding#100735

Closed
rschu1ze wants to merge 30 commits into master from
feature/llm-functions

Conversation

@rschu1ze
Member

@rschu1ze rschu1ze commented Mar 25, 2026

This is a re-push of #99579
Courtesy of @melvynator

Introduces experimental AI functions that call external AI/embedding providers (OpenAI, Anthropic) directly from SQL queries without leaving ClickHouse.

Uses Named Collections for storing provider credentials and query-level settings for operational parameters (like temperature and error handling).

All functions are gated behind the allow_experimental_ai_functions setting.

Changelog category (leave one):

  • Experimental Feature

Changelog entry (a user-readable short description):

Introduces experimental AI functions aiClassify, aiExtract, aiGenerateSQL, aiTranslate, aiGenerateContent, aiGenerateEmbedding, aiGenerateEmbeddingOrNull that call external AI and embedding providers (OpenAI, Anthropic) directly from SQL queries without leaving ClickHouse.

Ubuntu and others added 19 commits March 25, 2026 16:20
…SQL, LLMTranslate

Introduces four new scalar functions that call external LLM providers
(OpenAI, Anthropic) directly from SQL queries, enabling AI-powered
text classification, structured extraction, natural language to SQL
generation, and translation without leaving ClickHouse.

Key components:
- Provider abstraction (ILLMProvider) with OpenAI and Anthropic implementations
- Shared execution engine (LLMFunctionBase) with concurrent dispatch,
  request deduplication, SipHash128-based result caching, token-bucket
  rate limiting, exponential backoff retries, and per-query quota enforcement
- 13 new llm_* query-level settings for fine-grained control
- 9 LLM-specific ProfileEvents for observability
- Input sanitization to handle control characters in raw text
- LLMGenerateSQL resolves live database schema via DatabaseCatalog
  and strips markdown fences from output for direct copy-paste execution

Configuration uses Named Collections for provider credentials and
query-level settings for operational parameters, following the
precedent set by the S3 integration.
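The retry machinery listed above is mostly policy around a simple schedule. As a rough illustration only (hypothetical helper name, not the PR's actual retry code), exponential backoff produces a doubling delay sequence capped at a maximum:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch of an exponential-backoff schedule: the delay doubles after each
// failed attempt and is capped at max_ms. Illustrative only.
std::vector<uint64_t> backoffDelaysMs(uint64_t initial_ms, uint64_t max_ms, int attempts)
{
    std::vector<uint64_t> delays;
    uint64_t d = initial_ms;
    for (int i = 0; i < attempts; ++i)
    {
        delays.push_back(d);
        d = std::min<uint64_t>(d * 2, max_ms);
    }
    return delays;
}
```

With an initial delay of 100 ms and a 1000 ms cap, five attempts would wait 100, 200, 400, 800, and 1000 ms respectively.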
…sting LLM code

New functions:
- LLMGenerateContent: general-purpose text generation via LLMFunctionBase
- LLMGenerateEmbedding: standalone IFunction returning Array(Float32), with
  OpenAI embedding endpoint auto-derivation and binary cache serialization

Embedding provider support:
- Add LLMEmbeddingRequest/Response structs and ILLMProvider::embed() virtual
- Implement OpenAIProvider::embed() with deriveEmbeddingURI()
- Default embed() throws NOT_IMPLEMENTED for unsupported providers

Bug fixes and hardening (from code review):
- Fix cache key hash collisions by length-prefixing SipHash fields
- Normalize Anthropic stop_reason to OpenAI finish_reason convention
  (max_tokens -> length, end_turn -> stop) so truncated responses skip cache
- Handle NULL input rows in LLMFunctionBase and LLMGenerateEmbedding
- Fix JSON injection in LLMClassify categories via Poco::JSON::Array
- Use proper Poco::JSON::Parser in LLMClassify::postProcessResponse
- Add null-safety checks in OpenAI/Anthropic response parsing (optValue)
- Fix memory leak: raw Poco::JSON::Array* -> Poco::JSON::Array::Ptr
- Align quota modes with PRD: on_quota_exceeded accepts "break"/"null",
  on_error accepts "null" (was checking non-existent "partial")

Tests:
- Add 03300_llm_functions stateless test covering argument validation,
  return type verification, and non-constant dimension rejection
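The length-prefixing fix above addresses a classic ambiguity: concatenating fields before hashing lets different field tuples produce identical input bytes. A minimal sketch of the problem and the fix (illustrative key builders, not the actual SipHash128 code):

```cpp
#include <string>
#include <vector>

// Naive concatenation: ("ab", "c") and ("a", "bc") yield the same bytes,
// so any hash of the result collides.
std::string naiveKey(const std::vector<std::string> & fields)
{
    std::string key;
    for (const auto & f : fields)
        key += f;
    return key;
}

// Length-prefixing each field makes the field boundaries unambiguous.
std::string lengthPrefixedKey(const std::vector<std::string> & fields)
{
    std::string key;
    for (const auto & f : fields)
    {
        key += std::to_string(f.size());
        key += ':';
        key += f;
    }
    return key;
}
```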
…che reset

- LLMGenerateEmbedding now batches multiple texts per HTTP request
  (configurable via max_batch_size in named collection, default 100)
- Empty string inputs return empty array without calling the API
- Null inputs return empty array gracefully
- First argument can be an inline URL (http/https) bypassing named collections
- HuggingFace TEI and huggingface recognized as provider aliases (OpenAI-compatible)
- LLMResultCache gains reset() method for cache invalidation
- OpenAI embed() parses response index field for correct ordering
- Authorization header omitted when api_key is empty (for local models)
Embedding models are not LLMs, so the LLM prefix is misleading.
The new name is more accurate and consistent with ClickHouse naming.
…perature handling, and Anthropic compatibility

- Add generateEmbeddingOrNull function (returns empty array on errors instead of throwing)
- Fix hasNamedCollectionArg to detect collection name as first argument via NamedCollectionFactory lookup
- Fix resolveTemperature to extract temperature from last Float argument instead of always using default
- Fix LLMGenerateContent to distinguish system_prompt (String) from temperature (Float) arguments
- Rename LLM* ProfileEvents to AI* (AIInputTokens, AIAPICalls, etc.)
- Add embedding_max_batch_size query-level setting
- Improve Settings.cpp descriptions for all AI function settings
- Add AI functions documentation page (ai-functions.md)
- Update stateless tests for generateEmbeddingOrNull
AI functions now require SET allow_experimental_ai_functions = 1 before use,
consistent with other experimental features (NLP, hash, funnel functions).
…xtPtr

- Move AI settings from 26.3 to 26.4 in SettingsChangesHistory.cpp
- Use WithContext instead of holding ContextPtr copy in LLMFunctionBase and generateEmbedding
- Remove unused ErrorCodes, ProfileEvents, CurrentMetrics declarations across LLM files
- Fix clang-tidy: const ref copies, range-based loop, empty catch comment, int64_t, static
- Add AI function names to aspell dictionary
- Add allow_experimental_ai_functions to enableAllExperimentalSettings.cpp
- Fix "informations" typo in ai-functions.md
…es/database params

- LLMGenerateSQL: add explicit arity check when first arg is a named
  collection so LLMGenerateSQL('existing_collection') fails with a
  deterministic error instead of OOB access
- LLMGenerateSQL: remove unused tables/database parameters from
  signature (schema is auto-discovered from catalog), reducing max args
  from 5 to 3
- LLMGenerateContent: add same named-collection arity guard
- Update docs and REGISTER_FUNCTION metadata to match
The various_checks.sh style check forbids using WithContext in
src/Functions/. Use ContextWeakPtr + getContext() helper directly,
which achieves the same safe weak-reference semantics without
triggering the check.
…g check, quota values

- Guard resp.embeddings.back() UB when provider returns empty response
- Sanitize provider error messages to avoid leaking sensitive data
  (extract JSON error message or truncate to 256 chars)
- Throw BAD_ARGUMENTS when first arg is non-constant in 3-arg
  generateEmbedding form instead of silently falling back to default
- Align llm_on_quota_exceeded: remove undocumented break/partial
  values, only accept throw/null as documented in Settings.cpp
Fixes test 02415_all_new_functions_must_be_documented which requires
all functions to have description, syntax, and introduced_in metadata.
- Add no-replicated-database tag to 03300_llm_functions test to prevent
  timeout in DatabaseReplicated mode
- Remove duplicate LLM settings entries from 26.3 block (already in 26.4)
LLM functions require external HTTP access and named collection
configuration that is not available in the unit test environment.
@rschu1ze rschu1ze changed the title Feature/llm functions Add experimental AI functions llmClassify/Extract/GenerateSQL/Translate/GenerateContent and generateEmbedding[OrNull] Mar 25, 2026
@clickhouse-gh
Contributor

clickhouse-gh bot commented Mar 25, 2026

Workflow [PR], commit [4a0b1ef]

Summary:

  • Style check: failure
    • aspell: failure (info: cidb)
  • Docs check: dropped
  • Build (amd_debug): dropped
  • Build (amd_asan_ubsan): dropped
  • Build (amd_tsan): dropped
  • Build (amd_msan): dropped
  • Build (amd_binary): dropped
  • Build (arm_asan_ubsan): dropped
  • Build (arm_binary): dropped
  • Build (amd_release): dropped

AI Review

Summary

This PR adds experimental AI SQL functions and provider integration with caching/quota/rate-limit controls plus docs/tests. I found one additional actionable issue not already covered by existing bot comments: embedding function metadata still advertises old non-ai function names. Verdict: request a small follow-up fix before merge to keep function metadata consistent for users.

Findings

⚠️ Majors

  • [src/Functions/aiGenerateEmbedding.cpp:481,499] FunctionDocumentation .syntax and examples still use generateEmbedding / generateEmbeddingOrNull, while the registered SQL names are aiGenerateEmbedding / aiGenerateEmbeddingOrNull.
    • Impact: user-facing metadata and docs/introspection can mislead users to call non-existent names.
    • Suggested fix: update those documentation strings/examples to use aiGenerateEmbedding(...) and aiGenerateEmbeddingOrNull(...).
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
User-Lens
  • Function metadata naming mismatch (generateEmbedding* vs aiGenerateEmbedding*) is confusing for users relying on docs/introspection output.
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required action: align embedding FunctionDocumentation syntax/examples with actual ai* SQL function names.

, retry_delay_ms(context->getSettingsRef()[Setting::llm_retry_initial_delay_ms])
, cache_ttl(context->getSettingsRef()[Setting::llm_cache_ttl_sec])
, default_llm_resource(context->getSettingsRef()[Setting::default_llm_resource])
, llm_max_rows_per_query(context->getSettingsRef()[Setting::llm_cache_ttl_sec])
Contributor

llm_max_rows_per_query is initialized from llm_cache_ttl_sec here, so row quota tracks TTL seconds instead of the configured row limit.

Impact: quota enforcement becomes incorrect (default 86400 instead of 100000, and user-set llm_max_rows_per_query is ignored).

Please initialize it from Setting::llm_max_rows_per_query.

, llm_max_api_calls_per_query(context->getSettingsRef()[Setting::llm_max_api_calls_per_query])
, llm_on_error(context->getSettingsRef()[Setting::llm_on_error])
, llm_on_quota_exceeded(context->getSettingsRef()[Setting::llm_on_quota_exceeded])
, max_batch_size(context->getSettingsRef()[Setting::generateembedding_max_batch_size] ? 0 : DEFAULT_EMBED_BATCH_SIZE)
Contributor

max_batch_size is inverted in this ternary. When generateembedding_max_batch_size is non-zero, this sets max_batch_size to 0.

That makes for (size_t i = 0; i < to_dispatch.size(); i += max_batch_size) non-progressing (i += 0) and the query can hang indefinitely.

Please use the configured setting value when it is non-zero, otherwise DEFAULT_EMBED_BATCH_SIZE.
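The corrected selection the reviewer asks for can be sketched like this (hypothetical helper names; the default of 100 comes from the PR description):

```cpp
#include <cstddef>

static constexpr size_t DEFAULT_EMBED_BATCH_SIZE = 100;

// Use the configured value when non-zero, otherwise the default, so the
// batching loop's stride can never be zero.
size_t resolveBatchSize(size_t configured)
{
    return configured ? configured : DEFAULT_EMBED_BATCH_SIZE;
}

// With a non-zero stride, the dispatch loop always terminates.
size_t countBatches(size_t total_items, size_t batch_size)
{
    size_t batches = 0;
    for (size_t i = 0; i < total_items; i += batch_size)
        ++batches;
    return batches;
}
```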

std::lock_guard lock(results_mutex);
results[key] = cached->result;
++cache_hits;
quota->rows_processed.fetch_add(rows.size(), std::memory_order_relaxed);
Contributor

⚠️ Cache-hit rows bypass quota validation: this path increments rows_processed directly, but only dispatch path calls checkBeforeDispatch.

Impact: a query can exceed llm_max_rows_per_query if many rows are served from cache.

Please apply the same quota check for cached rows before accepting the cache hit (or centralize quota checks so both cached and dispatched paths share the same gate).
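A centralized gate of the kind suggested could look roughly like this (hypothetical struct, shown only to illustrate routing both paths through one check):

```cpp
#include <atomic>
#include <cstdint>
#include <stdexcept>

// Single accounting point for rows, called for both cached and dispatched
// rows, so cache hits cannot bypass the per-query row limit.
struct RowQuota
{
    std::atomic<uint64_t> rows_processed{0};
    const uint64_t max_rows;

    explicit RowQuota(uint64_t max_rows_) : max_rows(max_rows_) {}

    void account(uint64_t rows)
    {
        // fetch_add returns the previous value; add `rows` to get the new total.
        const uint64_t total = rows_processed.fetch_add(rows, std::memory_order_relaxed) + rows;
        if (total > max_rows)
            throw std::runtime_error("row quota exceeded");
    }
};
```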


DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
if (arguments.size() < 2 || arguments.size() > 4)
Contributor

❌ With a named collection, this arity check is too permissive and allows a 2-argument call ([collection,] text, categories...) to pass analysis. In buildSystemPrompt we then access arguments[idx + 1], which becomes out-of-bounds when only collection,text are provided.

Please require at least 3 arguments when hasNamedCollectionArg(arguments) is true (or add an explicit guard before dereferencing idx + 1).
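The requested guard is a small predicate. A sketch with hypothetical names (the real check would live in getReturnTypeImpl, next to the existing arity check):

```cpp
#include <cstddef>

// When the first argument is a named collection it only carries configuration,
// so at least three arguments are needed to leave room for text plus the
// remaining required arguments. Bounds chosen to match the 2..4 form above.
bool arityIsValid(size_t num_args, bool has_named_collection)
{
    if (has_named_collection)
        return num_args >= 3 && num_args <= 4;
    return num_args >= 2 && num_args <= 4;
}
```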


DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
if (arguments.size() < 2 || arguments.size() > 4)
Contributor

❌ Same issue here: when the first argument is a named collection, this check still allows only 2 arguments total. Later buildSystemPrompt / buildResponseFormatJSON read arguments[idx + 1], so [collection, text] can reach an out-of-bounds access.

Please enforce >= 3 arguments when hasNamedCollectionArg(arguments) is true.


DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
if (arguments.size() < 2 || arguments.size() > 5)
Contributor

❌ For llmTranslate, a named-collection call requires at least collection, text, target_language. This check accepts 2 arguments, but buildSystemPrompt unconditionally reads arguments[idx + 1] (target_language), which can go out of bounds for [collection, text].

Please add the same named-collection arity guard used in llmGenerateSQL/llmGenerateContent.

SELECT '-- llmExtract: missing named collection';
SELECT llmExtract('text', 'extract the name'); -- { serverError BAD_ARGUMENTS }

-- llmExtract expects 2-5 args
Contributor

⚠️ This section appears to duplicate llmExtract tests (llmExtract('hello world', 'French')) while llmTranslate argument validation is not tested at all.

Given the arity-sensitive logic around optional named collections, please add explicit negative tests for llmTranslate (too few / too many / missing named collection) and replace this duplicated block.

for (const auto * item : batch_items)
req.inputs.push_back(item->text);

auto resp = provider->embed(req, timeouts);
Contributor

⚠️ ai_max_rps is ignored by embedding functions.

FunctionBaseAI throttles chat/completion calls, but this embedding path sends requests directly without a Throttler. This means ai_max_rps does not cap request rate for generateEmbedding/generateEmbeddingOrNull, so embedding-heavy queries can burst above the configured global AI request rate.

Please apply the same throttling mechanism here before each provider request.
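A token-bucket throttler of the kind requested can be sketched as follows (illustrative only; ClickHouse has its own Throttler class, which is what the reviewer suggests reusing here):

```cpp
#include <algorithm>

// Token bucket: refills at `rps` tokens per second up to `capacity`; each
// provider request consumes one token. Time is passed in explicitly to keep
// the sketch deterministic and testable.
struct TokenBucket
{
    double rps;
    double tokens;
    double capacity;

    TokenBucket(double rps_, double capacity_) : rps(rps_), tokens(capacity_), capacity(capacity_) {}

    // Advance time by elapsed_sec, then try to take one token.
    // Returns true if the request may proceed immediately.
    bool tryAcquire(double elapsed_sec)
    {
        tokens = std::min(capacity, tokens + elapsed_sec * rps);
        if (tokens >= 1.0)
        {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
};
```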

@rschu1ze rschu1ze changed the title Add experimental AI functions llmClassify/Extract/GenerateSQL/Translate/GenerateContent and generateEmbedding[OrNull] Add experimental AI functions aiClassify/Extract/GenerateSQL/Translate/GenerateContent/GenerateEmbedding Mar 25, 2026
"First argument of {} must be a constant string (named collection or URL) when 3 arguments are provided", name);
String first_arg(first_const->getDataAt(0));

if (!first_arg.empty() && looksLikeURL(first_arg))
Contributor

generateEmbedding accepts an inline URL and assigns it directly to endpoint_val without any allowlist or access-control check. This lets any user who can run this experimental function make arbitrary outbound HTTP requests from the server process (SSRF), including to internal metadata services or private network endpoints.

Please enforce the same URL-safety policy used by other SQL-accessible HTTP paths (for example host allowlists / restricted schemes), or require named collections only.

{
auto it = results.find(key);
const std::vector<Float32> * vec = nullptr;
if (it != results.end() && !it->second.empty())
Contributor

For generateEmbedding (non-OrNull), malformed provider responses can silently produce wrong results instead of throwing.

If the provider omits embeddings for some inputs (or returns empty vectors), those rows are treated as skipped and emitted as empty arrays. This violates the documented "throws on API errors" behavior and can silently corrupt downstream similarity calculations.

Please validate that each requested input has a non-empty embedding (and ideally expected dimensionality), and throw RECEIVED_ERROR_FROM_REMOTE_IO_SERVER when this contract is violated (while keeping graceful behavior for generateEmbeddingOrNull).

if (!first_arg.empty() && looksLikeURL(first_arg))
{
provider_name = "openai";
endpoint_val = first_arg;
Contributor

generateEmbedding allows the 3-argument URL form ([collection_or_url,] text, dimensions), but this branch sets provider_name and endpoint_val only; model stays empty.

Later the request always sends ai_embedding_request.model = model, so URL-based calls submit an empty model and fail at runtime on OpenAI-compatible APIs.

Please either reject URL mode unless a model can be resolved, or resolve model from settings/named collection for this branch before dispatch.

/* ####################################################### */ \
/* AI function settings (LLMClassify, LLMExtract, LLMGenerateSQL, LLMTranslate, LLMGenerateContent, generateEmbedding, generateEmbeddingOrNull) */ \
DECLARE(Bool, allow_experimental_ai_functions, false, R"(
Enable experimental AI functions (LLMClassify, LLMExtract, LLMGenerateSQL, LLMTranslate, LLMGenerateContent, generateEmbedding, generateEmbeddingOrNull). These functions make external HTTP calls to LLM and embedding providers.
Contributor

The setting documentation strings still reference legacy function names (LLMClassify, LLMExtract, LLMGenerateSQL, LLMTranslate, LLMGenerateContent) that are not the exposed SQL names in this PR (aiClassify, aiExtract, aiGenerateSQL, aiTranslate, aiGenerateContent).

Please update these descriptions to current function names to avoid confusing users.


void AIQuotaTracker::recordResponse(UInt64 in_tokens, UInt64 out_tokens)
{
input_tokens.fetch_add(in_tokens, std::memory_order_relaxed);
Contributor

ai_max_input_tokens_per_query is not actually enforced for chat-style AI functions.

checkBeforeDispatch receives estimated_input_tokens = 0 from callers, and recordResponse only increments input_tokens without checking max_input_tokens.

Impact: queries can exceed the configured input-token budget silently, despite the setting/docs saying the quota is enforced per query.

Please enforce the limit when recording actual usage (e.g. in recordResponse) or pass a non-zero estimate to checkBeforeDispatch and reconcile with actual usage.
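Enforcing the budget at recording time, as suggested, could look roughly like this (hypothetical struct mirroring the tracker's fields; not the PR's actual AIQuotaTracker):

```cpp
#include <atomic>
#include <cstdint>
#include <stdexcept>

// Record actual input-token usage and reject the query once the configured
// per-query budget is exceeded, instead of counting silently.
struct InputTokenQuota
{
    std::atomic<uint64_t> input_tokens{0};
    const uint64_t max_input_tokens;

    explicit InputTokenQuota(uint64_t limit) : max_input_tokens(limit) {}

    void recordResponse(uint64_t in_tokens)
    {
        const uint64_t total = input_tokens.fetch_add(in_tokens, std::memory_order_relaxed) + in_tokens;
        if (total > max_input_tokens)
            throw std::runtime_error("ai_max_input_tokens_per_query exceeded");
    }
};
```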

DROP TABLE IF EXISTS _03300_ret_embedding_or_null;

-- =============================================================================
-- 9. aiGenerateEmbedding: dimensions argument must be constant
Member

This is weird. The number of dimensions can't be a function argument.

-- aiGenerateEmbeddingOrNull returns Nullable(Array(Float32))
DROP TABLE IF EXISTS _03300_ret_embedding_or_null;
CREATE TABLE _03300_ret_embedding_or_null ENGINE = Memory AS
SELECT aiGenerateEmbeddingOrNull(x, 256) AS result FROM (SELECT 'hello' AS x WHERE 0);
Member

What exactly does 'OrNull' mean?

{
String schema = resolveSchema();
return "You are a ClickHouse SQL expert. Generate a valid ClickHouse SQL query.\n"
"Rules:\n"
Member

It's weird to have the prompt hard-coded.
Please make it orthogonal or remove the function altogether.

Member

https://github.com/apache/doris/pull/51725/changes#diff-0b36d9b633dcf67a515eded9e04db7c6d1aa70d4803ae80275ff4a792e65a7e6R27-R33

Not sure weird is the appropriate terminology. DuckDB extension does the same thing too.

But making the prompt an optional parameter (defaulting to this one) is ok.


if (result.find("```sql") != String::npos)
strip_prefix(result, "```sql");
else if (result.find("```SQL") != String::npos)
Member

Markdown injection.
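A stricter variant only strips a fence that wraps the whole response, rather than searching for "```sql" anywhere in the text where user-controlled content could contain it. A hypothetical helper to illustrate the point (not the PR's fix):

```cpp
#include <string>

// Strip a markdown code fence only when it encloses the entire response.
std::string stripSQLFence(std::string s)
{
    auto startsWith = [](const std::string & str, const std::string & p)
    {
        return str.compare(0, p.size(), p) == 0;
    };
    for (const std::string & prefix : {std::string("```sql\n"), std::string("```SQL\n"), std::string("```\n")})
    {
        if (startsWith(s, prefix))
        {
            s.erase(0, prefix.size());
            break;
        }
    }
    if (s.size() >= 4 && s.compare(s.size() - 4, 4, "\n```") == 0)
        s.erase(s.size() - 4);
    return s;
}
```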

"First argument of {} must be a constant string (named collection or URL) when 3 arguments are provided", name);
String first_arg(first_const->getDataAt(0));

if (!first_arg.empty() && looksLikeURL(first_arg))
Member

And now we have an SSRF vulnerability.

std::unordered_map<UInt128, std::vector<Float32>, UInt128Hash> results;
std::mutex results_mutex;

std::atomic<UInt64> total_api_calls{0};
Member

There is no reason to make it atomic inside the loop.

throw;
}
}
catch (...)
Member

We don't allow this.

});
}

pool->wait();
Member

No. Functions shouldn't use thread pools.
The parallelism and thread pools should be encapsulated in a separate component.

Member

@alexey-milovidov alexey-milovidov left a comment

Make it at least ten times simpler.

@alexey-milovidov
Member

This is missing integration tests. Test at least with Gemma 3 270M.

.description = "Generates embedding vectors for the given text using an embedding model. "
"Supports batch API calls for efficient processing of multiple rows. "
"Throws on API errors. Returns an empty array for NULL or empty inputs.",
.syntax = "generateEmbedding([collection_or_url,] text, dimensions)",
Contributor

FunctionAiGenerateEmbedding is registered under SQL names aiGenerateEmbedding and aiGenerateEmbeddingOrNull, but FunctionDocumentation still uses generateEmbedding / generateEmbeddingOrNull in .syntax and examples.

This creates inconsistent user-facing metadata (for example in function introspection/docs) versus actual callable names.

Please update these documentation strings to the real SQL names used in this PR.

@rschu1ze rschu1ze changed the title Add experimental AI functions aiClassify/Extract/GenerateSQL/Translate/GenerateContent/GenerateEmbedding *wip* Add experimental AI functions aiClassify/Extract/GenerateSQL/Translate/GenerateContent/GenerateEmbedding Mar 25, 2026
@rschu1ze
Member Author

You are right, this is just a working draft based on a (likely) vibecoded PR. Yes, the PR will be simplified and better tested.

@george-larionov
Member

george-larionov commented Mar 26, 2026

I think we should split this PR up into a few parts. Below is how Claude proposed to split it up; I think it is a good start, and I've linked the draft PRs below as well:

  1. "infrastructure" which will contain the base function type and all the plumbing, along with a simple function to show it works
  2. Add the rest of the ai text functions
  3. Add the generateEmbedding funcs
    (still working on them; I'll un-draft them when they are ready for review)
