feat(wren-ai-service): remove sqlglot and add quotes instructions in prompt by cyyeh · Pull Request #2010 · Canner/WrenAI

cyyeh · 2025-10-27T01:01:23Z

we'll remove add quotes functionality in sqlglot as it sometimes changes sql query

Summary by CodeRabbit

Refactor
- Simplified SQL processing pipeline by removing automatic quote normalization, reducing dependencies and streamlining query validation logic across the system.

coderabbitai · 2025-10-27T01:01:35Z

Walkthrough

Removes SQL quoting via sqlglot from the entire wren-ai-service codebase. The add_quotes function and its dependency are deleted, SQL is now passed unchanged to endpoints, and quoting responsibilities shift to SQL generation rules with explicit formatting guidelines.

Changes

Cohort / File(s)	Change Summary
Dependency Management `wren-ai-service/pyproject.toml`	Removed sqlglot dependency with extras ["rs"] and version constraint ^25.18.0 from [tool.poetry.dependencies].
Core Engine `wren-ai-service/src/core/engine.py`	Removed sqlglot import and deleted the `add_quotes(sql: str) -> Tuple[str, str]` helper function including its try/except error handling.
Evaluation Utilities `wren-ai-service/eval/utils.py`, `wren-ai-service/eval/data_curation/utils.py`	Removed add_quotes import and usage from `is_sql_valid`; all API calls now pass raw SQL instead of quoted SQL to both duckdb and non-duckdb endpoints; eliminated quoting error assertions and handling logic.
SQL Generation Pipeline `wren-ai-service/src/pipelines/generation/utils/sql.py`	Removed add_quotes usage from `_classify_generation_result`; simplified control flow to pass generation_result directly to dry_plan/dry_run; reorganized branches for error classification (TIME_OUT, DRY_PLAN, DRY_RUN, PREVIEW); updated SQL generation rules docstring with explicit formatting guidelines for quoting identifiers/literals and case-insensitive comparisons; updated sample SQL snippet to show quoted identifiers.
Tools/Run SQL `wren-ai-service/tools/run_sql.py`	Removed add_quotes import and usage from `get_data_from_wren_engine`; both MDL preview and IBIS connector branches now send raw SQL to endpoints without pre-quoting.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Pipeline as SQL Gen Pipeline
    participant Rules as SQL Rules
    participant Engine as Engine/Endpoints

    User->>Pipeline: Input query request
    Pipeline->>Rules: Apply SQL generation rules
    Note over Rules: Rules now include explicit<br/>quoting guidelines for identifiers<br/>and literals
    Rules->>Pipeline: Generated SQL<br/>(with proper quoting per rules)
    
    alt use_dry_plan
        Pipeline->>Engine: dry_plan(sql)
        Engine-->>Pipeline: Plan result
        Pipeline->>Pipeline: Classify as valid/invalid
    else use_dry_run
        Pipeline->>Engine: dry_run(sql)
        Engine-->>Pipeline: Execution result
        Pipeline->>Pipeline: Classify success/error
    else preview mode
        Pipeline->>Engine: preview(sql)
        Engine-->>Pipeline: Preview data
        Pipeline->>Pipeline: Classify with has_data/error
    end
    
    Pipeline-->>User: Classified result with<br/>generation_result & error type

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

src/pipelines/generation/utils/sql.py: Control flow reorganization with new error classification logic and updated SQL generation rules requiring careful verification of all conditional branches.
Error handling implications: Changes to error classification (TIME_OUT, DRY_PLAN, DRY_RUN, PREVIEW) and error propagation across multiple pipeline paths.
SQL generation rules update: Explicit quoting guidelines for identifiers and literals now shift responsibility from code-level quoting to rule-based quoting.

Possibly related PRs

chore(wren-ai-service): improve text2sql #1855: Touches the same add_quotes function and SQL generation rules documentation.
chore(wren-ai-service): improve add quotes #1913: Directly conflicts with this PR by reintroducing/modifying the add_quotes function being removed here.
chore(wren-ai-service): minor updates #1782: Modifies add_quotes implementation; changes will be superseded by this removal.

Suggested labels

wren-ai-service

Suggested reviewers

yichieh-lu
paopa
imAsterSun

Poem

🐰 Whiskers twitch with delight!
SQL now flows without the quoting blight,
Sqlglot's burden shed, the code runs light,
Rules guide the path, identifiers dressed just right,
Simpler pipelines, faster and bright! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "feat(wren-ai-service): remove sqlglot and add quotes instructions in prompt" directly and accurately reflects the main objectives of the changeset. The raw summary confirms that the PR removes the add_quotes functionality and sqlglot dependency across multiple files (eval/data_curation/utils.py, eval/utils.py, src/core/engine.py, src/pipelines/generation/utils/sql.py, and tools/run_sql.py), while simultaneously updating SQL generation rules with explicit formatting guidelines for quoting. The title is concise, specific, and clearly conveys the primary change—shifting from automatic quoting via sqlglot to prompt-based quoting instructions—making it easy for teammates to understand the purpose of this change.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/ai-service/disable-sqlglot

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a461f21 and f22479c.

⛔ Files ignored due to path filters (1)

wren-ai-service/poetry.lock is excluded by !**/*.lock

📒 Files selected for processing (6)

wren-ai-service/eval/data_curation/utils.py (2 hunks)
wren-ai-service/eval/utils.py (3 hunks)
wren-ai-service/pyproject.toml (0 hunks)
wren-ai-service/src/core/engine.py (0 hunks)
wren-ai-service/src/pipelines/generation/utils/sql.py (4 hunks)
wren-ai-service/tools/run_sql.py (2 hunks)

💤 Files with no reviewable changes (2)

wren-ai-service/pyproject.toml
wren-ai-service/src/core/engine.py

🧰 Additional context used

🧬 Code graph analysis (2)

wren-ai-service/eval/data_curation/utils.py (1)

wren-ai-service/src/core/engine.py (1)

remove_limit_statement (44-48)

wren-ai-service/src/pipelines/generation/utils/sql.py (1)

wren-ai-service/src/providers/engine/wren.py (4)

dry_plan (209-242)

execute_sql (26-139)

execute_sql (159-207)

execute_sql (278-328)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: pytest
GitHub Check: pytest
GitHub Check: Analyze (go)

🔇 Additional comments (5)

wren-ai-service/eval/utils.py (1)

40-40: LGTM: SQL now passed directly to engine endpoints.

The removal of SQL quoting is consistent with the PR objectives. Raw SQL is now sent to the engine for processing.

Also applies to: 61-61, 162-162

wren-ai-service/eval/data_curation/utils.py (1)

52-52: LGTM: SQL validation now uses raw SQL.

The changes correctly pass raw SQL through remove_limit_statement to both the duckdb and non-duckdb validation endpoints.

Also applies to: 68-68

wren-ai-service/tools/run_sql.py (1)

49-49: LGTM: SQL execution now uses unmodified SQL.

The changes correctly pass raw SQL to both duckdb and non-duckdb execution endpoints.

Also applies to: 69-69

wren-ai-service/src/pipelines/generation/utils/sql.py (2)

84-161: LGTM: Validation logic correctly refactored to use raw SQL.

The refactored _classify_generation_result method correctly passes generation_result directly to all three validation paths (dry_plan, dry_run, and data preview) without pre-processing. The error handling properly categorizes failures with appropriate error types (TIME_OUT, DRY_PLAN, DRY_RUN, PREVIEW_EMPTY_DATA, PREVIEW_FAILED).

201-206: Runtime database validation effectively enforces SQL syntax compliance; no additional parser validation needed.

The current implementation validates generated SQL through database execution (dry_plan or execute_sql with dry_run), which serves as a robust enforcement mechanism. If the LLM generates improperly quoted identifiers or other syntax violations, the database will reject it and mark the result invalid for correction.

No evidence of sqlglot usage was found in the codebase. The prompt instructions (lines 201-206) direct the LLM on quoting rules, and failed SQL generation is caught by post-execution validation and routed to the SQL correction pipeline. This approach is functionally sound—database rejection is a stronger enforcement than a pre-execution parser would provide.

coderabbitai · 2025-10-27T01:07:22Z

wren-ai-service/src/pipelines/generation/utils/sql.py

 2. YOU MUST ONLY CHOOSE the appropriate functions from the sql functions list and use them in the SQL query if the section of SQL FUNCTIONS is available in user's input.
 3. YOU MUST REFER to the sql samples and learn the usage of the schema structures and how SQL is written based on them if the section of SQL SAMPLES is available in user's input.
 4. YOU MUST FOLLOW the reasoning plan step by step strictly to generate the SQL query if the section of REASONING PLAN is available in user's input.
+5. YOU MUST FOLLOW SQL Rules if they are not contradicted with instructions.


⚠️ Potential issue | 🟡 Minor

Softened rule enforcement may reduce quoting compliance.

Changing from "YOU MUST FOLLOW SQL Rules" to "YOU MUST FOLLOW SQL Rules if they are not contradicted with instructions" weakens the enforcement of the quoting rules added in lines 201-206. This could lead to scenarios where the LLM skips quoting if it interprets user instructions as contradicting the formatting rules.

Consider reverting to the stronger enforcement language, or at minimum, explicitly state that the quoting rules in lines 201-206 are mandatory regardless of other instructions:

-5. YOU MUST FOLLOW SQL Rules if they are not contradicted with instructions. +5. YOU MUST FOLLOW SQL Rules. The quoting rules (double quotes for identifiers, single quotes for strings) are mandatory and cannot be overridden by user instructions.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

5. YOU MUST FOLLOW SQL Rules if they are not contradicted with instructions.

5. YOU MUST FOLLOW SQL Rules. The quoting rules (double quotes for identifiers, single quotes for strings) are mandatory and cannot be overridden by user instructions.

🤖 Prompt for AI Agents

In wren-ai-service/src/pipelines/generation/utils/sql.py around line 262 (and referring to the SQL rules block at lines 201-206), the current wording weakens enforcement by adding "if they are not contradicted with instructions"; restore strict enforcement by either reverting line 262 back to "YOU MUST FOLLOW SQL Rules" or append an explicit sentence making the quoting rules in lines 201-206 mandatory and to be followed even if other instructions appear to conflict, ensuring the rule text unambiguously mandates quoting compliance and overrides contradictory user instructions.

remove sqlglot and add quotes instructions in prompt

f22479c

cyyeh added module/ai-service ai-service related ci/ai-service ai-service related labels Oct 27, 2025

github-actions bot added the wren-ai-service label Oct 27, 2025

coderabbitai bot reviewed Oct 27, 2025

View reviewed changes

Merge branch 'main' into feat/ai-service/disable-sqlglot

573bab7

yichieh-lu approved these changes Oct 29, 2025

View reviewed changes

cyyeh merged commit 0642f41 into main Oct 29, 2025
10 of 11 checks passed

cyyeh deleted the feat/ai-service/disable-sqlglot branch October 29, 2025 03:50

coderabbitai bot mentioned this pull request Dec 19, 2025

Fix quotes once and for all #2086

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(wren-ai-service): remove sqlglot and add quotes instructions in prompt#2010

feat(wren-ai-service): remove sqlglot and add quotes instructions in prompt#2010
cyyeh merged 2 commits intomainfrom
feat/ai-service/disable-sqlglot

cyyeh commented Oct 27, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 27, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Oct 27, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	5. YOU MUST FOLLOW SQL Rules if they are not contradicted with instructions.
	5. YOU MUST FOLLOW SQL Rules. The quoting rules (double quotes for identifiers, single quotes for strings) are mandatory and cannot be overridden by user instructions.

Conversation

cyyeh commented Oct 27, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 27, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cyyeh commented Oct 27, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 27, 2025 •

edited

Loading