Skip to content

feat(wren-ai-service): remove sqlglot and add quotes instructions in prompt#2010

Merged
cyyeh merged 2 commits intomainfrom
feat/ai-service/disable-sqlglot
Oct 29, 2025
Merged

feat(wren-ai-service): remove sqlglot and add quotes instructions in prompt#2010
cyyeh merged 2 commits intomainfrom
feat/ai-service/disable-sqlglot

Conversation

@cyyeh
Copy link
Copy Markdown
Member

@cyyeh cyyeh commented Oct 27, 2025

we'll remove add quotes functionality in sqlglot as it sometimes changes sql query

Summary by CodeRabbit

  • Refactor
    • Simplified SQL processing pipeline by removing automatic quote normalization, reducing dependencies and streamlining query validation logic across the system.

@cyyeh cyyeh added module/ai-service ai-service related ci/ai-service ai-service related labels Oct 27, 2025
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Oct 27, 2025

Walkthrough

Removes SQL quoting via sqlglot from the entire wren-ai-service codebase. The add_quotes function and its dependency are deleted, SQL is now passed unchanged to endpoints, and quoting responsibilities shift to SQL generation rules with explicit formatting guidelines.

Changes

Cohort / File(s) Change Summary
Dependency Management
wren-ai-service/pyproject.toml
Removed sqlglot dependency with extras ["rs"] and version constraint ^25.18.0 from [tool.poetry.dependencies].
Core Engine
wren-ai-service/src/core/engine.py
Removed sqlglot import and deleted the add_quotes(sql: str) -> Tuple[str, str] helper function including its try/except error handling.
Evaluation Utilities
wren-ai-service/eval/utils.py, wren-ai-service/eval/data_curation/utils.py
Removed add_quotes import and usage from is_sql_valid; all API calls now pass raw SQL instead of quoted SQL to both duckdb and non-duckdb endpoints; eliminated quoting error assertions and handling logic.
SQL Generation Pipeline
wren-ai-service/src/pipelines/generation/utils/sql.py
Removed add_quotes usage from _classify_generation_result; simplified control flow to pass generation_result directly to dry_plan/dry_run; reorganized branches for error classification (TIME_OUT, DRY_PLAN, DRY_RUN, PREVIEW); updated SQL generation rules docstring with explicit formatting guidelines for quoting identifiers/literals and case-insensitive comparisons; updated sample SQL snippet to show quoted identifiers.
Tools/Run SQL
wren-ai-service/tools/run_sql.py
Removed add_quotes import and usage from get_data_from_wren_engine; both MDL preview and IBIS connector branches now send raw SQL to endpoints without pre-quoting.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Pipeline as SQL Gen Pipeline
    participant Rules as SQL Rules
    participant Engine as Engine/Endpoints

    User->>Pipeline: Input query request
    Pipeline->>Rules: Apply SQL generation rules
    Note over Rules: Rules now include explicit<br/>quoting guidelines for identifiers<br/>and literals
    Rules->>Pipeline: Generated SQL<br/>(with proper quoting per rules)
    
    alt use_dry_plan
        Pipeline->>Engine: dry_plan(sql)
        Engine-->>Pipeline: Plan result
        Pipeline->>Pipeline: Classify as valid/invalid
    else use_dry_run
        Pipeline->>Engine: dry_run(sql)
        Engine-->>Pipeline: Execution result
        Pipeline->>Pipeline: Classify success/error
    else preview mode
        Pipeline->>Engine: preview(sql)
        Engine-->>Pipeline: Preview data
        Pipeline->>Pipeline: Classify with has_data/error
    end
    
    Pipeline-->>User: Classified result with<br/>generation_result & error type
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • src/pipelines/generation/utils/sql.py: Control flow reorganization with new error classification logic and updated SQL generation rules requiring careful verification of all conditional branches.
  • Error handling implications: Changes to error classification (TIME_OUT, DRY_PLAN, DRY_RUN, PREVIEW) and error propagation across multiple pipeline paths.
  • SQL generation rules update: Explicit quoting guidelines for identifiers and literals now shift responsibility from code-level quoting to rule-based quoting.

Possibly related PRs

Suggested labels

wren-ai-service

Suggested reviewers

  • yichieh-lu
  • paopa
  • imAsterSun

Poem

🐰 Whiskers twitch with delight!
SQL now flows without the quoting blight,
Sqlglot's burden shed, the code runs light,
Rules guide the path, identifiers dressed just right,
Simpler pipelines, faster and bright! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "feat(wren-ai-service): remove sqlglot and add quotes instructions in prompt" directly and accurately reflects the main objectives of the changeset. The raw summary confirms that the PR removes the add_quotes functionality and sqlglot dependency across multiple files (eval/data_curation/utils.py, eval/utils.py, src/core/engine.py, src/pipelines/generation/utils/sql.py, and tools/run_sql.py), while simultaneously updating SQL generation rules with explicit formatting guidelines for quoting. The title is concise, specific, and clearly conveys the primary change—shifting from automatic quoting via sqlglot to prompt-based quoting instructions—making it easy for teammates to understand the purpose of this change.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/ai-service/disable-sqlglot

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a461f21 and f22479c.

⛔ Files ignored due to path filters (1)
  • wren-ai-service/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • wren-ai-service/eval/data_curation/utils.py (2 hunks)
  • wren-ai-service/eval/utils.py (3 hunks)
  • wren-ai-service/pyproject.toml (0 hunks)
  • wren-ai-service/src/core/engine.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/utils/sql.py (4 hunks)
  • wren-ai-service/tools/run_sql.py (2 hunks)
💤 Files with no reviewable changes (2)
  • wren-ai-service/pyproject.toml
  • wren-ai-service/src/core/engine.py
🧰 Additional context used
🧬 Code graph analysis (2)
wren-ai-service/eval/data_curation/utils.py (1)
wren-ai-service/src/core/engine.py (1)
  • remove_limit_statement (44-48)
wren-ai-service/src/pipelines/generation/utils/sql.py (1)
wren-ai-service/src/providers/engine/wren.py (4)
  • dry_plan (209-242)
  • execute_sql (26-139)
  • execute_sql (159-207)
  • execute_sql (278-328)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pytest
  • GitHub Check: pytest
  • GitHub Check: Analyze (go)
🔇 Additional comments (5)
wren-ai-service/eval/utils.py (1)

40-40: LGTM: SQL now passed directly to engine endpoints.

The removal of SQL quoting is consistent with the PR objectives. Raw SQL is now sent to the engine for processing.

Also applies to: 61-61, 162-162

wren-ai-service/eval/data_curation/utils.py (1)

52-52: LGTM: SQL validation now uses raw SQL.

The changes correctly pass raw SQL through remove_limit_statement to both the duckdb and non-duckdb validation endpoints.

Also applies to: 68-68

wren-ai-service/tools/run_sql.py (1)

49-49: LGTM: SQL execution now uses unmodified SQL.

The changes correctly pass raw SQL to both duckdb and non-duckdb execution endpoints.

Also applies to: 69-69

wren-ai-service/src/pipelines/generation/utils/sql.py (2)

84-161: LGTM: Validation logic correctly refactored to use raw SQL.

The refactored _classify_generation_result method correctly passes generation_result directly to all three validation paths (dry_plan, dry_run, and data preview) without pre-processing. The error handling properly categorizes failures with appropriate error types (TIME_OUT, DRY_PLAN, DRY_RUN, PREVIEW_EMPTY_DATA, PREVIEW_FAILED).


201-206: Runtime database validation effectively enforces SQL syntax compliance; no additional parser validation needed.

The current implementation validates generated SQL through database execution (dry_plan or execute_sql with dry_run), which serves as a robust enforcement mechanism. If the LLM generates improperly quoted identifiers or other syntax violations, the database will reject it and mark the result invalid for correction.

No evidence of sqlglot usage was found in the codebase. The prompt instructions (lines 201-206) direct the LLM on quoting rules, and failed SQL generation is caught by post-execution validation and routed to the SQL correction pipeline. This approach is functionally sound—database rejection is a stronger enforcement than a pre-execution parser would provide.

2. YOU MUST ONLY CHOOSE the appropriate functions from the sql functions list and use them in the SQL query if the section of SQL FUNCTIONS is available in user's input.
3. YOU MUST REFER to the sql samples and learn the usage of the schema structures and how SQL is written based on them if the section of SQL SAMPLES is available in user's input.
4. YOU MUST FOLLOW the reasoning plan step by step strictly to generate the SQL query if the section of REASONING PLAN is available in user's input.
5. YOU MUST FOLLOW SQL Rules if they are not contradicted with instructions.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Softened rule enforcement may reduce quoting compliance.

Changing from "YOU MUST FOLLOW SQL Rules" to "YOU MUST FOLLOW SQL Rules if they are not contradicted with instructions" weakens the enforcement of the quoting rules added in lines 201-206. This could lead to scenarios where the LLM skips quoting if it interprets user instructions as contradicting the formatting rules.

Consider reverting to the stronger enforcement language, or at minimum, explicitly state that the quoting rules in lines 201-206 are mandatory regardless of other instructions:

-5. YOU MUST FOLLOW SQL Rules if they are not contradicted with instructions.
+5. YOU MUST FOLLOW SQL Rules. The quoting rules (double quotes for identifiers, single quotes for strings) are mandatory and cannot be overridden by user instructions.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
5. YOU MUST FOLLOW SQL Rules if they are not contradicted with instructions.
5. YOU MUST FOLLOW SQL Rules. The quoting rules (double quotes for identifiers, single quotes for strings) are mandatory and cannot be overridden by user instructions.
🤖 Prompt for AI Agents
In wren-ai-service/src/pipelines/generation/utils/sql.py around line 262 (and
referring to the SQL rules block at lines 201-206), the current wording weakens
enforcement by adding "if they are not contradicted with instructions"; restore
strict enforcement by either reverting line 262 back to "YOU MUST FOLLOW SQL
Rules" or append an explicit sentence making the quoting rules in lines 201-206
mandatory and to be followed even if other instructions appear to conflict,
ensuring the rule text unambiguously mandates quoting compliance and overrides
contradictory user instructions.

@cyyeh cyyeh merged commit 0642f41 into main Oct 29, 2025
10 of 11 checks passed
@cyyeh cyyeh deleted the feat/ai-service/disable-sqlglot branch October 29, 2025 03:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/ai-service ai-service related module/ai-service ai-service related wren-ai-service

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants