feat(wren-ai-service): remove sqlglot and add quotes instructions in prompt#2010
feat(wren-ai-service): remove sqlglot and add quotes instructions in prompt#2010
Conversation
WalkthroughRemoves SQL quoting via sqlglot from the entire wren-ai-service codebase. The Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Pipeline as SQL Gen Pipeline
participant Rules as SQL Rules
participant Engine as Engine/Endpoints
User->>Pipeline: Input query request
Pipeline->>Rules: Apply SQL generation rules
Note over Rules: Rules now include explicit<br/>quoting guidelines for identifiers<br/>and literals
Rules->>Pipeline: Generated SQL<br/>(with proper quoting per rules)
alt use_dry_plan
Pipeline->>Engine: dry_plan(sql)
Engine-->>Pipeline: Plan result
Pipeline->>Pipeline: Classify as valid/invalid
else use_dry_run
Pipeline->>Engine: dry_run(sql)
Engine-->>Pipeline: Execution result
Pipeline->>Pipeline: Classify success/error
else preview mode
Pipeline->>Engine: preview(sql)
Engine-->>Pipeline: Preview data
Pipeline->>Pipeline: Classify with has_data/error
end
Pipeline-->>User: Classified result with<br/>generation_result & error type
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
wren-ai-service/poetry.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
wren-ai-service/eval/data_curation/utils.py(2 hunks)wren-ai-service/eval/utils.py(3 hunks)wren-ai-service/pyproject.toml(0 hunks)wren-ai-service/src/core/engine.py(0 hunks)wren-ai-service/src/pipelines/generation/utils/sql.py(4 hunks)wren-ai-service/tools/run_sql.py(2 hunks)
💤 Files with no reviewable changes (2)
- wren-ai-service/pyproject.toml
- wren-ai-service/src/core/engine.py
🧰 Additional context used
🧬 Code graph analysis (2)
wren-ai-service/eval/data_curation/utils.py (1)
wren-ai-service/src/core/engine.py (1)
remove_limit_statement(44-48)
wren-ai-service/src/pipelines/generation/utils/sql.py (1)
wren-ai-service/src/providers/engine/wren.py (4)
dry_plan(209-242)execute_sql(26-139)execute_sql(159-207)execute_sql(278-328)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pytest
- GitHub Check: pytest
- GitHub Check: Analyze (go)
🔇 Additional comments (5)
wren-ai-service/eval/utils.py (1)
40-40: LGTM: SQL now passed directly to engine endpoints.The removal of SQL quoting is consistent with the PR objectives. Raw SQL is now sent to the engine for processing.
Also applies to: 61-61, 162-162
wren-ai-service/eval/data_curation/utils.py (1)
52-52: LGTM: SQL validation now uses raw SQL.The changes correctly pass raw SQL through
remove_limit_statementto both the duckdb and non-duckdb validation endpoints.Also applies to: 68-68
wren-ai-service/tools/run_sql.py (1)
49-49: LGTM: SQL execution now uses unmodified SQL.The changes correctly pass raw SQL to both duckdb and non-duckdb execution endpoints.
Also applies to: 69-69
wren-ai-service/src/pipelines/generation/utils/sql.py (2)
84-161: LGTM: Validation logic correctly refactored to use raw SQL.The refactored
_classify_generation_resultmethod correctly passesgeneration_resultdirectly to all three validation paths (dry_plan, dry_run, and data preview) without pre-processing. The error handling properly categorizes failures with appropriate error types (TIME_OUT, DRY_PLAN, DRY_RUN, PREVIEW_EMPTY_DATA, PREVIEW_FAILED).
201-206: Runtime database validation effectively enforces SQL syntax compliance; no additional parser validation needed.The current implementation validates generated SQL through database execution (dry_plan or execute_sql with dry_run), which serves as a robust enforcement mechanism. If the LLM generates improperly quoted identifiers or other syntax violations, the database will reject it and mark the result invalid for correction.
No evidence of sqlglot usage was found in the codebase. The prompt instructions (lines 201-206) direct the LLM on quoting rules, and failed SQL generation is caught by post-execution validation and routed to the SQL correction pipeline. This approach is functionally sound—database rejection is a stronger enforcement than a pre-execution parser would provide.
| 2. YOU MUST ONLY CHOOSE the appropriate functions from the sql functions list and use them in the SQL query if the section of SQL FUNCTIONS is available in user's input. | ||
| 3. YOU MUST REFER to the sql samples and learn the usage of the schema structures and how SQL is written based on them if the section of SQL SAMPLES is available in user's input. | ||
| 4. YOU MUST FOLLOW the reasoning plan step by step strictly to generate the SQL query if the section of REASONING PLAN is available in user's input. | ||
| 5. YOU MUST FOLLOW SQL Rules if they are not contradicted with instructions. |
There was a problem hiding this comment.
Softened rule enforcement may reduce quoting compliance.
Changing from "YOU MUST FOLLOW SQL Rules" to "YOU MUST FOLLOW SQL Rules if they are not contradicted with instructions" weakens the enforcement of the quoting rules added in lines 201-206. This could lead to scenarios where the LLM skips quoting if it interprets user instructions as contradicting the formatting rules.
Consider reverting to the stronger enforcement language, or at minimum, explicitly state that the quoting rules in lines 201-206 are mandatory regardless of other instructions:
-5. YOU MUST FOLLOW SQL Rules if they are not contradicted with instructions.
+5. YOU MUST FOLLOW SQL Rules. The quoting rules (double quotes for identifiers, single quotes for strings) are mandatory and cannot be overridden by user instructions.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| 5. YOU MUST FOLLOW SQL Rules if they are not contradicted with instructions. | |
| 5. YOU MUST FOLLOW SQL Rules. The quoting rules (double quotes for identifiers, single quotes for strings) are mandatory and cannot be overridden by user instructions. |
🤖 Prompt for AI Agents
In wren-ai-service/src/pipelines/generation/utils/sql.py around line 262 (and
referring to the SQL rules block at lines 201-206), the current wording weakens
enforcement by adding "if they are not contradicted with instructions"; restore
strict enforcement by either reverting line 262 back to "YOU MUST FOLLOW SQL
Rules" or append an explicit sentence making the quoting rules in lines 201-206
mandatory and to be followed even if other instructions appear to conflict,
ensuring the rule text unambiguously mandates quoting compliance and overrides
contradictory user instructions.
we'll remove add quotes functionality in sqlglot as it sometimes changes sql query
Summary by CodeRabbit