
feat(ai): compact structured output for /explain and /optimize #747

Merged
NikolayS merged 3 commits into main from fix/742-ai-compact-output
Mar 26, 2026
Conversation

@NikolayS
Owner

Closes #742

Replaces verbose prose AI output with compact structured format for terminal use. All findings preserved.

/explain output format:

Bottleneck: Seq Scan on orders (cost=0..4821, 98K rows)
Filters: orders.status -- consider index
Time: planning 2ms, execution 847ms

/optimize output format:

1. CREATE INDEX CONCURRENTLY orders_status_idx ON orders(status); -- seq scan → index scan, ~10x speedup
2. Rewrite subquery as JOIN -- eliminates nested loop

@NikolayS
Owner Author

REV Review — fix/742-ai-compact-output

APPROVE (one minor fix recommended)

What changed

  • /explain system prompt: replaced bullet-list instructions with structured labeled-line format (Bottleneck:, Est error:, Slow nodes:, Filters:, Wait:, Time:). Good — all findings preserved, just structured.
  • /optimize system prompt: replaced prose instructions with numbered action list format. Correct.
  • max_tokens hardcoded: 400 for explain, 500 for optimize.

One issue: max_tokens overrides user config

// Current (overrides user config silently):
max_tokens: 400,

// Better (caps at 400 but respects lower user configs):
max_tokens: settings.config.ai.max_tokens.min(400),

Same for optimize: settings.config.ai.max_tokens.min(500).

A user who set max_tokens = 200 in their config now silently gets 400 for explain. A user who set max_tokens = 2000 gets capped to 400 without knowing. Using .min() preserves user intent.

This is a minor issue, not a blocker. Fix it before merge if possible.
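The cap semantics can be sketched as follows (hypothetical helper for illustration; `effective_max_tokens` is not the project's actual API):

```rust
// Sketch: how .min() acts as a ceiling while respecting lower user configs.
fn effective_max_tokens(user_config: u32, cap: u32) -> u32 {
    user_config.min(cap)
}

fn main() {
    assert_eq!(effective_max_tokens(200, 400), 200); // low user value respected
    assert_eq!(effective_max_tokens(2000, 400), 400); // high value capped at compact budget
}
```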

No other issues

  • Prompt logic: correct — omit-if-empty labels, all findings preserved, no information loss.
  • Tests: valid use of include_str! for compile-time prompt verification.
  • No SOC2 items.

@codecov-commenter

codecov-commenter commented Mar 25, 2026


Codecov Report

❌ Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.11%. Comparing base (3fe34a2) to head (31e6b3b).

Files with missing lines | Patch % | Lines
src/repl/ai_commands.rs | 83.33% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #747      +/-   ##
==========================================
+ Coverage   69.10%   69.11%   +0.01%     
==========================================
  Files          47       47              
  Lines       31990    32000      +10     
==========================================
+ Hits        22105    22115      +10     
  Misses       9885     9885              


@NikolayS
Owner Author

Code Review: PR #747 — feat(ai): compact structured output for /explain and /optimize

Summary

Good direction — structured output reduces token waste and makes responses more parseable. Two real concerns worth fixing before merge.


BLOCKING

B1: max_tokens hardcoded, ignoring user config — confidence 8

/explain: max_tokens: 400
/optimize: max_tokens: 500

Both commands previously used settings.config.ai.max_tokens. These hardcoded values:

  1. Silently break user config. If a user sets ai.max_tokens = 200 (low-cost model) or ai.max_tokens = 2000 (wants verbose output), it's now ignored without warning. There is no documentation of this behavior change in the PR.

  2. May truncate valid output. The /explain prompt allows up to 8 labeled lines plus the 6 defined labels. With a multi-field Time: line and verbose Bottleneck: node names (real plans can have long node descriptions), 400 tokens may not be enough. Truncated structured output is harder to parse than truncated prose.

  3. The fix is simple. Use settings.config.ai.max_tokens.unwrap_or(400) (or similar), capping at a sensible default while respecting user overrides. Or document in config that these commands cap at 400/500 and let user override with a per-command setting.

This is a behavior regression for any user with non-default max_tokens config.


NON-BLOCKING

N1: Tests are source-text assertions, not behavioral tests — confidence 7

fn explain_system_prompt_is_structured() {
    let prompt_fragment = "Bottleneck:";
    assert!(
        include_str!("ai_commands.rs").contains(prompt_fragment),
        ...
    );
}

These tests use include_str!("ai_commands.rs") to assert that the source file itself contains a string. This is not testing runtime behavior — it's testing that a string exists somewhere in the file, which could be in a comment, a dead code path, or a previous version of the prompt.

Specific problems:

  • A refactor that moves the prompt to a constant or separate file would silently break the test (or pass on a file that no longer contains the string in the active code path).
  • include_str! path resolution is relative to the source file at compile time — this only works while the test lives in ai_commands.rs. It's fragile.
  • The test provides no signal about whether the format is actually used when the function is called.

Better approach: extract the system prompt construction into a pure function (e.g., build_explain_system_prompt(params, schema, wait) -> String) and test that function directly. The test then asserts on the actual runtime value passed to the AI.
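A minimal sketch of that approach (names like `build_explain_system_prompt` are illustrative, not the project's actual signatures):

```rust
// Sketch: prompt construction extracted into a pure function, so tests
// assert on the runtime value rather than on the source file's text.
fn build_explain_system_prompt(has_wait_data: bool) -> String {
    let mut prompt = String::from(
        "Bottleneck: <most expensive node, cost, actual rows>\n\
         Time: planning Xms, execution Xms\n",
    );
    if has_wait_data {
        prompt.push_str("Wait: <dominant wait event if wait data provided>\n");
    }
    prompt
}

fn main() {
    // These assertions survive refactors that move the prompt to a const
    // or another file, unlike include_str!-based source-text checks.
    assert!(build_explain_system_prompt(true).contains("Bottleneck:"));
    assert!(build_explain_system_prompt(true).contains("Wait:"));
    assert!(!build_explain_system_prompt(false).contains("Wait:"));
}
```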

N2: /optimize prompt drops explicit context about what to analyze — confidence 5

Old prompt: "Analyse the query, its EXPLAIN ANALYZE plan, and table statistics"

New prompt starts with: "Output a numbered list of concrete optimization actions, highest impact first."

The new prompt omits explicit instruction to consider the query, plan, and statistics. The LLM will infer this from context (schema is still injected), but removing the explicit framing risks responses that optimize for index creation only, ignoring query rewrites or statistics-based issues. Worth adding a one-liner: "Analyse the query, plan, and schema below, then output..."

N3: Est error threshold is baked into the prompt with no configurability — confidence 4

Est error: <if actual rows diverges from estimated by >5x, state both>

5x is a reasonable default, but this is the kind of threshold that DBAs want to tune (some care about 2x on large tables; others ignore 10x on small ones). This isn't a blocker, but worth noting as a future config point if rpg grows per-user tuning.


POTENTIAL

P1: "Maximum 8 lines total" conflicts with defined label count — confidence 4

The prompt defines 6 labeled output lines (Bottleneck, Est error, Slow nodes, Filters, Wait, Time). "Maximum 8 lines total" permits 2 unlabeled lines. The rules also say "no prose, no markdown, no headers" — so what are the 2 extra lines for? This ambiguity could cause the LLM to add extra content that downstream parsing doesn't expect. If the intent is strictly 6 labels, say "Maximum 6 lines total." If there's a reason for 8, document it.


What's Good

  • Structured labeled output is the right call for /explain — dramatically easier to parse programmatically later.
  • temperature: 0.0 is correct for deterministic structured output.
  • The CREATE INDEX CONCURRENTLY requirement in /optimize is a real safety improvement — prevents accidental CREATE INDEX without CONCURRENTLY on production tables.
  • Ordering by expected impact (already existed, preserved here) is the right UX.
  • Omit-if-nothing-to-report is good — no noise for clean plans.

Bottom Line

Fix B1 before merge — hardcoding max_tokens while silently ignoring user config is a regression. N1 (test quality) is worth addressing in the same PR since it's cheap to fix by extracting the prompt builder. Rest can follow.

@NikolayS
Owner Author

Testing Evidence — fix/742-ai-compact-output

Commit: f597636 feat(ai): compact structured output for /explain and /optimize (#742)

Prompt verification

/explain system prompt (structured labels):

         Bottleneck: <most expensive node, cost, actual rows>\n\
         Est error: <if actual rows diverges from estimated by >5x, state both>\n\
         Slow nodes: <nodes with actual time >100ms>\n\
         Filters: <sequential scans or filters that could benefit from an index>\n\
         Wait: <dominant wait event if wait data provided>\n\
         Time: planning Xms, execution Xms\n\n\
         Rules:\n\
         - Output ONLY the labeled lines above, no prose, no markdown, no headers\n\
         - Omit any line where there is nothing significant to report\n\
--

/optimize system prompt (numbered list with CREATE INDEX CONCURRENTLY):

         - CREATE INDEX items must be valid SQL: CREATE INDEX CONCURRENTLY name ON table(col);\n\
         - Maximum 6 items\n\
         - No prose intro, no section headers, no markdown\n\
         - Include ALL significant findings — just as numbered action items\n\
--

max_tokens values:

        max_tokens: 150,
        max_tokens: 300,
        max_tokens: 400,
        max_tokens: 400,
        max_tokens: 500,

Test results

Unit test suite: 1811 passed; 0 failed (finished in 1.64s)

Integration smoke tests: 28 skipped — all require live Postgres on port 15432 (expected, not available in this environment). The tests that could run passed (integration_repl: 18 passed; 0 failed).

New prompt tests

running 2 tests
test repl::ai_commands::tests::optimize_system_prompt_is_numbered_list ... ok
test repl::ai_commands::tests::explain_system_prompt_is_structured ... ok

test result: ok. 2 passed; 0 failed; 0 ignored

Both new prompt validation tests pass. The /explain prompt uses structured labeled output (Bottleneck, Est error, Slow nodes, Filters, Wait, Time) with max 8 lines. The /optimize prompt produces a numbered action list with valid CREATE INDEX CONCURRENTLY SQL, max 6 items. max_tokens reduced to compact values (150/300/400/400/500).

@NikolayS
Owner Author

Fixed all three REV findings in one commit (2097a66):

  • max_tokens (explain): settings.config.ai.max_tokens.min(400) — respects user config, caps at compact budget
  • max_tokens (optimize): settings.config.ai.max_tokens.min(500) — same pattern
  • N2 (optimize prompt): Added "Analyse the query, its EXPLAIN ANALYZE plan, and table statistics, then output..." framing before the numbered-list instruction
  • P1 (explain prompt): "Maximum 8 lines total" → "Maximum 6 lines total"

@NikolayS
Owner Author

REV Code Review Report

CI Status: ✅ All checks pass (CodeQL, lint, tests, coverage, builds)


BLOCKING ISSUES (0)

No blocking issues found.


NON-BLOCKING (3)

INFO src/repl/ai_commands.rs:2731-2744 - Tests verify source text, not behavior

Both new tests use include_str!("ai_commands.rs") to assert the current source file contains certain string literals. This tests that the string exists in the file, not that the AI prompt is constructed and used correctly at runtime. If the prompt content is ever extracted to a constant or template, these tests could pass vacuously while the actual prompt is broken.

Suggestion: Test the constructed system_content string in a unit-testable helper function, or use snapshot testing on the prompt strings. The current approach is fragile — it's testing the source file as a string, not the program's behavior.

LOW src/repl/ai_commands.rs:1668 - Silent max_tokens cap undocumented

max_tokens: settings.config.ai.max_tokens.min(400) silently caps the user's configured max_tokens for /explain (400) and /optimize (500) without any documentation or warning. A user who sets max_tokens = 1000 in their config will be silently capped and may be confused when responses appear truncated.

Suggestion: Add a doc comment or log a debug message when the cap is applied, e.g., // Capped: compact output format uses at most 400 tokens. Even better, document this cap in the config TOML schema.
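One way to make the cap visible, sketched under the assumption that a named constant and a stderr note are acceptable (the constant name and logging mechanism are illustrative, not the project's code):

```rust
/// Compact /explain output needs at most this many tokens.
const MAX_EXPLAIN_TOKENS: u32 = 400;

// Sketch: apply the cap and surface it when it actually bites.
fn capped_tokens(configured: u32) -> u32 {
    if configured > MAX_EXPLAIN_TOKENS {
        // In the real code this could be a debug-level log line instead.
        eprintln!("note: /explain caps max_tokens at {MAX_EXPLAIN_TOKENS}");
    }
    configured.min(MAX_EXPLAIN_TOKENS)
}

fn main() {
    assert_eq!(capped_tokens(1000), 400); // capped, with a note emitted
    assert_eq!(capped_tokens(300), 300); // user value below cap, untouched
}
```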

INFO Commit subjects exceed 50-char guideline

Both commits exceed the project's 50-char subject line limit (from development__git-commit-standards):

  • fix(ai): respect user max_tokens, fix optimize prompt framing, fix line count (#742) — 84 chars
  • feat(ai): compact structured output for /explain and /optimize (#742) — 69 chars

Suggestion: This is a squash-merge PR so the final commit message can be shortened at merge time: feat(ai): compact structured output for /explain, /optimize (#742) (67 chars — still a bit long but acceptable) or split to two lines.


POTENTIAL ISSUES (1)

MEDIUM src/repl/ai_commands.rs:1631-1648 - Time: output label format may conflict with user's terminal rendering (confidence: 5/10)

The explain prompt instructs the model to output Time: planning Xms, execution Xms as one of the labeled lines. This is close but not identical to the existing rpg diagnostics timestamp format. If rpg parses or highlights output lines by prefix, the Time: label might be misinterpreted. Manual review needed — may be a false positive depending on how rpg renders AI output.

Suggestion: Verify that the Time: label doesn't conflict with any output parsing or ANSI highlight rules in the renderer.


Summary

Area Findings Potential Filtered
CI/Pipeline 0 0 0
Security 0 0 0
Bugs 0 0 1
Tests 1 0 0
Guidelines 1 0 0
Docs 1 1 0
Metadata 0 0 0

Note:

  • Findings: High-confidence issues (8-10/10) — blocking or non-blocking per severity
  • Potential: Medium-confidence issues (4-7/10) — review manually, may be false positives
  • Filtered: Low-confidence issues (0-3/10) — excluded as likely false positives

REV-assisted review (AI analysis by postgres-ai/rev)

1 similar comment

@NikolayS
Owner Author

REV Code Review — PR #747

PR: #747 — feat(ai): compact structured output for /explain and /optimize
Branch: fix/742-ai-compact-output
AI-Assisted: Yes


BLOCKING ISSUES

None.


NON-BLOCKING

LOW src/repl/ai_commands.rs:1665 — max_tokens.min(400) hard-codes a cap that could silently truncate output

If a user explicitly sets max_tokens to a lower value (e.g., 200), min(400) is a no-op — fine. But if they set it higher to get verbose output, this silently overrides their preference for /explain. The cap is intentional (enforce compact output) but should be documented in the config or help text. Same applies to the 500-token cap in /optimize.

LOW src/repl/ai_commands.rs:2725–2747 — Self-referential tests using include_str!("ai_commands.rs")

These tests verify that specific string literals exist in the source file itself. They will always pass as long as the source compiles, but they do not test runtime behavior. They would also fail if the file is renamed. Consider removing them or converting to integration tests that verify actual prompt construction. As written they add noise without value.


POTENTIAL ISSUES

MEDIUM src/repl/ai_commands.rs:1631–1647 — Structured output format relies on LLM compliance; no parser enforces it (confidence: 6/10)

The system prompt instructs the model to output labeled lines (Bottleneck:, Est error:, etc.), but the response is printed verbatim with println!. If the model deviates (prose, markdown, JSON), the user sees unformatted output. This is acceptable for now but is a known fragility in prompt-based structured output. Note it as a future improvement (e.g., post-process to strip markdown fences).
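A best-effort post-processing sketch along the lines suggested above (assumed behavior, not the project's renderer; the fence string is built at runtime only to keep this snippet displayable):

```rust
// Sketch: strip markdown code fences from a model response before printing,
// so a non-compliant model still produces clean terminal output.
fn strip_fences(response: &str) -> String {
    let fence = "`".repeat(3); // literal triple backtick, built indirectly
    response
        .lines()
        .filter(|line| !line.trim_start().starts_with(fence.as_str()))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let fence = "`".repeat(3);
    let raw = format!("{fence}\nBottleneck: Seq Scan\n{fence}");
    assert_eq!(strip_fences(&raw), "Bottleneck: Seq Scan");
}
```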

MEDIUM src/repl/ai_commands.rs:1843–1858 — Optimize prompt: numbered list format constraint may conflict with multi-line CREATE INDEX

The format spec N. <action> -- <one-line rationale> requires single-line items, but a CREATE INDEX CONCURRENTLY statement can be long. LLMs may wrap it. Not a code bug, but the prompt could benefit from -- keep each item on one line even if long to reinforce the constraint.


Summary

Area Findings Potential Filtered
Security 0 0 0
Bugs 0 0 0
Tests 0 1 1
Guidelines 0 0 0
Docs 1 0 0

Compact output is a meaningful UX improvement — the structured labeled-line format for /explain and numbered action list for /optimize are well-designed. The max_tokens cap is correct for keeping output tight. CI is green.


REV-assisted review (AI analysis by postgres-ai/rev)

@NikolayS
Owner Author

REV

PR #747 — feat(ai): compact structured output for /explain and /optimize

Summary: Rewrites the system prompts for /explain and /optimize to produce compact, structured output instead of prose. /explain outputs labeled lines (Bottleneck, Est error, Slow nodes, Filters, Wait, Time); /optimize outputs a numbered action list with rationale. Both cap max_tokens at 400/500 respectively.

Observations:

  1. max_tokens cap via .min() — Good approach. Prevents runaway token usage. But: if settings.config.ai.max_tokens is 0 (unset/default), .min(400) gives 0, which may mean "no limit" or "error" depending on the provider. Worth checking the default value and whether 0 needs special handling.

  2. Prompt self-test pattern — The two tests (explain_system_prompt_is_structured, optimize_system_prompt_is_numbered_list) use include_str!("ai_commands.rs") to assert prompt fragments exist. Clever but brittle: the test passes as long as the string is anywhere in the file, not necessarily in the prompt constant. They'd survive an accidental revert of the prompt change as long as a comment mentions the string. Low-risk given the file, but noting it.

  3. "Omit any line where there is nothing to report" — Good directive. Avoids empty labeled lines cluttering output.

  4. CREATE INDEX CONCURRENTLY in optimize prompt — solid. Forces the AI to output usable SQL rather than vague index suggestions.

  5. No wait context for /optimize — /explain passes {wait} for wait event data; /optimize does not. Intentional? If wait data is available, it could also inform optimize suggestions (e.g. LWLock:LockManager → partitioning).

  6. Max 6 items — reasonable cap. Could collide with "include ALL significant findings" directive if there are >6 real issues. The AI will prioritize by impact, which is the right behavior.
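The zero-default concern in point 1 can be sketched as follows (hypothetical guard; whether 0 actually means "unset" in this config depends on the project):

```rust
// Sketch: guard against a 0 ("unset") config value, which .min(400)
// would otherwise propagate to the provider unchanged.
fn effective(configured: u32, cap: u32) -> u32 {
    if configured == 0 {
        cap // treat unset as "use the compact cap"
    } else {
        configured.min(cap)
    }
}

fn main() {
    assert_eq!(effective(0, 400), 400); // unset falls back to the cap
    assert_eq!(effective(200, 400), 200); // low user value respected
    assert_eq!(effective(2000, 400), 400); // high value capped
}
```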

Verdict: Solid prompt engineering. Points 1 and 5 worth a quick check. No blockers.

@NikolayS
Owner Author

REV

#747 — feat(ai): compact structured output for /explain and /optimize

Overall: solid improvement. Structured labeled-line output is a meaningful UX win over prose — scannable at a glance, fits narrow terminals. The max_tokens cap is sensible.

What's good

  • Prompt redesign is well-reasoned: labeled lines are unambiguous, easy to parse programmatically if needed later
  • max_tokens.min(400/500) cap prevents runaway cost on verbose models
  • The two self-referential tests (include_str!) are creative but fragile — they'll always pass as long as the string exists in the file, which it will after this diff. Not harmful though.

Concerns

  1. Bottleneck: line format — the prompt says <most expensive node, cost, actual rows> but doesn't enforce units. A model could output Bottleneck: Hash Join (cost=50000) or Bottleneck: Hash Join (50 000ms, 1M rows) interchangeably. Consider tightening: specify whether you want node name + actual time or node name + cost.

  2. optimize prompt missing schema context — looking at the diff, the schema variable is included in the format string but there's no wait equivalent for the optimize path. Is that intentional? The explain path passes wait event data; optimize doesn't seem to. That asymmetry seems fine but worth noting in a comment.

  3. Empty-line omission rule — "omit any line where there is nothing to report" combined with "maximum 6 lines total" is good. But the prompt doesn't define what "significant" means for the Est error line. >5x is stated for Est error — good. Nothing similar for Slow nodes threshold. The 100ms cutoff lives only in the label example, not as an explicit rule. Worth making explicit.

  4. No integration test — the self-referential tests verify the string exists, not that the model output actually parses correctly. A unit test that takes a sample structured response and verifies the display path would be more robust. Not blocking.

Minor

  • Comment in optimize system prompt: Schema:\n{schema}\n has a trailing newline inside the format string that will produce a blank line at the end of the prompt sent to the API. Cosmetic, but slightly wasteful.

Verdict: approve with nits. Not blocking.

@NikolayS
Owner Author

REV Code Review — PR #747: feat(ai): compact structured output for /explain and /optimize

CI: green | Diff scope: src/repl/ai_commands.rs


Blocking findings

None.


Non-blocking findings

[BUG / MEDIUM] max_tokens.min(400) may silently truncate structured output mid-label

The CompletionOptions change caps /explain at 400 tokens and /optimize at 500. If the AI is in the middle of a structured line (e.g., a long Filters: line with multiple index suggestions) and hits the token cap, the output will be truncated mid-sentence with no indication to the user. The structured format is intentionally concise ("max 6 lines total"), but with a noisy schema context or many wait events, 400 tokens can be tight. Consider at minimum logging a warning if finish_reason == "length", or bumping to 512/640 to leave headroom.

[BEHAVIOR / LOW] max_tokens from user config is silently clamped without notice

If a user has configured ai.max_tokens = 1000 (reasonable for other commands), the .min(400) cap silently overrides it for /explain. The user gets less output than their config specifies without explanation. At minimum, a comment in the code noting this is intentional would help; ideally, the cap should be a named constant (MAX_EXPLAIN_TOKENS) to make it visible and adjustable.

[BEHAVIOR / LOW] /optimize prompt mandates CREATE INDEX CONCURRENTLY without noting cases where it is unavailable or inappropriate

The system prompt says "CREATE INDEX items must be valid SQL: CREATE INDEX CONCURRENTLY name ON table(col);" — this is good practice, but CONCURRENTLY is not available inside a transaction and is not permitted on temp tables. The AI may generate CREATE INDEX CONCURRENTLY for cases where it is not appropriate (temp tables, existing transaction context). A small note in the prompt ("omit CONCURRENTLY for temp tables") would improve output quality.

[STYLE / LOW] Two self-referential tests that use include_str!

fn explain_system_prompt_is_structured() {
    let prompt_fragment = "Bottleneck:";
    assert!(include_str!("ai_commands.rs").contains(prompt_fragment), ...);
}

These tests verify that the source file contains a literal string. This is a "change detector" test — it passes as long as the code compiles and the string exists, but it does not test behavior. If the prompt changes to use a different label, the test fails with a confusing message. Better: extract the system prompt content into a const or a function, then test that the returned string contains the required fragments. This would also make the prompt easier to update.


Potential issues (low confidence)

[BEHAVIOR] Structured output format depends entirely on AI model compliance

The new prompts use imperative instructions ("Output ONLY the labeled lines", "No prose intro, no section headers, no markdown") but AI models can deviate, especially on edge-case inputs. There is no parsing or validation of the AI response — the output is printed verbatim. If the model ignores the format instructions, the user gets prose instead of structured output. This is inherent to prompt-based formatting; acceptable for now, but worth a comment that structured output is best-effort.


Summary

The compact output format is a meaningful UX improvement — structured labeled lines are much more scannable than prose paragraphs. The max_tokens cap is the most actionable finding: silent truncation mid-output is worse than a slightly higher cap. The self-referential tests are harmless but could be improved. No blockers.

@NikolayS
Owner Author

CI is green and REV review found no blocking findings. This PR is ready for merge — awaiting Nik's approval.

@NikolayS
Owner Author

REV

PR #747 — feat(ai): compact structured output for /explain and /optimize

Good direction — structured labeled-line output for /explain and numbered action items for /optimize are strictly better than prose walls. The max_tokens caps (400/500) are a smart addition.

Observations:

  1. Self-referential tests are fragile — explain_system_prompt_is_structured does include_str!("ai_commands.rs") and checks for "Bottleneck:". This test will always pass as long as the word exists in the file, even in a comment. Better to extract the system prompt string into a const or a builder function that the test can call directly and assert on.

  2. /optimize prompt places Database: section after the rules block — minor ordering issue, but schema context is usually easier to read when it comes before the formatting rules. Also the rules block gained \n but the schema section ends with just \n (no trailing blank line before the inference). Not a bug, just cosmetic.

  3. max_tokens.min(400) hard-codes the cap — if someone sets max_tokens = 2000 in config, this silently lowers their value to 400 for /explain. The intent is a ceiling, not a floor, and .min() implements that correctly, but the comment should say "cap output tokens" rather than imply the value is additive.

  4. No update to COMMANDS.md or user-facing docs for the new output format — users who read the help text won't know the output structure changed. A one-liner in COMMANDS.md for /explain and /optimize would be enough.

Nothing blocking. The self-referential test issue (#1) is the main thing worth addressing before merge.

@NikolayS
Owner Author

Testing Evidence — PR #747

Tested on branch fix/742-ai-compact-output (commit 2097a66), built locally on Linux x86_64, against PostgreSQL 17.7 (Debian), db ashtest with 100K-row workload table (id bigint PK, val text).

CI Status

All 16 checks pass:

Checks / Test              pass  33s
Checks / Lint              pass  20s
Checks / Integration Tests pass  53s
Checks / Coverage          pass  1m54s
Checks / Compatibility     pass  1m58s
Checks / Connection Tests  pass  3m46s
Build (linux-x86_64)       pass  2m23s
Build (darwin-aarch64)     pass  1m47s
Analyze (rust)             pass  5m59s
... (16/16 passing)

/explain — sequential scan with missing index

Command:

ashtest=# /explain SELECT * FROM workload WHERE val = 'foo'

EXPLAIN ANALYZE output (shown verbatim by rpg):

Seq Scan on public.workload  (cost=0.00..2501.00 rows=1 width=41) (actual time=97.513..97.513 rows=0 loops=1)
  Output: id, val
  Filter: (workload.val = 'foo'::text)
  Rows Removed by Filter: 100000
  Buffers: shared hit=1251
Planning Time: 0.187 ms
Execution Time: 97.535 ms

AI compact output (4 lines, max 6):

Bottleneck: Seq Scan on public.workload, cost=0.00..2501.00, actual rows=0
Slow nodes: Seq Scan on public.workload, actual time=97.513ms
Filters: Filter: (workload.val = 'foo'::text)
Time: planning 0.187ms, execution 97.535ms

4 labeled lines. No prose. No markdown headers. Empty labels omitted.


/explain — index scan (fast path)

Command:

ashtest=# /explain SELECT * FROM workload WHERE id = 1

AI compact output (1 line — nothing significant to report for other labels):

Time: planning 0.108ms, execution 0.058ms

Demonstrates that irrelevant labels are omitted entirely (Bottleneck/Filters/Slow nodes suppressed for a trivial 0.058ms index scan).


/optimize — same sequential scan

Command:

ashtest=# /optimize SELECT * FROM workload WHERE val = 'foo'

AI compact output (numbered list, 6 items max):

1. CREATE INDEX CONCURRENTLY idxworkloadval ON public.workload(val) -- This will allow the query to use an index scan instead of a sequential scan, significantly reducing execution time.

2. VACUUM public.workload -- This will reclaim space from dead tuples, improving overall performance and potentially reducing the cost of future scans.

3. ANALYZE public.workload -- This will update the statistics for the query planner, helping it make better decisions about query execution plans.

4. Increase work_mem for the session -- This may help with sorting and joining operations if the query is expanded in the future, improving performance.

5. Partition the workload table if applicable -- If the table grows significantly, partitioning can help manage data more efficiently and improve query performance.

6. Consider using a materialized view for frequently accessed queries -- This can speed up access times for complex queries by pre-computing and storing results.

6 numbered items. Item 1 is valid SQL (CREATE INDEX CONCURRENTLY). No prose intro. Ordered by impact.


Summary

| Command | Lines output | Format |
| --- | --- | --- |
| /explain (seq scan) | 4 | Labeled lines, max 6 |
| /explain (index scan) | 1 | Labeled lines, empty suppressed |
| /optimize | 6 | Numbered action list, max 6 |

Compact format confirmed working. All findings preserved. All CI checks green.

@NikolayS (Owner Author)

Fix verified: commit 31e6b3b

Problem

The /optimize system prompt was producing verbose trailing prose after each SQL suggestion:

1. CREATE INDEX CONCURRENTLY idx ON workload(val) -- This will allow the query to use an index scan instead of a sequential scan, significantly reducing execution time.
2. VACUUM workload -- This will reclaim space from dead tuples, improving overall performance.

Fix

Tightened the system prompt in src/repl/ai_commands.rs (handle_ai_optimize):

  • Added concrete format example with semicolons and short inline comments
  • Added explicit rule: inline comment max 60 chars, no trailing sentences
  • Added forbidden phrases list: "This will", "This allows", "This enables", "This should", "significantly", "overall"

Before (old prompt rules)

Format each item as: N. <action> -- <one-line rationale with expected impact>
Rules:
- CREATE INDEX items must be valid SQL: CREATE INDEX CONCURRENTLY name ON table(col);
- Maximum 6 items
- No prose intro, no section headers, no markdown
- Include ALL significant findings — just as numbered action items
- Order by expected impact (highest first)

After (new prompt rules)

Format each item as: N. <SQL action>; -- <reason, expected gain>
Example:
1. CREATE INDEX CONCURRENTLY idx ON workload(val); -- seq scan -> index scan, ~10x speedup
2. VACUUM workload; -- reclaim dead tuples, reduce scan cost
3. ANALYZE workload; -- refresh planner stats
Rules:
- Each item is a single SQL statement followed by an inline comment (-- ...)
- Inline comment max 60 chars; no trailing sentences after the SQL line
- CREATE INDEX items must be valid SQL: CREATE INDEX CONCURRENTLY name ON table(col);
- Maximum 6 items
- No prose intro, no section headers, no markdown, no standalone sentences
- Forbidden phrases: "This will", "This allows", "This enables", "This should", "significantly", "overall"
- Order by expected impact (highest first)
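These rules live only in the prompt; a belt-and-braces variant could also validate the model's reply before printing it. A hedged Rust sketch (hypothetical `check_optimize_output` helper, not part of this PR, which relies on the prompt alone):

```rust
// Hypothetical validator mirroring the /optimize prompt rules above.
// Not part of this PR -- shown only to illustrate the rules as code.
const FORBIDDEN: &[&str] = &[
    "This will", "This allows", "This enables", "This should",
    "significantly", "overall",
];

/// Returns the first rule violation found, or None if the reply is clean.
fn check_optimize_output(reply: &str) -> Option<String> {
    let items: Vec<&str> = reply.lines().filter(|l| !l.trim().is_empty()).collect();
    if items.len() > 6 {
        return Some(format!("{} items, maximum is 6", items.len()));
    }
    for line in items {
        for phrase in FORBIDDEN {
            if line.contains(phrase) {
                return Some(format!("forbidden phrase {phrase:?} in: {line}"));
            }
        }
        // Inline comment (-- ...) must stay under 60 chars.
        if let Some((_, comment)) = line.split_once(" -- ") {
            if comment.len() > 60 {
                return Some(format!("comment over 60 chars: {comment}"));
            }
        }
    }
    None
}

fn main() {
    let good = "1. ANALYZE workload; -- refresh planner stats";
    let bad = "1. VACUUM workload; -- This will reclaim space";
    assert!(check_optimize_output(good).is_none());
    assert!(check_optimize_output(bad).is_some());
    println!("ok");
}
```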

Live terminal output (ashtest DB, gpt-4o-mini)

ashtest=# /optimize SELECT * FROM workload WHERE val = 'foo'
Seq Scan on public.workload  (cost=0.00..2363.00 rows=1 width=41) (actual time=27.839..27.839 rows=0 loops=1)
  Output: id, val
  Filter: (workload.val = 'foo'::text)
  Rows Removed by Filter: 100000
  Buffers: shared hit=1113
Planning:
  Buffers: shared hit=29
Planning Time: 0.199 ms
Execution Time: 27.856 ms

1. CREATE INDEX CONCURRENTLY idx_val ON workload(val); -- seq scan -> index scan, ~10x speedup
2. VACUUM public.workload; -- reclaim dead tuples, reduce scan cost
3. ANALYZE public.workload; -- refresh planner stats
ashtest=#

Verified: the output contains none of the forbidden phrases ("This will", "This allows", "This enables", "This should", "significantly", "overall").

Checks

  • cargo build
  • cargo clippy --all-targets --all-features -- -D warnings ✓ (no errors)
  • cargo fmt -- --check
  • /explain prompt reviewed — already uses clean structured labeled-line format, no changes needed

@NikolayS NikolayS merged commit a0e8125 into main Mar 26, 2026
16 checks passed
@NikolayS NikolayS deleted the fix/742-ai-compact-output branch March 26, 2026 22:17
Development

Successfully merging this pull request may close these issues.

feat(ai): compact terminal output for /explain and /optimize