Skip to content

fix(connection): surface real IO error instead of 'db error' on connect failure#708

Merged
NikolayS merged 3 commits intomainfrom
fix/707-connection-error-message
Mar 24, 2026
Merged

fix(connection): surface real IO error instead of 'db error' on connect failure#708
NikolayS merged 3 commits intomainfrom
fix/707-connection-error-message

Conversation

@NikolayS
Copy link
Copy Markdown
Owner

Fixes #707

Problem

Users got connection failed: db error with no useful information. Reported by @mlusetti on X.

Root cause

tokio_postgres::Error::to_string() returns the opaque string "db error" / "error connecting to server" when the error is not a DB-level error. The actual IO error (TCP refused, DNS failure, etc.) is buried in the source() chain.

Fix

Walk the std::error::Error::source() chain in map_connect_error() to find the deepest meaningful message.

Before / After

# Wrong port:
Before: connection to server at "localhost", port 9999 failed: db error
After:  connection to server at "localhost", port 9999 failed: Connection refused (os error 111)

# Wrong hostname:
Before: connection to server at "bad.host", port 5432 failed: db error
After:  connection to server at "bad.host", port 5432 failed: failed to lookup address information: Name or service not known

…ct failure

Walk the tokio-postgres error source chain to find the actual underlying
cause (IO error, DNS failure, etc.) when the top-level message is the
opaque 'db error' / 'error connecting to server' placeholder.

Before: connection to server at "localhost", port 9999 failed: db error
After:  connection to server at "localhost", port 9999 failed: Connection refused (os error 111)

Fixes #707
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 22, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 68.84%. Comparing base (debbf31) to head (6d103d4).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/connection.rs 94.74% 1 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #708      +/-   ##
==========================================
+ Coverage   68.83%   68.84%   +0.01%     
==========================================
  Files          46       46              
  Lines       30852    30870      +18     
==========================================
+ Hits        21234    21251      +17     
- Misses       9618     9619       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@NikolayS NikolayS mentioned this pull request Mar 24, 2026
9 tasks
@NikolayS
Copy link
Copy Markdown
Owner Author

REV Review — PR #708: fix(connection): surface real IO error

Summary

map_connect_error() now walks the std::error::Error::source() chain when the top-level message is one of three known opaque strings, surfacing the real OS/network error for better diagnostics. The approach is correct and safe; a few minor behavioural edge cases are worth noting.

Blocking findings

None.

Non-blocking findings

src/connection.rs — source-chain walk always takes the deepest non-generic error

The loop updates found on every non-generic source, so it returns the deepest informative message rather than the first one:

while let Some(src) = cause {
    let s = src.to_string();
    if !s.is_empty() && s != "db error" && … {
        found = s;          // keeps overwriting — deepest wins
    }
    cause = src.source();
}

In tokio-postgres error chains the ordering is typically:
"db error"DbError { message: "password authentication failed" }<io::Error>

Taking the deepest node works well for OS errors but could strip away a more descriptive mid-chain message if the chain is longer than two levels. For authentication failures in particular, the postgres DbError message is the most useful node — an io::Error at the end of the chain would be less useful. The current classification logic (msg.contains("authentication")) still fires first, so auth errors are safe, but the behaviour is subtly different from "first informative" and should be documented.

Suggestion: if the intended semantics are "deepest wins", add a comment. If "first informative" is intended, use break after found = s.

src/connection.rs — hardcoded sentinel strings are fragile

The three sentinel strings ("db error", "error communicating with the server", "error connecting to server") are tokio-postgres internals. They are not part of its public API and could change between patch releases. A single source-of-truth constant (or a private helper) would make future maintenance easier and centralise the place that needs updating when tokio-postgres changes those strings.

CLAUDE.md — security rules added without HISTORY entry

The new ## Security rules section in CLAUDE.md is good policy, but adds no changelog note or date stamp. Minor, but worth tracking when these rules were introduced.

Potential issues

Infinite loop if a source chain is circular

std::error::Error::source() is a user-implementable trait. A third-party wrapper that accidentally forms a cycle (A.source() → B, B.source() → A) would spin the while loop forever. This is extremely unlikely in practice given the concrete types involved, but a depth limit (e.g. 10 iterations) costs nothing and is defensive:

let mut depth = 0usize;
while let Some(src) = cause {
    if depth > 10 { break; }
    depth += 1;}

"error communicating with the server" sentinel scope

"error communicating with the server" is also checked in the inner loop as a non-generic filter, which means if a source of an error has this message it is skipped too. That seems intentional — but if tokio-postgres ever uses this string for a real connection diagnostic it would be silently elided. Low probability, worth a comment.

Test coverage

No tests were added for the changed logic in map_connect_error(). The function is private, so it cannot be called directly from the test module, but it is reachable through the existing connect_plain / connect_tls_with_config call sites.

Missing test scenarios:

  1. Unit test via a mock error chain: construct a tokio_postgres::Error whose to_string() is "db error" and whose source() chain contains a real io::Error (e.g., Connection refused). Assert that the resulting ConnectionError contains the IO error message. The existing test suite has no coverage for map_connect_error() at all.

  2. Sentinel pass-through: verify that an error whose top-level message is not one of the three sentinels is returned unchanged (i.e. the new branch is not entered).

  3. All-generic chain: verify that when every node in the chain matches a sentinel, found falls back to the raw top-level message (current behaviour).

  4. "error communicating with the server" sentinel: the new code adds two sentinels in addition to the original "db error" — neither has a test.

Overall: test coverage for the new logic is absent. Since the function is internal and the existing tests are integration-style (requiring a real or mock Postgres), a unit-test helper that accepts a Box<dyn std::error::Error> and returns the mapped message would make this testable without a live database.

Severity: MEDIUM — the new code path is exercised only in production.

Verdict

APPROVE

The fix is correct, minimal, and addresses a real UX pain point (opaque "db error" messages). The non-blocking findings are worth addressing in a follow-up: add a depth guard, document the "deepest wins" semantics, and add at least one unit test for the source-chain walk. No blocking issues.

@NikolayS NikolayS merged commit 389b876 into main Mar 24, 2026
16 checks passed
@NikolayS NikolayS deleted the fix/707-connection-error-message branch March 24, 2026 01:12
@mlusetti
Copy link
Copy Markdown

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(connection): 'connection failed: db error' swallows actual error cause

3 participants