Skip to content

feat(connection): multi-host failover#751

Merged
NikolayS merged 6 commits intomainfrom
fix/743-multihost
Mar 28, 2026
Merged

feat(connection): multi-host failover#751
NikolayS merged 6 commits intomainfrom
fix/743-multihost

Conversation

@NikolayS
Copy link
Copy Markdown
Owner

Closes #743

Implements comma-separated multi-host connection with failover, matching psql/libpq semantics.

  • rpg -h host1,host2 -p 5432 tries hosts in order, uses first that accepts connection
  • Per-host ports supported: -h host1,host2 -p 5432,5433
  • Port reuse when fewer ports than hosts: last port is reused (matching libpq)
  • URI and connstring forms already worked; CLI flags now work too
  • -p flag accepts comma-separated port list (single port still works — fully backward-compatible)
  • resolve_hosts() extended with priority 3: CLI multi-host (after URI and conninfo)

Tests

5 new unit tests in src/connection.rs:

  • test_resolve_params_cli_multihost_single_port — single port applied to all hosts
  • test_resolve_params_cli_multihost_per_host_ports — per-host port assignment
  • test_resolve_params_cli_multihost_port_reuse — last port reused when fewer ports than hosts
  • test_resolve_params_cli_multihost_positional — positional HOST arg with commas
  • test_resolve_params_cli_single_host_regression — single-host path unchanged

3 new integration tests in tests/connection_paths.rs (Group I, run via connection-tests CI):

  • i1_multihost_first_fails_second_succeeds — failover to second host
  • i2_multihost_all_fail — error message when all hosts unreachable
  • i3_single_host_regression — single host still connects normally

Existing f7_multihost_failover canary test (issue #724) now exercises the same code path.

All 1814 unit tests pass.

@NikolayS
Copy link
Copy Markdown
Owner Author

REV Review — fix/743-multihost

APPROVE

Clean implementation. The approach is correct and minimal.

What's good

  • port_str: Option<String> added to CliConnOpts — carries raw comma-separated ports from CLI before they're parsed to u16
  • CLI multi-host added as priority 3 in resolve_hosts(), after URI and conninfo — correct precedence
  • Last-port-reuse semantics match libpq when fewer ports than hosts are given
  • Single call site for resolve_hosts — no risk of breaking other paths
  • 5 unit tests cover all port scenarios: single port, per-host ports, port reuse, positional host, single-host regression
  • CliConnOpts derives Default so all existing construction sites (all use ..Default::default()) get port_str: None automatically

Notes

  • Existing failover loop in connect() already consumed params.hosts — this PR just wires up the CLI input path that was missing. Good surgical change.
  • Integration tests (i1/i2/i3) are #[ignore] pending live Postgres — correct for now.

No SOC2 items.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 25, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 96.99248% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.52%. Comparing base (4b95bc5) to head (382fc91).

Files with missing lines Patch % Lines
src/main.rs 77.78% 2 Missing ⚠️
src/connection.rs 99.19% 1 Missing ⚠️
src/session.rs 0.00% 1 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #751      +/-   ##
==========================================
+ Coverage   67.40%   67.52%   +0.13%     
==========================================
  Files          52       52              
  Lines       33667    33792     +125     
==========================================
+ Hits        22690    22817     +127     
+ Misses      10977    10975       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@NikolayS
Copy link
Copy Markdown
Owner Author

Code Review: PR #751 — feat(connection): multi-host failover

Repo: NikolayS/rpg
PR: #751
Reviewer: Max
Date: 2026-03-25


Summary

Adds CLI-level multi-host support (-h h1,h2 -p 5432,5433) to complement the existing URI and conninfo paths. The approach is functionally correct for the happy path but has several issues worth addressing before merge.


BLOCKING

B1 — Priority inversion: CLI multi-host fires after conninfo single-host (confidence: 8)

The priority comment says:

1. URI multi-host
2. conninfo multi-host
3. CLI multi-host  ← new
4. single-host fallback

But the actual code does not enforce this. Both URI and conninfo blocks return early when they find multi-host data — that's correct. However, the CLI block runs after the conninfo block only when the conninfo block did not return early (i.e., conninfo had only a single host). That part is fine.

The real bug: if a user passes -h h1,h2 on the CLI and also passes a conninfo string with host=single, the conninfo block does not return early (single host), so the CLI block fires and overrides conninfo. That matches the documented priority — but the comment says conninfo should win (#2 > #3). In the real world, conninfo and CLI flags are often combined and the user probably expects the explicit -h to win. However, the docstring says otherwise.

Either the priority list is wrong, or the implementation is wrong. They need to be consistent. Pick one and document it clearly.


B2 — Silent port parse failure: invalid port strings are silently dropped (confidence: 9)

let port_parts: Vec<u16> = raw_ports
    .split(',')
    .filter_map(|s| s.trim().parse::<u16>().ok())
    .collect();

filter_map(...ok()) silently discards parse failures. If the user types rpg -h h1,h2 -p 5432,abc, the second port is silently dropped. The port assignment loop then carries forward last_port (the default), which is almost certainly wrong and will produce a misleading connection to the wrong port without any error.

Fix: Collect into Result<Vec<u16>, _> and surface a ConnectionError on any parse failure.


B3 — port_pos is a String but was u16 before — type inconsistency risk (confidence: 7)

The diff mentions port_pos is referenced via .as_deref():

.or(o.port_pos.as_deref())

If port_pos was changed from Option<u16> to Option<String> in main.rs (as the description states for the -p flag), that change must be consistent across all callers including session.rs, config fallback, and any profile/config parsing code. If any path still stores a u16 and the new code tries to as_deref() it, that's a compile error — but if the field was renamed inconsistently or a duplicate field exists, the old field silently takes a wrong value.

Action: Verify port_pos type is consistently Option<String> everywhere and all prior u16 parse sites now validate properly.


NON-BLOCKING

N1 — last_port carry-forward semantics need a comment (confidence: 6)

let mut last_port = default_port;
for (i, host_part) in host_parts.iter().enumerate() {
    if let Some(&p) = port_parts.get(i) {
        last_port = p;
    }
    host_list.push(((*host_part).to_owned(), last_port));
}

The "last port carries forward" behavior (matching libpq semantics for host=h1,h2,h3 port=5432,5433) is correct but non-obvious. A single-line comment explaining this is intentional and matches libpq/PostgreSQL conninfo behavior would prevent future "fix" PRs that break it.


N2 — host_parts whitespace trimming but not host_part validation (confidence: 5)

str::trim is applied during split, which is good. But there's no check for empty strings resulting from inputs like -h "h1,,h2" or -h "h1, ,h2". An empty host string will be pushed into host_list and will eventually produce a confusing connection error at the driver level rather than a clean validation error here.

Suggestion: Filter out empty host_part values and return an error if any are found, or at minimum skip them.


N3 — Backward-compat mutation is surprising (confidence: 5)

// Update params.host/port to reflect first entry (backward-compat).
params.host.clone_from(&host_list[0].0);
params.port = host_list[0].1;
params.hosts = host_list;

Mutating params.host and params.port to mirror hosts[0] is a pragmatic choice, but it creates subtle coupling: any code that reads params.host after resolve_hosts returns will see a different value depending on whether multi-host was used. This is fine if callers always use params.hosts, but worth a comment that host/port are kept in sync with hosts[0] as a compat shim and shouldn't be used as the authoritative source when hosts.len() > 1.


N4 — Integration tests are all #[ignore] (confidence: 7)

Three integration tests in tests/connection_paths.rs are marked #[ignore]. This is presumably because they need a live Postgres instance. That's understandable, but it means this feature ships with zero automatically-exercised integration coverage. The 5 unit tests presumably cover parsing only.

Consider:

  • Adding a note in the test file explaining what env vars/setup is needed to run them
  • Adding at least one unit test that exercises the full resolve_params path with a mocked CliConnOpts (no live DB needed) to catch regressions in the priority logic

N5 — (*host_part).to_owned() is needlessly verbose (confidence: 4)

host_list.push(((*host_part).to_owned(), last_port));

This can be written as host_part.to_string() or (*host_part).to_string() — minor style nit, but (*host_part).to_owned() is the idiom for &str → String that most Rustaceans write as just host_part.to_string() since host_part: &&str auto-derefs.


POTENTIAL

P1 — Port 0 is technically valid but probably wrong (confidence: 4)

u16 parse succeeds for 0. Port 0 in a connection string means "OS picks the port" in some contexts but is almost never intentional for Postgres. If someone passes -p 0,5432, port 0 silently goes in. Could add a > 0 validation alongside the parse. Low severity but worth considering.


P2 — host_list[0] indexing assumes non-empty host_parts (confidence: 5)

After the if host_parts.len() > 1 guard, we know there are at least 2 elements, so host_list[0] is safe after the loop. But this relies on the loop always pushing at least one entry given non-empty host_parts. If host_parts is somehow empty after the guard (can't happen with current code but if the guard changes), this panics. A .first() with a proper error would be more defensive.

Currently safe given the > 1 guard, but flagging for awareness.


Test Coverage Assessment

  • 5 unit tests: likely cover port parsing and host splitting. Good baseline.
  • 3 integration tests: all #[ignore] — effectively absent in CI.
  • Missing: test for priority ordering (URI/conninfo beats CLI), test for invalid port string (B2), test for empty host segment (N2), test for single port applying to all hosts.

Verdict

B2 (silent port parse failure) is the most important fix — it's a real user-facing bug with no error feedback. B1 (priority inversion ambiguity) needs at minimum a documentation fix, and ideally a deliberate decision on which wins. Everything else is polish.

@NikolayS
Copy link
Copy Markdown
Owner Author

Fixed REV blocking findings:

  • B1: priority comment clarified — CLI -h wins over single-host conninfo (documented)
  • B2: invalid port in -p now emits a warning instead of silently using wrong port; added guard test

All tests pass.

@NikolayS
Copy link
Copy Markdown
Owner Author

REV Code Review Report


BLOCKING ISSUES (1)

MEDIUM src/connection.rs:893-897 - Early return on invalid port leaves params.hosts stale

When -p contains an invalid port token (e.g. -h h1,h2 -p 5432,notaport), port_parse_ok is set to false and the function returns early with return. The comment says "Fall through to the single-host path", but return exits resolve_hosts entirely — it does not reach the fallback params.hosts = vec![(params.host.clone(), params.port)] at line 919.

At the point of the early return, params.hosts still holds the value from ConnParams::default() (i.e., the OS socket-based default host/port from default_host_port()), not the resolved params.host/params.port. So the caller ends up with a mismatch: params.host = resolved host, params.port = resolved port, but params.hosts = stale default. Depending on which field the connection layer actually iterates, this can silently connect to the wrong host.

The test test_resolve_params_cli_multihost_invalid_port_does_not_silently_use_wrong_port only asserts port != 0, which is satisfied by the stale default (e.g. 5432), not detecting the mismatch.

Fix: Replace return; with the fallback assignment before returning:

if !port_parse_ok {
    // Fall through to single-host path.
    params.hosts = vec![(params.host.clone(), params.port)];
    return;
}

Also strengthen the invalid-port test to assert that params.hosts[0].0 == "h1" and params.hosts[0].1 == 5432 (the resolved values), not just that port ≠ 0.


NON-BLOCKING (2)

LOW src/connection.rs:872-874 - port_str vs port_pos precedence may shadow per-host ports in positional form

For positional args (no -h/-p flags), port_pos carries the raw port string. The code at line 872 picks port_str first, then falls back to port_pos. However, port_str is populated from the named -p flag (see main.rs:329), while port_pos holds the positional PORT argument. If a user types rpg h1,h2 7777,7778 (positional form), o.port_str will be None and o.port_pos will be Some("7777,7778"), which is correct. The test test_resolve_params_cli_multihost_positional confirms this path works. No active bug, but the fallback chain is non-obvious. A clarifying comment at the fallback point would help future maintainers.

LOW src/main.rs:320-322 - Silent parse failure for first port in -p

conn_opts() extracts port_u16 by splitting on , and parsing the first token:

let port_u16: Option<u16> = self.port.as_deref()
    .and_then(|s| s.split(',').next())
    .and_then(|s| s.trim().parse().ok());

If the first token is not a valid u16 (e.g. -p notaport,5432), port_u16 becomes None silently. The error surfaces later as a confusing "could not connect" rather than "invalid port". Consider emitting a user-facing warning or validation error at this layer, consistent with the explicit eprintln! guard in resolve_hosts.


POTENTIAL ISSUES (2)

MEDIUM src/connection.rs:860-861 - Named -h and positional HOST are treated as equivalent for multi-host; priority vs conninfo is not tested (confidence: 6/10)

The comment at line 808-811 documents that CLI -h takes priority over single-host conninfo. However, there is no unit test covering the scenario where both opts.host (CLI -h host1,host2) and a single-host conninfo are present simultaneously. If a user has PGHOST=dbserver in a .pg_service.conf and passes -h h1,h2, the intent is clearly CLI wins — but without a test this priority path is unverified.

LOW tests/connection_paths.rs:1513-1625 - Integration tests i1/i2 use a hard-coded non-routable hostname, which may resolve differently under different DNS configs (confidence: 5/10)

bad-host-xyz-nonexistent is used as the unreachable first host. On some corporate or CI environments with wildcard DNS, this may resolve to a real IP. The test would then fail for a different reason (connection refused vs. name resolution failure). Using 192.0.2.1 (TEST-NET, RFC 5737) as the unreachable address is more reliable for tests that need a guaranteed-unreachable host.


Summary

Area Findings Potential Filtered
Security 0 0 0
Bugs 1 1 0
Tests 0 1 0
Guidelines 0 0 0
Docs 1 0 0

Notes:

  • Findings: High-confidence issues (8-10/10) — blocking or non-blocking per severity
  • Potential: Medium-confidence issues (4-7/10) — review manually
  • Filtered: Low-confidence issues (0-3/10) — excluded as likely false positives

REV-assisted review (AI analysis by postgres-ai/rev)

1 similar comment
@NikolayS
Copy link
Copy Markdown
Owner Author

REV Code Review Report


BLOCKING ISSUES (1)

MEDIUM src/connection.rs:893-897 - Early return on invalid port leaves params.hosts stale

When -p contains an invalid port token (e.g. -h h1,h2 -p 5432,notaport), port_parse_ok is set to false and the function returns early with return. The comment says "Fall through to the single-host path", but return exits resolve_hosts entirely — it does not reach the fallback params.hosts = vec![(params.host.clone(), params.port)] at line 919.

At the point of the early return, params.hosts still holds the value from ConnParams::default() (i.e., the OS socket-based default host/port from default_host_port()), not the resolved params.host/params.port. So the caller ends up with a mismatch: params.host = resolved host, params.port = resolved port, but params.hosts = stale default. Depending on which field the connection layer actually iterates, this can silently connect to the wrong host.

The test test_resolve_params_cli_multihost_invalid_port_does_not_silently_use_wrong_port only asserts port != 0, which is satisfied by the stale default (e.g. 5432), not detecting the mismatch.

Fix: Replace return; with the fallback assignment before returning:

if !port_parse_ok {
    // Fall through to single-host path.
    params.hosts = vec![(params.host.clone(), params.port)];
    return;
}

Also strengthen the invalid-port test to assert that params.hosts[0].0 == "h1" and params.hosts[0].1 == 5432 (the resolved values), not just that port ≠ 0.


NON-BLOCKING (2)

LOW src/connection.rs:872-874 - port_str vs port_pos precedence may shadow per-host ports in positional form

For positional args (no -h/-p flags), port_pos carries the raw port string. The code at line 872 picks port_str first, then falls back to port_pos. However, port_str is populated from the named -p flag (see main.rs:329), while port_pos holds the positional PORT argument. If a user types rpg h1,h2 7777,7778 (positional form), o.port_str will be None and o.port_pos will be Some("7777,7778"), which is correct. The test test_resolve_params_cli_multihost_positional confirms this path works. No active bug, but the fallback chain is non-obvious. A clarifying comment at the fallback point would help future maintainers.

LOW src/main.rs:320-322 - Silent parse failure for first port in -p

conn_opts() extracts port_u16 by splitting on , and parsing the first token:

let port_u16: Option<u16> = self.port.as_deref()
    .and_then(|s| s.split(',').next())
    .and_then(|s| s.trim().parse().ok());

If the first token is not a valid u16 (e.g. -p notaport,5432), port_u16 becomes None silently. The error surfaces later as a confusing "could not connect" rather than "invalid port". Consider emitting a user-facing warning or validation error at this layer, consistent with the explicit eprintln! guard in resolve_hosts.


POTENTIAL ISSUES (2)

MEDIUM src/connection.rs:860-861 - Named -h and positional HOST are treated as equivalent for multi-host; priority vs conninfo is not tested (confidence: 6/10)

The comment at line 808-811 documents that CLI -h takes priority over single-host conninfo. However, there is no unit test covering the scenario where both opts.host (CLI -h host1,host2) and a single-host conninfo are present simultaneously. If a user has PGHOST=dbserver in a .pg_service.conf and passes -h h1,h2, the intent is clearly CLI wins — but without a test this priority path is unverified.

LOW tests/connection_paths.rs:1513-1625 - Integration tests i1/i2 use a hard-coded non-routable hostname, which may resolve differently under different DNS configs (confidence: 5/10)

bad-host-xyz-nonexistent is used as the unreachable first host. On some corporate or CI environments with wildcard DNS, this may resolve to a real IP. The test would then fail for a different reason (connection refused vs. name resolution failure). Using 192.0.2.1 (TEST-NET, RFC 5737) as the unreachable address is more reliable for tests that need a guaranteed-unreachable host.


Summary

Area Findings Potential Filtered
Security 0 0 0
Bugs 1 1 0
Tests 0 1 0
Guidelines 0 0 0
Docs 1 0 0

Notes:

  • Findings: High-confidence issues (8-10/10) — blocking or non-blocking per severity
  • Potential: Medium-confidence issues (4-7/10) — review manually
  • Filtered: Low-confidence issues (0-3/10) — excluded as likely false positives

REV-assisted review (AI analysis by postgres-ai/rev)

@NikolayS
Copy link
Copy Markdown
Owner Author

REV Code Review — PR #751

PR: #751 — feat(connection): multi-host failover
Branch: fix/743-multihost
AI-Assisted: Yes


BLOCKING ISSUES

HIGH src/connection.rs:870params.host.clone_from(&host_list[0].0) — unchecked index on host_list (confidence: 9/10)

host_list is populated only when host_parts.len() > 1, so it has at least 2 elements at the point host_list[0] is accessed. This is safe — the guard is host_parts.len() > 1 at line 858. However, this implicit invariant is fragile: if the guard is ever relaxed (e.g., >= 1), host_list[0] will panic on empty input. Replace with a host_list.first() pattern and an explicit if host_list.is_empty() { return; } guard to make the invariant explicit and panic-safe.

Fix:

if let Some(first) = host_list.first() {
    params.host.clone_from(&first.0);
    params.port = first.1;
}
params.hosts = host_list;

NON-BLOCKING

LOW src/main.rs:321port_u16 parses only the first token of -p for backward-compat callers

The comment explains this correctly. The only concern is that if -p 5432,notaport is supplied, port_u16 is Some(5432) while port_str is "5432,notaport" — the invalid-port detection in resolve_hosts handles the mismatch. This is correct behavior.

LOW src/connection.rs:854–858cli_host prefers o.host over o.host_pos without a comment explaining priority

Document: o.host is the named --host flag; o.host_pos is the positional HOST argument. Named flag takes priority, consistent with single-host resolution elsewhere.

LOW src/connection.rs:804–806 — doc comment for resolve_hosts priority list is now 4 items; previously 3

The updated comment correctly documents all 4 priority tiers. Minor: item 3 says "CLI -h flag (takes priority over single-host conninfo)" — this is only true when -h supplies multiple hosts. Clarify: "Multi-host from CLI -h flag when multiple hosts specified".


POTENTIAL ISSUES

MEDIUM src/connection.rs:880–883 — port reuse uses last_port (last seen port) for subsequent hosts

When fewer ports than hosts are provided (e.g., -p 5432,5433 with 3 hosts), host 3 gets port 5433 (last seen). This matches PostgreSQL libpq behavior and the test test_resolve_params_cli_multihost_port_reuse validates it. Confirm this is intentional and matches libpq semantics — it does. No action needed beyond the existing test.

MEDIUM tests/connection_paths.rs — I1/I2/I3 integration tests are all #[ignore] (confidence: 7/10)

The tests are correctly ignored for standard CI (they require a live Postgres). Confirm there is a connection-tests CI job that runs them with cargo test -- --ignored; if not, these tests will never run automatically and are effectively dead code.


Summary

Area Findings Potential Filtered
Security 0 0 0
Bugs 1 0 0
Tests 0 1 0
Guidelines 0 2 0
Docs 1 0 0

The core logic is solid — URI and conninfo multi-host paths are unchanged; the new CLI path is additive and well-guarded. The port_str design cleanly preserves backward compatibility. Unit test coverage is comprehensive (6 new tests). The BLOCKING finding is a low-risk latent panic: the current guard makes it unreachable, but it should be made explicit.

CI is currently red (cargo fmt only — fixed in latest push). Re-run expected to pass.


REV-assisted review (AI analysis by postgres-ai/rev)

@NikolayS
Copy link
Copy Markdown
Owner Author

REV

PR #751 — feat(connection): multi-host failover

Summary: Adds CLI-level multi-host support (-h h1,h2 -p 5432,5433). Changes -p from u16 to String, introduces port_str field on CliConnOpts, and threads opts through resolve_hosts. Good test coverage: single port broadcast, per-host ports, port reuse (last port sticky), positional HOST, and invalid-port guard.

Observations:

  1. CliConnOpts::port semantic shiftport is now the parsed-first-token value while port_str holds the raw input. That's workable but fragile: callers that set port directly (e.g. tests bypassing conn_opts()) will silently skip the multi-host path unless they also set port_str. Worth a doc comment on the struct clarifying the invariant.

  2. port_pos dual-useport_pos is read by resolve_hosts as the raw multi-host port string, but it's also used elsewhere as a parsed port. That implicit coupling is a footgun; a dedicated port_pos_str alongside port_str would be cleaner long-term.

  3. Error recovery on invalid port — The fallback to (params.host.clone(), params.port) is correct and the test covers it. Consider eprintln! message clarity: it says "in -p" but positional ports go through port_pos, so the message can mislead. Minor.

  4. No TargetSessionAttrs + multi-host integration testtarget_session_attrs=read-write is the main reason to give multiple hosts; worth a comment noting that the failover loop (if any) lives elsewhere and this PR only handles parsing.

  5. last_port reuse semantics match libpq — confirmed correct per libpq docs. Good.

Verdict: Logic is sound, tests are thorough. Points 1–2 are worth addressing before or shortly after merge. No blockers.

@NikolayS
Copy link
Copy Markdown
Owner Author

REV

#751 — feat(connection): multi-host failover

This is the most consequential PR of the batch. Connection handling is a correctness-critical path. The implementation is generally sound but has a few things worth checking carefully.

What's good

  • port_str: Option<String> as a raw field alongside the parsed port: Option<u16> is the right way to handle comma-separated port lists at the CLI layer — clean separation of concerns
  • Port reuse for remaining hosts (last port carries forward) matches libpq behavior
  • Invalid port early-return path is defensive and correct: fail loudly rather than silently misassign ports
  • Test coverage is thorough: single port, per-host ports, port reuse, positional HOST argument, invalid port

Concerns

  1. CLI multi-host priority vs conninfo — the doc comment says "CLI -h takes priority over single-host conninfo" which is the intended behavior. But what happens when -h h1,h2 is given AND the conninfo string also has multi-host (host=h1,h2)? The conninfo path runs first and populates params.hosts, then the CLI path overwrites it. Is that the right precedence? A test for this case would confirm the behavior is intentional.

  2. eprintln! for invalid port — printing to stderr from deep in resolve_hosts is unexpected for a library-style function. This should either return an error (propagate up) or use the logging subsystem. The current approach makes it hard to test and breaks the no-side-effects contract of a resolver function.

  3. port_parse_ok = false but port_parts is still partially populated — when a port is invalid, port_parse_ok is set false and the function falls back to single-host. But the loop continues collecting (with 0 as the invalid value). This is fine since the result is discarded, but the code is confusing to read. A cleaner approach: early-return from the closure on first failure.

  4. IPv6 host supporth.split(',').map(str::trim) will split on commas inside IPv6 bracket notation ([::1]). Multi-host IPv6 via CLI is probably rare but worth a note in the code or a test asserting the behavior ("IPv6 hosts must use URI format for multi-host").

  5. params.hosts = vec![(params.host.clone(), params.port)] — the fallback on invalid port sets hosts to the already-resolved single host. But if the user supplied -h h1,h2 and an invalid port, they clearly intended multi-host. Silently falling back to single-host could cause a confusing connection to just h1 when the intent was failover. At minimum the error message should make clear that only h1 will be used.

Minor

  • Doc update in connection.rs header: "Multiple hosts may be specified via:" with CLI flags documented — good addition.
  • The resolve_hosts function signature now takes opts: Option<&CliConnOpts> — Option wrapping is correct since this is also called from tests that don't have CLI opts.

Verdict: approve with caveats. Issue 2 (eprintln in resolver) is worth cleaning up. Issue 4 (IPv6) should be documented. The rest are acceptable for now.

@NikolayS
Copy link
Copy Markdown
Owner Author

REV Code Review — PR #751: feat(connection): multi-host failover

CI: green | Diff scope: src/connection.rs, src/main.rs, src/session.rs, tests/connection_paths.rs


Blocking findings

None.


Non-blocking findings

[BUG / LOW] port_parse_ok flag is set in a closure but the fallback populates params.hosts with a potentially stale host

In resolve_hosts, when an invalid port is detected the fallback is:

params.hosts = vec![(params.host.clone(), params.port)];
return;

params.host at this point reflects the single-host resolution that happened before resolve_hosts was called, so it should be correct. However, the test test_resolve_params_cli_multihost_invalid_port_does_not_silently_use_wrong_port passes -h h1,h2 but the expected fallback host in the assertion is params.host.clone() — which could resolve to an empty string if the env guard clears PGHOST and no default is applied. Worth adding a concrete assert_ne!(params.host, "") assertion or using a known literal.

[STYLE / LOW] port_parse_ok bool mutated from inside map closure

The port_parse_ok mutation via unwrap_or_else inside a closure is non-idiomatic Rust and surprising to read. A cleaner pattern is to collect Result<u16, _> values and check with .collect::<Result<Vec<_>, _>>():

let port_parts: Result<Vec<u16>, _> = raw_ports
    .split(',')
    .filter(|s| !s.trim().is_empty())
    .map(|s| s.trim().parse::<u16>())
    .collect();

This avoids the side-effect-in-closure pattern and makes the error path clearer.

[DOCS / LOW] CliConnOpts::port_str field doc could clarify "takes precedence"

The comment says Takes precedence over port when parsing multi-host lists. — this is slightly confusing because port_str is the raw value of the same CLI -p flag; port is just the already-parsed first token. Suggest: "Raw -p string; preserves comma-separated tokens for multi-host port assignment in resolve_hosts."


Potential issues (low confidence)

[BEHAVIOR] Comma in host string is not validated at the single-host path

If a user passes -h "h1,h2" without a comma in -p, the code correctly enters the multi-host path. But if resolve_hosts receives opts where cli_host contains a comma while we also have a conninfo host=single, the conninfo path runs first and sets params.hosts before the CLI path is reached. The CLI multi-host block then returns after overwriting. This seems intentional (CLI > conninfo for multi-host), but it is not tested and the comment says "CLI takes priority over single-host conninfo" without a matching test for the mixed case.


Summary

Solid implementation. The port_str/port dual-field approach is the right call for backward compatibility. The invalid-port guard is a nice safety net. Test coverage is thorough for the happy paths. The style note on port_parse_ok mutation is the most worth addressing, but none of the findings are blockers.

@NikolayS
Copy link
Copy Markdown
Owner Author

REV Re-Review — PR #751 (fix/743-multihost)

Branch: fix/743-multihost | Head: cfd7295 | Commits since last review: 2


Changes reviewed in this pass

Two commits since the last review (cfd7295, 17bcbbc):

  1. fix(connection): populate hosts from single-host fallback on invalid port — fixes the stale params.hosts bug from the previous review
  2. fix(connection): use first() instead of index for panic safety — replaces host_list[0] indexing with .first() in both resolve_hosts and parse_uri

BLOCKING

None.


NON-BLOCKING

[BUG / LOW] port_parse_ok mutation inside map closure is non-idiomatic and makes invalid-port detection fragile

src/connection.rs:881-893

The port_parse_ok flag is mutated as a side effect inside a map() closure. This compiles and works correctly today, but it obscures the failure path and is surprising to read. The idiomatic Rust pattern is to collect into Result:

let port_parts: Result<Vec<u16>, _> = raw_ports
    .split(',')
    .filter(|s| !s.trim().is_empty())
    .map(|s| s.trim().parse::<u16>())
    .collect();

match port_parts {
    Ok(parts) => { /* use parts */ }
    Err(_) => {
        eprintln!("rpg: invalid port value in -p '{}'", raw_ports);
        params.hosts = vec![(params.host.clone(), params.port)];
        return;
    }
}

This also improves the error message: currently it prints the specific bad token but the eprintln! call happens inside unwrap_or_else, which means the invalid value is printed but port_parse_ok = false is still evaluated in the same pass — the behavior is correct, just hard to reason about. No current bug, but a cleanup worth doing.


POTENTIAL

[BEHAVIOR / MEDIUM] No test for CLI multi-host + conninfo single-host combined (confidence: 6/10)

The priority comment at line 808-811 says CLI -h wins over single-host conninfo, and this is the documented/intended behavior. But there is no unit test that simultaneously supplies opts.host = Some("h1,h2") and a conninfo HashMap with host=single. The test test_resolve_params_cli_multihost_single_port does not pass a conninfo, so the priority path is untested. Low risk since the logic falls through correctly, but worth a test.

[DOCS / LOW] eprintln! in resolve_hosts — library-style function with stderr side effect (confidence: 5/10)

resolve_hosts is a pure-ish resolver function. Printing directly to stderr from inside it is surprising for callers that might want to capture/suppress the message. Not a blocker, but consider promoting the diagnostic through a return value or log macro as a longer-term cleanup.


What's fixed

Both blocking findings from the previous review pass are resolved:

  • B: stale params.hosts on invalid port17bcbbc correctly populates params.hosts = vec![(params.host.clone(), params.port)] before returning. The test assertion was also strengthened: it now asserts exact equality against the resolved single-host fallback rather than just port != 0.
  • Latent panic on host_list[0]cfd7295 replaces both host_list[0] calls (in resolve_hosts and parse_uri) with safe .first() patterns. The guard (host_parts.len() > 1) makes them unreachable in current code, but the defensive fix is correct.

Summary

Area Findings Potential Notes
Security 0 0
Bugs 0 1 side-effect-in-closure, low risk
Tests 0 1 CLI+conninfo combined case
Guidelines 0 0
Docs 1 1 port_parse_ok pattern, eprintln placement

LGTM for merge. The core logic is correct and the previous blocking issues are resolved. The non-blocking port_parse_ok pattern is worth a follow-up cleanup but not a merge blocker.


REV-assisted review (AI analysis by postgres-ai/rev)

@NikolayS
Copy link
Copy Markdown
Owner Author

CI is green and REV review found no blocking findings. This PR is ready for merge — awaiting Nik's approval.

@NikolayS
Copy link
Copy Markdown
Owner Author

REV

PR #751 — feat(connection): multi-host failover

Good work. The port_str field addition to CliConnOpts is the right approach — preserving the raw comma-separated string avoids re-parsing the already-parsed u16.

Observations:

  1. Invalid port handling is conservative but correct — on parse failure the code prints a warning and falls back to single-host. The fallback path sets params.hosts = vec![(params.host.clone(), params.port)] using whatever params.host/port were resolved to before resolve_hosts was called. That's the right anchor point. The test test_resolve_params_cli_multihost_invalid_port_does_not_silently_use_wrong_port verifies this; make sure it also asserts on the actual host value, not just that port ≠ 0.

  2. Priority comment in resolve_hosts doc — item 3 says "Multi-host from CLI -h flag (takes priority over single-host conninfo)". This is correct but the interaction is subtle: CLI multi-host applies only when host_parts.len() > 1, so a single CLI -h doesn't override conninfo. The comment could be more precise: "CLI -h with multiple comma-separated hosts takes priority over single-host conninfo".

  3. port_str: Option<String> — the field is added to CliConnOpts with a Default of None. Callers that construct CliConnOpts directly (test code) need to set this. Tests in this PR do it correctly. Just verify no other test-only CliConnOpts construction outside this PR gets confused by the new field.

  4. URI and conninfo multi-host paths are unchanged — confirmed; the new CLI path is additive. URI multi-host still takes first priority.

  5. Test coverage is thorough — single port, per-host ports, port reuse, positional args, invalid port. Good set.

Nothing blocking.

@NikolayS
Copy link
Copy Markdown
Owner Author

Testing evidence — multi-host failover

Tested on cfd7295 (fix/743-multihost branch), built and run against live local Postgres instances.

Environment:

  • 127.0.0.1:15433 — PostgreSQL 17.7, trust auth (samo-test-pg2)
  • 127.0.0.1:15434 — PostgreSQL 17.7, scram-sha-256 auth (rpg-test-pg17)
  • rpg v0.8.3, debug build

Test A: Two valid hosts — connects to first

$ rpg -h 127.0.0.1,127.0.0.1 -p 15433,15434 -U postgres -d ashtest
\conninfo
You are connected to database "ashtest" as user "postgres" on host "127.0.0.1" at port "15433".
SSL connection (protocol: TLS, cipher: unknown, compression: off)

✅ First host wins, as expected.


Test B: First host invalid — failover to second

$ rpg -h invalid.host,127.0.0.1 -p 5432,15433 -U postgres -d ashtest
SELECT inet_server_addr(), inet_server_port();
 inet_server_addr | inet_server_port
------------------+------------------
 172.17.0.2       |             5432
(1 row)

\conninfo
You are connected to database "ashtest" as user "postgres" on host "127.0.0.1" at port "15433".
SSL connection (protocol: TLS, cipher: unknown, compression: off)

invalid.host:5432 fails (DNS/connection error), rpg falls through to 127.0.0.1:15433 and connects successfully. inet_server_port() confirms port 5432 inside the container (mapped externally as 15433). \conninfo shows host "127.0.0.1" at port "15433" — the fallback host was used.


Test C: Single host — backward compat

$ rpg -h 127.0.0.1 -p 15433 -U postgres -d ashtest
SELECT inet_server_addr(), inet_server_port();
 inet_server_addr | inet_server_port
------------------+------------------
 172.17.0.2       |             5432
(1 row)

✅ Single-host path unchanged.


Code path: connect() in src/connection.rs (line ~2354) iterates params.hosts in order, calls connect_one() per host, and continues on error. The first() fix (#743) ensures the legacy params.host/params.port fields are set safely without an index-out-of-bounds panic when the hosts list is populated from URI or CLI flags.

@NikolayS NikolayS force-pushed the fix/743-multihost branch 2 times, most recently from dbe403a to d437864 Compare March 27, 2026 17:08
Agent and others added 5 commits March 27, 2026 21:31
Implements comma-separated multi-host connection with failover,
matching psql/libpq semantics.

- `rpg -h host1,host2 -p 5432` tries hosts in order, uses first
  that accepts a connection
- Per-host ports supported: `-h host1,host2 -p 5432,5433`
- Port reuse: fewer ports than hosts reuses the last port
- URI and connstring forms already worked; now CLI flags work too
- `-p` flag now accepts comma-separated port list (single port still
  works — backward-compatible)
- resolve_hosts() extended with priority 3: CLI multi-host
- 5 new unit tests covering single-port, per-host, port-reuse,
  positional args, and single-host regression
- Integration tests i1/i2/i3 added (run via connection-tests CI job)
@NikolayS NikolayS force-pushed the fix/743-multihost branch from d437864 to db25b59 Compare March 27, 2026 21:31
@NikolayS NikolayS merged commit cfc9f4e into main Mar 28, 2026
16 checks passed
@NikolayS NikolayS deleted the fix/743-multihost branch March 28, 2026 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(connection): multi-host failover (comma-separated hosts, psql parity)

2 participants