Skip to content

fix(connection): scan for any .s.PGSQL.<port> socket, not just 5432#728

Merged
NikolayS merged 2 commits intomainfrom
fix/conn-b1-socket-any-port
Mar 24, 2026
Merged

fix(connection): scan for any .s.PGSQL.<port> socket, not just 5432#728
NikolayS merged 2 commits intomainfrom
fix/conn-b1-socket-any-port

Conversation

@NikolayS
Copy link
Copy Markdown
Owner

Problem

B1 audit test fails: when rpg is invoked with no host/port and a Postgres socket exists in /tmp at a non-standard port (e.g. /tmp/.s.PGSQL.5437), default_host() (now default_host_port()) only checked for .s.PGSQL.5432 — hardcoded — and fell back to localhost:5432 (connection refused).

Fix

Replace default_host() -> String with default_host_port() -> (String, u16):

  1. Fast path: check for .s.PGSQL.5432 first (libpq-compatible, O(1))
  2. Slow path: if not found, scan the directory for any .s.PGSQL.<N> socket file, excluding .lock files, and return the first valid one with its port number

Both resolve_host() and resolve_port() updated to use default_host_port() as their fallback.

Tests

  • 1702 unit tests pass
  • All 16 B1/connection audit tests pass locally (including B1: rpg -c 'SHOW port' → connects via socket at port 5437)
  • test_defaults updated to assert against default_host_port() result rather than hardcoded 5432 (environment-aware)

Fixes B1 audit test.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 24, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 62.50000% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.74%. Comparing base (5d68046) to head (c4cdcf1).

Files with missing lines Patch % Lines
src/connection.rs 62.50% 15 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #728      +/-   ##
==========================================
- Coverage   68.75%   68.74%   -0.01%     
==========================================
  Files          46       46              
  Lines       31061    31093      +32     
==========================================
+ Hits        21354    21372      +18     
- Misses       9707     9721      +14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@NikolayS
Copy link
Copy Markdown
Owner Author

REV

Verdict: REQUEST_CHANGES

The direction is correct — auto-detecting non-5432 ports is worth having, and the fast path / slow path structure is sound. Three issues need fixing before merge, one of which is a correctness bug.


Issue 1 — Wrong existence check for sockets (correctness bug)

// Verify it is actually a socket, not a stale file.
if entry.path().exists() {

Path::exists() calls stat(2) and returns true for any file type — regular file, symlink, directory, device node. It does not verify that the entry is a Unix-domain socket. On a system where a stale regular file named .s.PGSQL.5437 is left behind (e.g., an unclean PostgreSQL shutdown left its lockfile but not the socket, or a miscreant touch), this path returns the wrong result and rpg attempts a socket connection that will fail at connect time.

The right check is:

use std::os::unix::fs::FileTypeExt;

if entry.path().metadata()
    .map(|m| m.file_type().is_socket())
    .unwrap_or(false)
{
    return ((*dir).to_owned(), port);
}

This is exactly what libpq does: it calls stat(2) and checks S_ISSOCK. The comment in the code even says "Verify it is actually a socket" — the implementation doesn't deliver on that promise.

Note: FileTypeExt::is_socket() is unix-only, but since this block is already gated by #[cfg(unix)] that is fine.

The fast path (.s.PGSQL.5432) has the same flaw but it is much less likely to hit a false positive there, since nothing else normally creates a file by that exact name.


Issue 2 — Double read_dir scan when neither host nor port is overridden

// resolve_host:
.unwrap_or_else(|| default_host_port().0);

// resolve_port:
.unwrap_or_else(|| default_host_port().1);

When neither PGHOST nor PGPORT is set (the common case in development), resolve_params calls both resolve_host and resolve_port, and each calls default_host_port() independently. That means two full directory scans (fast-path stat + potentially read_dir) of the same directories, returning a host/port pair that is guaranteed to be consistent only by accident if the socket set changes between calls.

The fix is straightforward: call default_host_port() once in resolve_params (or in a caller that drives both), pass the result into resolve_host and resolve_port as an Option hint, or merge the two functions into a single resolve_host_port(). The double scan is not catastrophic — it is at most two extra stat or read_dir calls per connection attempt — but the potential for an inconsistent (host, port) pair is a real correctness hazard, not just a performance concern.


Issue 3 — Non-deterministic port selection in the slow path

read_dir returns entries in filesystem (inode) order, which is undefined and varies between filesystems, kernel versions, and directory history. On a machine running PostgreSQL 14 (port 5433) and PostgreSQL 17 (port 5437), the slow path returns whichever read_dir surfaces first. The user has no control over this and no way to predict it.

The minimal fix is to collect matching ports, sort ascending, and pick the lowest. Lowest port is still heuristic, but it is at least deterministic and matches what a user would expect if they ran ls. Example:

let mut found_ports: Vec<u16> = Vec::new();
if let Ok(entries) = std::fs::read_dir(&path) {
    for entry in entries.flatten() {
        // ... parse port, check is_socket ...
        found_ports.push(port);
    }
}
if let Some(&port) = found_ports.iter().min() {
    return ((*dir).to_owned(), port);
}

Alternatively, document the non-determinism explicitly and defer a proper solution to a future PR. What is not acceptable is leaving it silent — a user who hits the wrong PostgreSQL cluster because of inode order will have a very hard time debugging it.


Minor notes

Cargo.lock version bump (0.8.0 → 0.8.1): This is noise from a separate version bump PR and does not belong here. It is harmless but adds confusion when reading the diff. Recommend reverting before merge or acknowledging it is intentional.

Test update: The change from hardcoded 5432 to default_host_port() in the test is correct. The test now reflects actual runtime behavior rather than an assumption.

Fast path comment accuracy: The doc comment says "Port 5432 is tried first for libpq compatibility" — accurate and good to have.


Summary

# Severity Description
1 Bug exists() does not verify socket type; use is_socket()
2 Correctness risk Double default_host_port() call can return inconsistent host/port pair
3 Correctness risk Slow path is non-deterministic; sort or document
4 Noise Cargo.lock version bump unrelated to this change

Fix issues 1-3, then this is good to merge.

@NikolayS
Copy link
Copy Markdown
Owner Author

Follow-up REV on current HEAD (5878245).

Three issues from the original review remain open.

1. is_socket() not used — .exists() does not distinguish socket from regular file

Fast path (line ~378):

if path.join(".s.PGSQL.5432").exists() {

Slow path (line ~392):

if entry.path().exists() {

Both still use .exists() rather than metadata().file_type().is_socket(). The original issue stands: a regular file named .s.PGSQL.5432 (or a stale one left after a crash before the lock file is cleaned up) satisfies .exists() and triggers a false positive. The comment on the slow path says "Verify it is actually a socket, not a stale file" but the check doesn't actually verify that.

Fix: add use std::os::unix::fs::FileTypeExt; inside the #[cfg(unix)] block and replace both .exists() calls with .metadata().map(|m| m.file_type().is_socket()).unwrap_or(false).

2. default_host_port() called twice — TOCTOU window

resolve_host (line ~599) calls default_host_port().0 and resolve_port (line ~623) calls default_host_port().1 independently. Two separate read_dir scans run at connection time. Between them, a socket file could appear or disappear, so host and port could come from different sockets. This is the double-call issue from the original review.

Fix: call default_host_port() once in the caller (resolve_params) and pass dflt_host: String and dflt_port: u16 as arguments to resolve_host and resolve_port, replacing the unwrap_or_else closures with unwrap_or(dflt_host) / unwrap_or(dflt_port).

3. Slow path is non-deterministic

The slow path returns on the first matching entry from read_dir:

if entry.path().exists() {
    return ((*dir).to_owned(), port);
}

read_dir order is filesystem-defined (ext4: hash tree order, not alphabetical). If two sockets exist at non-standard ports, the returned port is arbitrary and varies between runs.

Fix: collect all valid ports into a Vec<u16>, sort, and take .first(). This is O(n) over the directory, which is negligible.


The doc comment says "the first one found" which correctly documents the current (non-deterministic) behavior, but that is the behavior that should change, not just be documented.

No other issues — the (String, u16) return type is correct, the #[cfg(unix)] scoping is correct, the lock file exclusion (!port_str.contains('.')) is correct, the 5432 fast path for libpq compatibility is correct, and the Default for ConnParams single-call usage is correct. The Cargo.lock version bump is expected noise.

@NikolayS NikolayS force-pushed the fix/conn-b1-socket-any-port branch 2 times, most recently from c2eead5 to 5a2feab Compare March 24, 2026 23:09
…terministic port sort

- Replace exists() with metadata().file_type().is_socket() in both the fast
  path (port 5432) and slow path to correctly reject non-socket files.
- Introduce default_host_port() -> (String, u16) replacing default_host() so
  host and port are derived from a single directory scan.
- Call default_host_port() once in resolve_params() and thread the results
  into resolve_host() and resolve_port() to eliminate the TOCTOU race from
  two independent read_dir() scans.
- Collect all matching socket ports in the slow path, sort ascending, and
  return the lowest for deterministic behavior when multiple sockets exist.
@NikolayS NikolayS force-pushed the fix/conn-b1-socket-any-port branch from 5a2feab to d72dc94 Compare March 24, 2026 23:23
@NikolayS NikolayS merged commit e70c90b into main Mar 24, 2026
16 checks passed
@NikolayS NikolayS deleted the fix/conn-b1-socket-any-port branch March 24, 2026 23:46
NikolayS added a commit that referenced this pull request Mar 25, 2026
Changes since v0.8.2:
- fix(connection): delegate URI parsing to tokio_postgres::Config (#731)
- fix(connection): scan for any .s.PGSQL.<port> socket (#728)
- test(connection): add connection path integration tests (#730)
NikolayS added a commit that referenced this pull request Mar 25, 2026
Changes since v0.8.2:
- fix(connection): delegate URI parsing to tokio_postgres::Config (#731)
- fix(connection): scan for any .s.PGSQL.<port> socket (#728)
- test(connection): add connection path integration tests (#730)
NikolayS added a commit that referenced this pull request Mar 25, 2026
Changes since v0.8.2:
- fix(connection): delegate URI parsing to tokio_postgres::Config (#731)
- fix(connection): scan for any .s.PGSQL.<port> socket (#728)
- test(connection): add connection path integration tests (#730)
NikolayS added a commit that referenced this pull request Mar 25, 2026
Changes since v0.8.2:
- fix(connection): delegate URI parsing to tokio_postgres::Config (#731)
- fix(connection): scan for any .s.PGSQL.<port> socket (#728)
- test(connection): add connection path integration tests (#730)
NikolayS added a commit that referenced this pull request Mar 25, 2026
Changes since v0.8.2:
- fix(connection): delegate URI parsing to tokio_postgres::Config (#731)
- fix(connection): scan for any .s.PGSQL.<port> socket (#728)
- test(connection): add connection path integration tests (#730)
NikolayS added a commit that referenced this pull request Mar 25, 2026
Changes since v0.8.2:
- fix(connection): delegate URI parsing to tokio_postgres::Config (#731)
- fix(connection): scan for any .s.PGSQL.<port> socket (#728)
- test(connection): add connection path integration tests (#730)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants