Skip to content

Commit 7e5cdfa

Browse files
nesquenaclaude
andcommitted
fix(providers): atomic .env write via tempfile + os.replace (nesquena#1164 cross-process)
The contributor PR's within-process fix (_ENV_LOCK on the load→modify→write cycle) is correct, but doesn't close the cross-process race that is the actual root cause of nesquena#1164. The hermes-agent CLI/Telegram bot uses hermes_cli.config.save_env_value, which writes .env atomically via tempfile + os.replace. The WebUI's _write_env_file was using os.open(..., O_TRUNC) — between the truncate and the complete write, a concurrent reader sees a partial or empty file. The agent has a _sanitize_env_lines workaround for the resulting corruption pattern (concatenated KEY=VALUE pairs from interleaved writes / partial reads), which is a strong signal that this race actually occurs in practice. Fix: stage the write through tempfile.mkstemp() in the same directory (same filesystem, so os.replace is atomic on POSIX), fsync before rename, chmod 0600 before rename so readers never see a 0644 window, and clean up the tempfile on any exception. Matches the exact pattern hermes_cli/config.py:save_env_value uses. Within-process safety from the PR's _ENV_LOCK is preserved (the whole sequence still runs inside the with-lock block). Also added a regression test asserting the new invariants: - tempfile staging present - os.replace called - O_TRUNC pattern is gone Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
1 parent d4de119 commit 7e5cdfa

2 files changed

Lines changed: 38 additions & 5 deletions

File tree

api/providers.py

Lines changed: 22 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -153,12 +153,29 @@ def _write_env_file(env_path: Path, updates: dict[str, str | None]) -> None:
153153
content = "\n".join(output_lines)
154154
if content:
155155
content += "\n"
156-
# Create at owner-only mode from the first byte (O_CREAT honours the mode
157-
# argument subject to umask). A trailing chmod guards pre-existing files.
156+
# Atomic write via tempfile + os.replace so cross-process readers
157+
# (Telegram bot, CLI) never see a half-truncated file. The shared
158+
# ``~/.hermes/.env`` is also written by ``hermes_cli.config.save_env_value``
159+
# using the same atomic pattern; matching it here closes the
160+
# cross-process leg of #1164 (within-process is covered by _ENV_LOCK).
158161
_mode = _stat.S_IRUSR | _stat.S_IWUSR # 0o600
159-
_fd = os.open(str(env_path), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, _mode)
160-
with os.fdopen(_fd, "w", encoding="utf-8") as _f:
161-
_f.write(content)
162+
import tempfile as _tempfile
163+
_tmp_fd, _tmp_path = _tempfile.mkstemp(
164+
dir=str(env_path.parent), prefix=".env_", suffix=".tmp"
165+
)
166+
try:
167+
with os.fdopen(_tmp_fd, "w", encoding="utf-8") as _f:
168+
_f.write(content)
169+
_f.flush()
170+
os.fsync(_f.fileno())
171+
os.chmod(_tmp_path, _mode) # tighten before rename so readers see 0600
172+
os.replace(_tmp_path, env_path)
173+
except BaseException:
174+
try:
175+
os.unlink(_tmp_path)
176+
except OSError:
177+
pass
178+
raise
162179
try:
163180
env_path.chmod(_mode)
164181
except OSError:

tests/test_issue1164_env_file_corruption.py

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -152,3 +152,19 @@ def test_providers_write_env_holds_env_lock(self):
152152
"_write_env_file must use _ENV_LOCK for concurrency safety")
153153
self.assertIn("from api.streaming import _ENV_LOCK", source,
154154
"_ENV_LOCK must be imported from api.streaming")
155+
156+
def test_providers_write_env_uses_atomic_rename(self):
157+
"""providers._write_env_file must write atomically via tempfile +
158+
os.replace so cross-process readers (Telegram, CLI) never observe
159+
a truncated half-written file (#1164 cross-process leg)."""
160+
import inspect
161+
from api.providers import _write_env_file
162+
source = inspect.getsource(_write_env_file)
163+
self.assertIn("tempfile", source,
164+
"_write_env_file must stage writes through a tempfile")
165+
self.assertIn("os.replace(", source,
166+
"_write_env_file must atomically rename via os.replace")
167+
# The original O_TRUNC pattern must NOT remain — it is the source of
168+
# the cross-process race the PR is closing.
169+
self.assertNotIn("O_TRUNC", source,
170+
"_write_env_file must not truncate-in-place (#1164)")

0 commit comments

Comments
 (0)