fix(evals): use non-daemon thread for background evaluators#5247
fix(evals): use non-daemon thread for background evaluators#5247alexmojaki merged 1 commit intomainfrom
Conversation
Daemon threads are killed abruptly when the main program exits, which can cause evaluation results to be lost if the program finishes before the background thread completes. Using a non-daemon thread ensures the evaluation work completes properly. https://claude.ai/code/session_01GnKTzs55m2DtdnaCVnXvRs
| _background_threads.discard(thread) | ||
|
|
||
| thread = threading.Thread(target=_thread_target, daemon=True) | ||
| thread = threading.Thread(target=_thread_target, daemon=False) |
There was a problem hiding this comment.
🔴 Non-daemon background thread can prevent process exit indefinitely
Changing daemon=True to daemon=False means that if a background evaluator hangs (e.g., a network call that never completes, a deadlock, or an unresponsive external service), the Python process will be unable to exit. The main thread and all other work may be done, but the interpreter will block waiting for the non-daemon thread to finish — with no timeout on the thread's actual work (ctx.run(anyio.run, _run) at line 174 has no time bound).
The existing wait_for_evaluations() function (pydantic_evals/pydantic_evals/_online.py:566) already provides an explicit opt-in mechanism (with a configurable timeout) for users who want to ensure evaluations complete before exiting. Making threads non-daemon is redundant for that use case and introduces a hang risk for users who don't call wait_for_evaluations() — their process will silently refuse to exit if any evaluation thread is stuck.
Prompt for agents
The change from daemon=True to daemon=False in dispatch_in_background_thread (line 179 of _online.py) prevents process exit if any background evaluator thread hangs. The function dispatch_in_background_thread is called from online.py:739 when there is no running event loop, to fire-and-forget evaluation work.
The stated goal is to avoid losing evaluation results when the main program exits quickly. However, non-daemon threads will block process exit indefinitely if any evaluator hangs (no timeout on the thread work itself at line 174: ctx.run(anyio.run, _run)).
Better approaches:
1. Keep daemon=True (the original) and document that users should call wait_for_evaluations() before exit to ensure results are flushed. This is already the documented pattern.
2. If non-daemon threads are truly desired, add an atexit handler that joins threads with a bounded timeout so the process can still exit.
3. Register an atexit hook that calls wait_for_evaluations with a reasonable timeout, keeping daemon=True so the process can always exit after the timeout.
Was this helpful? React with 👍 or 👎 to provide feedback.
| with _background_lock: | ||
| _background_threads.discard(thread) | ||
|
|
||
| thread = threading.Thread(target=_thread_target, daemon=True) | ||
| thread = threading.Thread(target=_thread_target, daemon=False) | ||
| with _background_lock: | ||
| _background_threads.add(thread) | ||
| try: |
There was a problem hiding this comment.
🚩 No atexit handler or thread-level timeout to complement the daemon=False change
With daemon=False, the process relies entirely on the user calling wait_for_evaluations() to drain threads before exit. But wait_for_evaluations (_online.py:566-589) is opt-in and only imposes a timeout on thread.join(), not on the thread's work itself (ctx.run(anyio.run, _run) at line 174). If a user never calls wait_for_evaluations(), non-daemon threads will silently keep the process alive. An atexit handler that joins with a bounded timeout — or a per-thread cancellation mechanism — would make the daemon=False change safe without requiring user action.
(Refers to lines 166-187)
Was this helpful? React with 👍 or 👎 to provide feedback.
Docs Preview
|
Daemon threads are killed abruptly when the main program exits, which can cause evaluation results to be lost if the program finishes before the background thread completes. Using a non-daemon thread ensures the evaluation work completes properly.
https://claude.ai/code/session_01GnKTzs55m2DtdnaCVnXvRs
Checklist