fix(gates): race web task unconditionally when web dashboard exists#95
Open
PolyphonyRequiem wants to merge 1 commit intomicrosoft:mainfrom
Open
Conversation
When a human gate is presented and a web dashboard has been started (e.g. via --web-bg), _handle_gate_with_web previously checked `self._web_dashboard.has_connections()` and fell back to the CLI-only path if no WebSocket client was currently connected. In practice users almost always open the per-run dashboard *after* seeing the gate-waiting notification, so `has_connections()` is typically False at the moment the gate is presented. Under --web-bg there is no attached stdin, so the CLI task blocks forever on `input()`, and when the user later connects and clicks approve, the `gate_response` WebSocket message is enqueued to `_gate_response_queue` with no coroutine awaiting it. The workflow hangs indefinitely. Fix: only bail to CLI-only when there is no web dashboard at all. Always start both the CLI task and the web-wait task in parallel when a dashboard exists; `wait_for_gate_response` happily awaits an empty queue until the user eventually connects and clicks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #95 +/- ##
=======================================
Coverage ? 85.49%
=======================================
Files ? 46
Lines ? 6604
Branches ? 0
=======================================
Hits ? 5646
Misses ? 958
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
_handle_gate_with_webdecided between CLI and web gate resolution based onself._web_dashboard.has_connections()at the moment the gate was presented. When no WebSocket client was connected at that instant, it committed to the CLI-only path. Under--web-bgthere is no attached stdin, and users typically open the per-run dashboard after seeing the gate-waiting notification, so:has_connections()is False (no one is looking yet).gate_handler.handle_gate→ blocks oninput()with no tty.gate_responseWebSocket message arrives and is put on_gate_response_queue._wait_for_web_gatewas never scheduled.gate_resolvedevent is never emitted.Reproduction
conductor run <workflow> --web-bghuman_gateagent to fire. Do not open the per-run dashboard yet.gate_resolvedis ever written, workflow stays blocked.Fix
Only bail to the CLI-only path when there is no web dashboard at all. Whenever
self._web_dashboardexists, start both the CLI and web tasks and race them.wait_for_gate_responseis happy to await an empty queue until a client eventually connects and clicks, and if the CLI task resolves first the web task is cancelled cleanly (already handled byasyncio.wait(..., return_when=FIRST_COMPLETED)and the pending-cancel loop below).Risk
Low. The prior
has_connections()short-circuit was an optimization to avoid scheduling a web coroutine when clearly unused; removing it costs one extraasyncio.create_taskper gate. No behavioral change for CLI-only runs (_web_dashboard is None).Tests
Existing gate/race path had no test coverage, which is why this regressed silently. Happy to add a regression test covering 'client connects after gate presented' in a follow-up if desired.