
fix(gates): race web task unconditionally when web dashboard exists #95

Open
PolyphonyRequiem wants to merge 1 commit into microsoft:main from PolyphonyRequiem:fix/gate-race-start-web-task-unconditionally

@PolyphonyRequiem
Member

Summary

_handle_gate_with_web decided between CLI and web gate resolution based on self._web_dashboard.has_connections() at the moment the gate was presented. When no WebSocket client was connected at that instant, it committed to the CLI-only path. Under --web-bg there is no attached stdin, and users typically open the per-run dashboard after seeing the gate-waiting notification, so:

  1. Gate is presented, has_connections() is False (no one is looking yet).
  2. Fall through to gate_handler.handle_gate → blocks on input() with no tty.
  3. User opens the per-run dashboard, clicks Approve.
  4. gate_response WebSocket message arrives and is put on _gate_response_queue.
  5. Nothing awaits the queue; _wait_for_web_gate was never scheduled.
  6. The workflow hangs forever, and the gate_resolved event is never emitted.
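The failure sequence above can be modelled with a small asyncio sketch. Everything in it is hypothetical: `StubDashboard`, `cli_gate` and `handle_gate_old` are illustrative stand-ins rather than the project's real classes, and the blocking `input()` call is modelled as a coroutine that never returns. It shows that once the gate commits to the CLI-only path, a later Approve click is enqueued but never read.

```python
import asyncio
import contextlib


class StubDashboard:
    """Hypothetical stand-in for the per-run web dashboard (not the real class)."""

    def __init__(self) -> None:
        self._gate_response_queue: asyncio.Queue = asyncio.Queue()
        self.connections = 0

    def has_connections(self) -> bool:
        return self.connections > 0


async def cli_gate() -> str:
    # Models input() under --web-bg: with no tty attached it never returns.
    await asyncio.Event().wait()
    return "cli"


async def handle_gate_old(dashboard: StubDashboard) -> str:
    # Old behaviour (sketch): the decision is made once, when the gate is presented.
    if not dashboard.has_connections():
        # Committed to CLI-only; nothing will ever read _gate_response_queue.
        return await cli_gate()
    # Connected branch (CLI/web race) omitted; this scenario never reaches it.
    raise NotImplementedError


async def demo() -> tuple:
    dashboard = StubDashboard()
    gate = asyncio.create_task(handle_gate_old(dashboard))
    await asyncio.sleep(0)  # step 1: gate presented, nobody connected yet
    dashboard.connections = 1  # step 3: user opens the dashboard...
    dashboard._gate_response_queue.put_nowait("approve")  # step 4: ...and clicks Approve
    done, _ = await asyncio.wait({gate}, timeout=0.1)
    gate.cancel()  # end the demo; the real workflow would hang here instead
    with contextlib.suppress(asyncio.CancelledError):
        await gate
    return bool(done), dashboard._gate_response_queue.qsize()


resolved, queued = asyncio.run(demo())
print(resolved, queued)  # False 1
```

The gate never resolves, and the response is still sitting in the queue when the timeout expires.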

Reproduction

  • conductor run <workflow> --web-bg
  • Wait for a human_gate agent to fire. Do not open the per-run dashboard yet.
  • After the terminal shows the gate-waiting notification, open the per-run dashboard and click any option.
  • The WebSocket frame is sent and acknowledged, but no gate_resolved event is ever written and the workflow stays blocked.

Fix

Only bail to the CLI-only path when there is no web dashboard at all. Whenever self._web_dashboard exists, start both the CLI and web tasks and race them. wait_for_gate_response simply awaits the empty queue until a client eventually connects and clicks. If the CLI task resolves first, the web task is cancelled cleanly; this is already handled by asyncio.wait(..., return_when=FIRST_COMPLETED) and the existing loop that cancels pending tasks.
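A minimal sketch of the fixed control flow, assuming a dashboard object that exposes a `wait_for_gate_response()` coroutine backed by an `asyncio.Queue`. `StubDashboard`, `handle_gate_with_web`, `never_answers` and `late_click` are hypothetical names for illustration, and the CLI side is modelled as a coroutine rather than a real `input()` call; the actual `_handle_gate_with_web` may differ in detail. The only runtime decision left is whether a dashboard exists; when it does, both tasks are raced with `asyncio.wait(..., return_when=FIRST_COMPLETED)` and the loser is cancelled.

```python
import asyncio
import contextlib
from typing import Any, Coroutine, Optional


class StubDashboard:
    """Hypothetical stand-in for the web dashboard (not the real class)."""

    def __init__(self) -> None:
        self._gate_response_queue: asyncio.Queue = asyncio.Queue()

    async def wait_for_gate_response(self) -> str:
        # On an empty queue this simply suspends until a client answers.
        return await self._gate_response_queue.get()


async def handle_gate_with_web(
    dashboard: Optional[StubDashboard], cli_coro: Coroutine[Any, Any, str]
) -> str:
    if dashboard is None:
        # No dashboard at all: the CLI-only path is unchanged.
        return await cli_coro
    # A dashboard exists: race CLI and web without consulting has_connections().
    cli_task = asyncio.create_task(cli_coro)
    web_task = asyncio.create_task(dashboard.wait_for_gate_response())
    done, pending = await asyncio.wait(
        {cli_task, web_task}, return_when=asyncio.FIRST_COMPLETED
    )
    # Cancel the side that lost the race and let it finish unwinding.
    for task in pending:
        task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await task
    return done.pop().result()


async def never_answers() -> str:
    # Models input() blocking forever with no tty under --web-bg.
    await asyncio.Event().wait()
    return "cli"


async def late_click() -> str:
    dashboard = StubDashboard()
    gate = asyncio.create_task(handle_gate_with_web(dashboard, never_answers()))
    await asyncio.sleep(0.05)  # gate presented; no client connected yet
    dashboard._gate_response_queue.put_nowait("approve")  # user connects and clicks
    return await asyncio.wait_for(gate, timeout=1)


print(asyncio.run(late_click()))  # approve
```

Because awaiting `get()` on an empty `asyncio.Queue` just suspends, the extra web task is idle until a client connects. If both sides finished in the same loop iteration, this sketch would return whichever result `done.pop()` yields; a real implementation might prefer one side explicitly.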

Risk

Low. The prior has_connections() short-circuit was an optimization to avoid scheduling a web coroutine when no client was connected; removing it costs one extra asyncio.create_task per gate. There is no behavioral change for CLI-only runs, where _web_dashboard is None.

Tests

The existing gate race path had no test coverage, which is why this regression went unnoticed. Happy to add a regression test covering 'client connects after gate presented' in a follow-up if desired.

When a human gate is presented and a web dashboard has been started
(e.g. via --web-bg), _handle_gate_with_web previously checked
`self._web_dashboard.has_connections()` and fell back to the CLI-only
path if no WebSocket client was currently connected.

In practice users almost always open the per-run dashboard *after*
seeing the gate-waiting notification, so `has_connections()` is
typically False at the moment the gate is presented. Under --web-bg
there is no attached stdin, so the CLI task blocks forever on
`input()`, and when the user later connects and clicks approve, the
`gate_response` WebSocket message is enqueued to
`_gate_response_queue` with no coroutine awaiting it. The workflow
hangs indefinitely.

Fix: only bail to CLI-only when there is no web dashboard at all.
Always start both the CLI task and the web-wait task in parallel when
a dashboard exists; `wait_for_gate_response` happily awaits an empty
queue until the user eventually connects and clicks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@82ec042).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #95   +/-   ##
=======================================
  Coverage        ?   85.49%           
=======================================
  Files           ?       46           
  Lines           ?     6604           
  Branches        ?        0           
=======================================
  Hits            ?     5646           
  Misses          ?      958           
  Partials        ?        0           
