test(sandbox): fix flaky arm64 procfs binary_path tests #881
Merged
Three `procfs::tests` tests flake on arm64 CI runners. There are two root causes. Both are test-side, and both can be fixed with smaller changes than a retry loop:

1. Fork/exec race (`binary_path_strips_deleted_suffix`, `binary_path_preserves_live_deleted_basename`): `Command::spawn` returns once the child is scheduled, not once it has completed `exec()`. On a contended runner, the `/proc/<pid>/exe` readlink that immediately follows still returns the parent (test-harness) binary, and the target-path assertion fails with the harness path on the left.
2. ETXTBSY on spawn (`binary_path_strips_suffix_for_non_utf8_filename`): the test wrote the target binary via `OpenOptions::write(true)` + `sync_all` + scope drop before spawning it. Under load, the kernel's release of the inode write lock raced `execveat`.

Changes:

- Add a small `wait_for_child_exec(pid, target)` helper that polls `/proc/<pid>/exe` for up to 2 s until the readlink's byte prefix matches `target`. Byte-level `starts_with` tolerates the kernel's `" (deleted)"` suffix on unlinked binaries. The helper is applied after every `spawn()` that is followed by a `/proc/<pid>/exe` readlink.
- Replace the `OpenOptions` + `sync_all` dance in `binary_path_strips_suffix_for_non_utf8_filename` with `std::fs::copy`, matching the other two tests. `std::fs::copy` handles non-UTF-8 filenames correctly on Unix and doesn't leave a writer fd held across the exec boundary, which was the ETXTBSY trigger.

Production `binary_path` behaviour is unchanged.
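A minimal sketch of what a helper like `wait_for_child_exec` could look like, based only on the description above: poll `/proc/<pid>/exe` until its raw bytes start with `target`. The explicit `timeout` parameter and the `bool` return value are assumptions for illustration; the real test helper in `procfs.rs` may hard-code the 2 s deadline and panic on timeout instead.

```rust
use std::os::unix::ffi::OsStrExt;
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Sketch (signature assumed): poll `/proc/<pid>/exe` until the readlink
/// result starts with `target`'s bytes, or until `timeout` elapses.
/// Returns `true` once the child has exec'd into `target`.
fn wait_for_child_exec(pid: u32, target: &Path, timeout: Duration) -> bool {
    let exe_link = format!("/proc/{pid}/exe");
    let deadline = Instant::now() + timeout;
    // Compare raw bytes: tolerates a trailing " (deleted)" and non-UTF-8 names.
    let target_bytes = target.as_os_str().as_bytes();
    loop {
        // A failed readlink is treated the same as "not exec'd yet".
        if let Ok(current) = std::fs::read_link(&exe_link) {
            if current.as_os_str().as_bytes().starts_with(target_bytes) {
                return true;
            }
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

A test would spawn the child, then `assert!(wait_for_child_exec(child.id(), &target, Duration::from_secs(2)))` before reading `/proc/<pid>/exe` for the real assertion, so a harness-path readlink can no longer reach the assertion.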
Force-pushed from 40cbb57 to db80442.
…ests

The ETXTBSY race isn't specific to the `OpenOptions` + `sync_all` pattern. It comes from the kernel's `inode->i_writecount` release not being synchronous with `close(2)`, so any "write then exec immediately" sequence can race `execve`. CI has now surfaced the flake in `binary_path_strips_deleted_suffix` and `binary_path_preserves_live_deleted_basename` as well, both of which write via `std::fs::copy`.

Add `spawn_retrying_on_etxtbsy` (20 attempts × 50 ms backoff; it matches on `ErrorKind::ExecutableFileBusy`, and any other error panics immediately) and apply it at every `spawn()` site. `wait_for_child_exec` still covers the separate post-spawn fork/exec race.

Production `binary_path` is unchanged.
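A sketch of a retry wrapper in the spirit of `spawn_retrying_on_etxtbsy`. It assumes a `&mut Command` argument and a fixed 50 ms sleep between attempts; the commit message says "50 ms backoff" but not whether the delay grows, and the real signature isn't shown on this page. It needs a Rust toolchain on which `io::ErrorKind::ExecutableFileBusy` is stable.

```rust
use std::io::ErrorKind;
use std::process::{Child, Command};
use std::thread;
use std::time::Duration;

const SPAWN_ATTEMPTS: u32 = 20;
// Assumed: constant delay between attempts.
const SPAWN_BACKOFF: Duration = Duration::from_millis(50);

/// Sketch: spawn `cmd`, retrying only while the kernel reports ETXTBSY
/// (`ErrorKind::ExecutableFileBusy`). Any other error, or ETXTBSY on the
/// final attempt, panics immediately.
fn spawn_retrying_on_etxtbsy(cmd: &mut Command) -> Child {
    for attempt in 1..=SPAWN_ATTEMPTS {
        match cmd.spawn() {
            Ok(child) => return child,
            Err(e) if e.kind() == ErrorKind::ExecutableFileBusy && attempt < SPAWN_ATTEMPTS => {
                thread::sleep(SPAWN_BACKOFF);
            }
            Err(e) => panic!("spawn failed on attempt {attempt}: {e}"),
        }
    }
    unreachable!("every iteration either returns, sleeps and retries, or panics")
}
```

Because only `ExecutableFileBusy` is retried, a genuine failure such as a missing binary (`NotFound`) still fails the test on the first attempt instead of being hidden behind up to a second of retries.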
drew approved these changes on Apr 18, 2026.
Summary
Failing tests:

- `binary_path_strips_deleted_suffix`
- `binary_path_preserves_live_deleted_basename`
- `binary_path_strips_suffix_for_non_utf8_filename`

Three `procfs::tests` tests flake on arm64 CI runners. There are two independent test-side races, each needing its own mitigation:

- Fork/exec race. `binary_path_strips_deleted_suffix` and `binary_path_preserves_live_deleted_basename` read `/proc/<child_pid>/exe` immediately after `Command::spawn()` returns. `spawn()` returns once the child is scheduled, not once it has completed `exec()`. Until `execve` lands, the child inherits the parent's `mm->exe_file`, so `/proc/<child>/exe` readlinks to the test-harness binary. The target-path assertion then fails with the harness path on the left (`left: "/__w/.../openshell_sandbox-f9dbf131daaba99d"`, `right: "/tmp/.../sleepy (deleted)"`).
- ETXTBSY on spawn. The tests copy `/bin/sleep` to a temp path and then spawn it. The kernel rejects `execve` while `inode->i_writecount > 0`, and under contention the release of that counter after the writer fd is closed isn't synchronous with `close(2)`, so an `execve` issued immediately afterwards can still race it. This is independent of how the binary is written: CI has surfaced ETXTBSY in every test, not just the one that originally used `OpenOptions`.

Changes
`crates/openshell-sandbox/src/procfs.rs`:

- `wait_for_child_exec(pid, target)` helper (test-only, `#[cfg(target_os = "linux")]`). Polls `/proc/<pid>/exe` on a 10 ms interval with a 2 s deadline until the readlink's byte prefix matches `target`. Byte-level `starts_with` tolerates the kernel's `" (deleted)"` suffix on unlinked binaries and also works with non-UTF-8 filenames.

Production `binary_path` behaviour is unchanged.
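To show why the helper compares raw bytes rather than calling `Path::starts_with`, here is a small predicate; `exe_link_matches` is a hypothetical name used only for illustration. `Path::starts_with` compares whole path components, so `/tmp/x/sleepy (deleted)` does not "start with" `/tmp/x/sleepy` in that sense. A byte prefix on the `OsStr` does match, and it never requires the path to be valid UTF-8.

```rust
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

/// Hypothetical predicate: does the `/proc/<pid>/exe` readlink result `link`
/// point at `target`? Compares raw bytes, so the kernel's " (deleted)" suffix
/// and non-UTF-8 filenames are both handled.
fn exe_link_matches(link: &Path, target: &Path) -> bool {
    link.as_os_str()
        .as_bytes()
        .starts_with(target.as_os_str().as_bytes())
}
```

One trade-off of a byte prefix is that it also accepts any longer filename sharing the prefix (for example `sleepyhead` for a target of `sleepy`). That is likely harmless when the target lives in a test-owned temp directory, but it is looser than stripping `" (deleted)"` and comparing exactly.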
Related Issue
Blocking CI on #867.
Testing
Checklist