Skip to content

Fix Windows use-after-free of the pipe reader buffer retained by in-flight libuv reads#31940

Open
robobun wants to merge 3 commits into
mainfrom
farm/98c67e82/fix-windows-reader-buffer-uaf
Open

Fix Windows use-after-free of the pipe reader buffer retained by in-flight libuv reads#31940
robobun wants to merge 3 commits into
mainfrom
farm/98c67e82/fix-windows-reader-buffer-uaf

Conversation

@robobun
Copy link
Copy Markdown
Collaborator

@robobun robobun commented Jun 6, 2026

What does this PR do?

Fixes the dominant Windows crash family in Sentry: SEGV during full GC inside JSC::Heap::sweepArrayBuffers -> JSC::ArrayBuffer::~ArrayBuffer -> JSC::ArrayBufferContents::~ArrayBufferContents -> mi_free -> _mi_ptr_page / mi_validate_ptr_page (BUN-30T8, 8k+ events, plus siblings BUN-2WZN, BUN-2R90, BUN-2XNJ, BUN-33SN faulting while walking the GC incoming-ref set). Almost exclusively Windows, bun 1.3.11 through 1.3.14, workloads using fetch + spawn or plain CLI runs.

Root cause

WindowsBufferedReader hands libuv a pointer into _buffer's spare capacity (get_read_buffer_with_stable_memory_address). Two Windows read paths retain that pointer across event-loop iterations:

  • Cooked-mode console reads (default for process.stdin / Bun.stdin.stream() on a console): libuv's uv__tty_queue_read_line stores the alloc_cb buffer and blocks a worker thread in ReadConsoleW. The worker converts the typed line into that buffer (uv_utf16_to_wtf8 into read_line_buffer.base) even when the read was cancelled mid-flight: uv_read_stop cancels asynchronously by injecting a VK_RETURN, and after read_stop the buffer is never handed back through read_cb.
  • uv_fs_read: the threadpool worker writes through File.iov until the completion callback fires; uv_cancel only succeeds if the work item has not started.

Teardown freed or moved the allocation anyway:

  • finish() called _buffer.shrink_to_fit(). For a streaming reader len == 0, so this frees the whole block while the console worker still targets it.
  • deinit() and Drop dropped the Vec unconditionally.
  • HAS_INFLIGHT_READ was cleared in finish()/_on_read_chunk, i.e. while the read was in fact still outstanding, so nothing downstream could even detect the retention.

The late worker write then lands in freed mimalloc memory and corrupts whatever was allocated there next. 8 KiB blocks are a prime size class for ArrayBuffer backing stores (fetch response bodies), so the corruption surfaces arbitrarily later as a GC-sweep crash: ~ArrayBufferContents runs mi_free on a clobbered data pointer, and _mi_ptr_page's release-build page-map lookup dereferences an uncommitted/NULL submap entry (fault addresses like 0x0000B1A0 are NULL[sub_idx]). macOS uses the checked _mi_ptr_page variant, which is why the family skews so heavily to Windows. The 1.3.x Zig implementation (src/io/PipeReader.zig: finish -> shrinkAndFree, deinit -> buffer().deinit()) has the identical defect; the Rust port carried it over unchanged.

Fix

Establish the invariant that _buffer's allocation is never freed or moved while libuv may still write into it:

  • HAS_INFLIGHT_READ now means exactly "libuv may still write into _buffer". It is cleared only where the buffer provably came back: on_stream_read entry (pipe reads complete synchronously inside uv__pipe_read_data; a tty read_cb only fires after the reader thread finished) and on_file_read entry. The clears in finish() and _on_read_chunk are removed (the trailing _on_read_chunk clear could also clobber the flag of a read re-armed by a re-entrant callback).
  • finish() skips shrink_to_fit while the flag is set.
  • close_impl parks the buffer on the Box<File> (new File::retained_read_buffer field) when a file op is still operating/canceling at detach; the Box is freed in on_close_complete, strictly after the operation finished, so the buffer is reclaimed exactly rather than leaked.
  • on_file_read now also consumes DEFER_DONE_CALLBACK when the read completes normally, i.e. when uv_cancel lost the race because the threadpool worker had already started. Previously only the UV_ECANCELED branch honored the deferred close, so the reader stayed paused forever with its File source attached and on_reader_done never fired (flagged by review).
  • deinit()/Drop release the buffer through release_buffer_on_teardown(): freed normally when no read is pending, leaked when one is. Console reads after uv_read_stop have no completion signal left to observe, so the leak is unavoidable, but it is bounded to one spare-capacity buffer per torn-down reader and replaces heap corruption.
  • take_buffer() copies out the committed bytes (disjoint from the region the worker writes) instead of handing the caller a retained allocation; FileReader::drain/consume_reader_buffer now go through it instead of mem::take(buffer()).

POSIX readers are untouched (PosixBufferedReader reads synchronously on the loop thread; no retention exists).

How did you verify your code works?

Test

test/js/bun/terminal/terminal-spawn.test.ts: "cancelling a parked console stdin read does not corrupt the heap" (Windows-only, uses Bun.Terminal/ConPTY so the child gets a real console). The child arms a cooked-mode stdin read, waits for the libuv worker to park in ReadConsoleW, cancels the stream (pre-fix: finish() frees the 8 KiB buffer while cancellation is in flight), immediately adopts the freed block with same-size ArrayBuffer probes (mimalloc serves freshly freed blocks of a size class first), and fails if the cancelled read's late "\r\n" write mutates a probe.

Caveats, stated plainly:

  • The bug is unreachable on POSIX (the buggy code is #[cfg(windows)]), so the test skips there and cannot demonstrate fail-before on a Linux runner. Fail-before on Windows follows from the mechanism above (the VK_RETURN-cancelled read always performs the late write; detection depends on the freed block being reused by a probe, which mimalloc's free-list LIFO makes the overwhelmingly likely outcome).
  • There is no deterministic repro for the production crash itself (no ASAN lane exists on Windows); this test exercises the exact teardown path and detects the stale write directly.

Verification done here: cargo check --workspace for x86_64-pc-windows-msvc and aarch64-pc-windows-msvc passes; on Linux, bun bd test passes for terminal-spawn.test.ts (new test skips), spawn.test.ts (115 pass), spawn-streaming-stdin.test.ts, process-stdin.test.ts, and streams.test.js (2 failures pre-exist on main, unrelated).

Possibly related issues (not claimed as fixed)

#26026 and #26862 match this corruption family's profile (Windows, long-running fetch/spawn/stdin workloads, delayed crashes at garbage addresses) but include no stacks to confirm the mechanism, so they are not auto-closed here. #27692 (flagged by the issue finder) reports kernel BSODs, which a user-mode use-after-free cannot cause; it is not addressed by this PR.

…nto it

On Windows, WindowsBufferedReader hands libuv a pointer into _buffer's
spare capacity, and two read paths retain that pointer across event
loop iterations: cooked-mode console reads (a worker thread blocked in
ReadConsoleW converts the typed line into the alloc_cb buffer;
uv_read_stop cancels asynchronously by injecting a VK_RETURN and never
returns the buffer through read_cb) and uv_fs_read (the threadpool
worker writes through File.iov until the completion callback fires;
uv_cancel is best-effort).

Teardown freed or moved the allocation anyway: finish() called
shrink_to_fit (which frees a len==0 buffer outright), and deinit()/Drop
dropped the Vec. The late worker write then landed in freed mimalloc
memory and corrupted whatever was allocated there next, typically
ArrayBuffer backing stores, crashing at the next full GC in
Heap::sweepArrayBuffers / ~ArrayBufferContents / mi_free.

Fix: treat HAS_INFLIGHT_READ as "libuv may still write into _buffer".
Clear it only where the buffer provably came back (on_stream_read /
on_file_read entry) instead of in finish()/_on_read_chunk, skip the
shrink while it is set, park the buffer on the Box<File> (freed in
on_close_complete) when a file op is still operating at detach, and
leak it on teardown for console reads, which have no completion signal
left to observe. take_buffer() copies out the committed bytes instead
of handing over a retained allocation.
@github-actions github-actions Bot added the claude label Jun 6, 2026
@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented Jun 6, 2026

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 6, 2026

Found 3 issues this PR may fix:

  1. Segmentation Fault in Bun v1.3.5 on Windows 10 x64 During Opencode Execution #26026 - Windows segfault at 0xFFFFFFFFFFFFFFFF after ~755s with heavy spawn usage exercising WindowsBufferedReader pipe reads
  2. Segfault (null pointer at 0x0) in long-running standalone executable on Windows after sustained memory pressure #26862 - Windows segfault after sustained memory pressure (~78min, 892 spawns) crashing in mimalloc internals (mi_realloc_aligned)
  3. Bun crashes in Claude Code cause Windows BSOD (KERNEL_SECURITY_CHECK_FAILURE) — full system reboots on Windows 11 x64 #27692 - Windows BSOD (KERNEL_SECURITY_CHECK_FAILURE) during Claude Code sessions with heavy stdin/stdout/spawn usage, consistent with JSC::Heap corruption from freed pipe reader buffer

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #26026
Fixes #26862
Fixes #27692

🤖 Generated with Claude Code

@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented Jun 6, 2026

Checked all three before adding any auto-close lines:

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 6, 2026

Review Change Stack

Walkthrough

This PR fixes a Windows heap corruption issue where reader teardown races with in-flight libuv filesystem read operations. The fix retains read buffers in File, defers buffer shrinking until reads complete, reorders callback flag clearing, parks buffers during file operations, and adds a Windows regression test.

Changes

Windows read buffer safety across teardown

Layer / File(s) Summary
File retention buffer and safe extraction
src/io/source.rs, src/io/PipeReader.rs
Adds File.retained_read_buffer; WindowsBufferedReader::take_buffer() copies-and-leaks when HAS_INFLIGHT_READ is set; release_buffer_on_teardown() conditionally frees or leaks _buffer.
Flag and buffer shrinking lifecycle
src/io/PipeReader.rs
Clearing of HAS_INFLIGHT_READ moves to the libuv completion point (on_stream_read); _on_read_chunk preserves the flag across vtable callbacks; finish() only shrinks _buffer when no in-flight read exists.
Deferred close completion handling
src/io/PipeReader.rs
on_file_read clears DEFER_DONE_CALLBACK and invokes close_impl::<true>() on normal completion after a deferred close to avoid leaving the reader paused.
File state parking and reader cleanup
src/io/PipeReader.rs
close_impl parks _buffer into file.retained_read_buffer for Operating/Canceling file states; deinit() restores or intentionally forgets the source then calls release_buffer_on_teardown(); Drop calls release_buffer_on_teardown().
FileReader buffer extraction
src/runtime/webcore/FileReader.rs
FileReader::drain and consume_reader_buffer now call reader().take_buffer() to extract committed bytes while preserving retained allocations on Windows.
Windows console stdin cancellation test
test/js/bun/terminal/terminal-spawn.test.ts
Adds a Windows-only regression test that cancels a parked cooked-mode console read and verifies many 8 KiB ArrayBuffer-backed blocks are not corrupted.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main Windows use-after-free bug being fixed in the pipe reader buffer retention logic.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed PR description is comprehensive and follows the template with detailed explanations of what the PR does and how it was verified.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/io/PipeReader.rs`:
- Around line 1735-1750: on_file_read() currently only handles
DEFER_DONE_CALLBACK in the was_canceled branch, leaving DEFER_DONE_CALLBACK set
if the threadpool read completes normally (uv_cancel lost the race); update
on_file_read() so that in the successful-read (non-was_canceled) path it also
consumes the deferred-close: if DEFER_DONE_CALLBACK is set for
Source::File/Source::SyncFile, transfer self._buffer into
file.retained_read_buffer (mem::take(&mut self._buffer)) and invoke the same
done()/deferred-close handoff logic used in the was_canceled branch, and clear
DEFER_DONE_CALLBACK, so close()/deferred-completion is always completed
regardless of cancel race.

In `@test/js/bun/terminal/terminal-spawn.test.ts`:
- Around line 455-484: The test currently cancels the stdin read after a fixed
sleep which can be raced; instead gate reader.cancel() on an explicit observable
readiness signal that proves the native ReadConsoleW call is parked: add or use
a readiness API (e.g., expose/wait on a native binding like
Terminal.isReadConsoleWActive() or a promise such as waitForConsoleReadActive())
immediately after calling Bun.stdin.stream().getReader() and reader.read(), and
only call reader.cancel() once that readiness function resolves; then run the
existing probes loop (probes array and corruption checks) from that
guaranteed-ready state so the test deterministically exercises the
cancellation-path corruption rather than depending on Bun.sleep timing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 57f5a940-37f2-4ec5-8c60-52f08750242d

📥 Commits

Reviewing files that changed from the base of the PR and between a988615 and 945cfef.

📒 Files selected for processing (4)
  • src/io/PipeReader.rs
  • src/io/source.rs
  • src/runtime/webcore/FileReader.rs
  • test/js/bun/terminal/terminal-spawn.test.ts

Comment thread src/io/PipeReader.rs
Comment thread test/js/bun/terminal/terminal-spawn.test.ts
close() defers teardown via DEFER_DONE_CALLBACK while a uv_fs_read is
operating, but only the UV_ECANCELED branch of on_file_read consumed the
flag. When the threadpool worker had already started the read, uv_cancel
fails, the read completes normally, and the reader stayed paused forever
with the File source attached and on_reader_done never fired. Consume
the deferred close in the normal-completion path too.

Also make the regression test's readiness condition observable: the
child now proves the console line-read machinery works with a parent
round-trip before arming the read under test.
Comment thread test/js/bun/terminal/terminal-spawn.test.ts
On Windows the child exit IOCP and the final pseudoconsole pipe-data
IOCP are independent, so the PROBE-* line can arrive after proc.exited
resolves. Wait for the marker via the data callback before asserting,
matching the file's other tests.
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix in 5d9d259 — the ConPTY verdict-line race is resolved. I'm not auto-approving since this reworks the HAS_INFLIGHT_READ lifetime invariant and buffer-teardown paths (including intentional leaks and Drop changes) in unsafe Windows IO code; that's worth a human pass.

Extended reasoning...

Overview

This PR fixes a Windows-only use-after-free where WindowsBufferedReader freed/moved _buffer while a libuv console line read or threadpool uv_fs_read still held a pointer into its spare capacity. Touches four files: src/io/PipeReader.rs (flag-lifecycle rework, release_buffer_on_teardown, take_buffer copy-and-leak path, close_impl buffer parking, deferred-close consumption in on_file_read, Drop), src/io/source.rs (new File::retained_read_buffer field), src/runtime/webcore/FileReader.rs (route drain/consume_reader_buffer through take_buffer), and a Windows-only ConPTY regression test.

Security risks

No new attack surface — this is a memory-safety hardening fix, not a feature. The change removes a heap-corruption vector. The intentional mem::forget paths are bounded (one spare-capacity buffer per torn-down reader on the console-cancel path) and documented as the only sound option given libuv never returns the buffer after uv_read_stop.

Level of scrutiny

High. This is production-critical unsafe Rust in the core IO layer, fixing the dominant Windows crash family (8k+ Sentry events). The fix redefines what HAS_INFLIGHT_READ means (now: "libuv may still write into _buffer"), moves where it's cleared, adds a new state-machine interaction between DEFER_DONE_CALLBACK and the uv_cancel-lost-the-race completion path, and changes Drop behavior. The reasoning is cross-thread (libuv worker threads writing through retained pointers) and the correctness argument leans on libuv internals (uv__pipe_read_data synchrony, tty read_cb ordering, uv_cancel best-effort semantics). Getting any of these invariants wrong trades one UAF for another, or for a hang/leak — exactly the sort of change a human owner of this code should sign off on.

Other factors

  • The bug-hunting system found no issues on the current revision.
  • All three prior review threads (CodeRabbit ×2, my ConPTY race comment) are resolved; the deferred-close consumption fix (e53bc27) and the test's probeReported await (5d9d259) both landed.
  • CI failures on the latest build appear to be infrastructure ("Failed to create agent" on Windows aarch64, musl LTO link issues) and a pre-existing bunx.test.ts flake, not related to this change.
  • The regression test is Windows-only and acknowledged as probabilistic on the unfixed build (one-sided nondeterminism, documented in the test); fail-before cannot be demonstrated on the Linux runner here.
  • POSIX paths are untouched (PosixBufferedReader reads synchronously), so blast radius is Windows-only — but that's also where the fix can't be exercised by Linux CI.

@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented Jun 6, 2026

CI status: the only failing test in build 61166 is test/cli/install/bunx.test.ts, failing identically on all 15 test lanes (glibc, musl, macOS, Windows, ASAN) with:

error: UNABLE_TO_VERIFY_LEAF_SIGNATURE downloading tarball cowsay@github:piuccio/cowsay

This reproduces locally with the stock release bun (USE_SYSTEM_BUN=1), which contains none of this PR's code, so it is an external GitHub tarball TLS issue (verified still failing after 2+ hours of polling), not something this diff touches. This PR's changes are #[cfg(windows)] reader code plus a behavior-identical take_buffer routing in FileReader; the earlier build's other annotations (win11-aarch64 "failed to create agent", musl LTO stub link errors) were infra.

Everything this PR changed is green: both Windows targets compile (cargo check --workspace for x64 and aarch64 msvc), the linux-x64-asan lane passed everything except the external bunx test, and all review threads (CodeRabbit x2, claude-bot x1) are addressed and resolved. Needs a rebuild once the GitHub tarball issue clears, then a maintainer pass (claude-bot deferred approval to a human for the unsafe Windows IO rework, reasonably).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant