Fix Windows use-after-free of the pipe reader buffer retained by in-flight libuv reads#31940
Fix Windows use-after-free of the pipe reader buffer retained by in-flight libuv reads#31940robobun wants to merge 3 commits into
Conversation
…nto it On Windows, WindowsBufferedReader hands libuv a pointer into _buffer's spare capacity, and two read paths retain that pointer across event loop iterations: cooked-mode console reads (a worker thread blocked in ReadConsoleW converts the typed line into the alloc_cb buffer; uv_read_stop cancels asynchronously by injecting a VK_RETURN and never returns the buffer through read_cb) and uv_fs_read (the threadpool worker writes through File.iov until the completion callback fires; uv_cancel is best-effort). Teardown freed or moved the allocation anyway: finish() called shrink_to_fit (which frees a len==0 buffer outright), and deinit()/Drop dropped the Vec. The late worker write then landed in freed mimalloc memory and corrupted whatever was allocated there next, typically ArrayBuffer backing stores, crashing at the next full GC in Heap::sweepArrayBuffers / ~ArrayBufferContents / mi_free. Fix: treat HAS_INFLIGHT_READ as "libuv may still write into _buffer". Clear it only where the buffer provably came back (on_stream_read / on_file_read entry) instead of in finish()/_on_read_chunk, skip the shrink while it is set, park the buffer on the Box<File> (freed in on_close_complete) when a file op is still operating at detach, and leak it on teardown for console reads, which have no completion signal left to observe. take_buffer() copies out the committed bytes instead of handing over a retained allocation.
|
Found 3 issues this PR may fix:
🤖 Generated with Claude Code |
|
Checked all three before adding any auto-close lines:
|
WalkthroughThis PR fixes a Windows heap corruption issue where reader teardown races with in-flight libuv filesystem read operations. The fix retains read buffers in File, defers buffer shrinking until reads complete, reorders callback flag clearing, parks buffers during file operations, and adds a Windows regression test. ChangesWindows read buffer safety across teardown
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/io/PipeReader.rs`:
- Around line 1735-1750: on_file_read() currently only handles
DEFER_DONE_CALLBACK in the was_canceled branch, leaving DEFER_DONE_CALLBACK set
if the threadpool read completes normally (uv_cancel lost the race); update
on_file_read() so that in the successful-read (non-was_canceled) path it also
consumes the deferred-close: if DEFER_DONE_CALLBACK is set for
Source::File/Source::SyncFile, transfer self._buffer into
file.retained_read_buffer (mem::take(&mut self._buffer)) and invoke the same
done()/deferred-close handoff logic used in the was_canceled branch, and clear
DEFER_DONE_CALLBACK, so close()/deferred-completion is always completed
regardless of cancel race.
In `@test/js/bun/terminal/terminal-spawn.test.ts`:
- Around line 455-484: The test currently cancels the stdin read after a fixed
sleep which can be raced; instead gate reader.cancel() on an explicit observable
readiness signal that proves the native ReadConsoleW call is parked: add or use
a readiness API (e.g., expose/wait on a native binding like
Terminal.isReadConsoleWActive() or a promise such as waitForConsoleReadActive())
immediately after calling Bun.stdin.stream().getReader() and reader.read(), and
only call reader.cancel() once that readiness function resolves; then run the
existing probes loop (probes array and corruption checks) from that
guaranteed-ready state so the test deterministically exercises the
cancellation-path corruption rather than depending on Bun.sleep timing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 57f5a940-37f2-4ec5-8c60-52f08750242d
📒 Files selected for processing (4)
src/io/PipeReader.rssrc/io/source.rssrc/runtime/webcore/FileReader.rstest/js/bun/terminal/terminal-spawn.test.ts
close() defers teardown via DEFER_DONE_CALLBACK while a uv_fs_read is operating, but only the UV_ECANCELED branch of on_file_read consumed the flag. When the threadpool worker had already started the read, uv_cancel fails, the read completes normally, and the reader stayed paused forever with the File source attached and on_reader_done never fired. Consume the deferred close in the normal-completion path too. Also make the regression test's readiness condition observable: the child now proves the console line-read machinery works with a parent round-trip before arming the read under test.
On Windows the child exit IOCP and the final pseudoconsole pipe-data IOCP are independent, so the PROBE-* line can arrive after proc.exited resolves. Wait for the marker via the data callback before asserting, matching the file's other tests.
There was a problem hiding this comment.
Thanks for the fix in 5d9d259 — the ConPTY verdict-line race is resolved. I'm not auto-approving since this reworks the HAS_INFLIGHT_READ lifetime invariant and buffer-teardown paths (including intentional leaks and Drop changes) in unsafe Windows IO code; that's worth a human pass.
Extended reasoning...
Overview
This PR fixes a Windows-only use-after-free where WindowsBufferedReader freed/moved _buffer while a libuv console line read or threadpool uv_fs_read still held a pointer into its spare capacity. Touches four files: src/io/PipeReader.rs (flag-lifecycle rework, release_buffer_on_teardown, take_buffer copy-and-leak path, close_impl buffer parking, deferred-close consumption in on_file_read, Drop), src/io/source.rs (new File::retained_read_buffer field), src/runtime/webcore/FileReader.rs (route drain/consume_reader_buffer through take_buffer), and a Windows-only ConPTY regression test.
Security risks
No new attack surface — this is a memory-safety hardening fix, not a feature. The change removes a heap-corruption vector. The intentional mem::forget paths are bounded (one spare-capacity buffer per torn-down reader on the console-cancel path) and documented as the only sound option given libuv never returns the buffer after uv_read_stop.
Level of scrutiny
High. This is production-critical unsafe Rust in the core IO layer, fixing the dominant Windows crash family (8k+ Sentry events). The fix redefines what HAS_INFLIGHT_READ means (now: "libuv may still write into _buffer"), moves where it's cleared, adds a new state-machine interaction between DEFER_DONE_CALLBACK and the uv_cancel-lost-the-race completion path, and changes Drop behavior. The reasoning is cross-thread (libuv worker threads writing through retained pointers) and the correctness argument leans on libuv internals (uv__pipe_read_data synchrony, tty read_cb ordering, uv_cancel best-effort semantics). Getting any of these invariants wrong trades one UAF for another, or for a hang/leak — exactly the sort of change a human owner of this code should sign off on.
Other factors
- The bug-hunting system found no issues on the current revision.
- All three prior review threads (CodeRabbit ×2, my ConPTY race comment) are resolved; the deferred-close consumption fix (e53bc27) and the test's
probeReportedawait (5d9d259) both landed. - CI failures on the latest build appear to be infrastructure ("Failed to create agent" on Windows aarch64, musl LTO link issues) and a pre-existing
bunx.test.tsflake, not related to this change. - The regression test is Windows-only and acknowledged as probabilistic on the unfixed build (one-sided nondeterminism, documented in the test); fail-before cannot be demonstrated on the Linux runner here.
- POSIX paths are untouched (
PosixBufferedReaderreads synchronously), so blast radius is Windows-only — but that's also where the fix can't be exercised by Linux CI.
|
CI status: the only failing test in build 61166 is This reproduces locally with the stock release bun ( Everything this PR changed is green: both Windows targets compile ( |
What does this PR do?
Fixes the dominant Windows crash family in Sentry: SEGV during full GC inside
JSC::Heap::sweepArrayBuffers->JSC::ArrayBuffer::~ArrayBuffer->JSC::ArrayBufferContents::~ArrayBufferContents->mi_free->_mi_ptr_page/mi_validate_ptr_page(BUN-30T8, 8k+ events, plus siblings BUN-2WZN, BUN-2R90, BUN-2XNJ, BUN-33SN faulting while walking the GC incoming-ref set). Almost exclusively Windows, bun 1.3.11 through 1.3.14, workloads using fetch + spawn or plain CLI runs.Root cause
WindowsBufferedReaderhands libuv a pointer into_buffer's spare capacity (get_read_buffer_with_stable_memory_address). Two Windows read paths retain that pointer across event-loop iterations:process.stdin/Bun.stdin.stream()on a console): libuv'suv__tty_queue_read_linestores the alloc_cb buffer and blocks a worker thread inReadConsoleW. The worker converts the typed line into that buffer (uv_utf16_to_wtf8intoread_line_buffer.base) even when the read was cancelled mid-flight:uv_read_stopcancels asynchronously by injecting a VK_RETURN, and after read_stop the buffer is never handed back throughread_cb.uv_fs_read: the threadpool worker writes throughFile.iovuntil the completion callback fires;uv_cancelonly succeeds if the work item has not started.Teardown freed or moved the allocation anyway:
finish()called_buffer.shrink_to_fit(). For a streaming readerlen == 0, so this frees the whole block while the console worker still targets it.deinit()andDropdropped theVecunconditionally.HAS_INFLIGHT_READwas cleared infinish()/_on_read_chunk, i.e. while the read was in fact still outstanding, so nothing downstream could even detect the retention.The late worker write then lands in freed mimalloc memory and corrupts whatever was allocated there next. 8 KiB blocks are a prime size class for ArrayBuffer backing stores (fetch response bodies), so the corruption surfaces arbitrarily later as a GC-sweep crash:
~ArrayBufferContentsrunsmi_freeon a clobbered data pointer, and_mi_ptr_page's release-build page-map lookup dereferences an uncommitted/NULL submap entry (fault addresses like0x0000B1A0areNULL[sub_idx]). macOS uses the checked_mi_ptr_pagevariant, which is why the family skews so heavily to Windows. The 1.3.x Zig implementation (src/io/PipeReader.zig:finish->shrinkAndFree,deinit->buffer().deinit()) has the identical defect; the Rust port carried it over unchanged.Fix
Establish the invariant that
_buffer's allocation is never freed or moved while libuv may still write into it:HAS_INFLIGHT_READnow means exactly "libuv may still write into_buffer". It is cleared only where the buffer provably came back:on_stream_readentry (pipe reads complete synchronously insideuv__pipe_read_data; a ttyread_cbonly fires after the reader thread finished) andon_file_readentry. The clears infinish()and_on_read_chunkare removed (the trailing_on_read_chunkclear could also clobber the flag of a read re-armed by a re-entrant callback).finish()skipsshrink_to_fitwhile the flag is set.close_implparks the buffer on theBox<File>(newFile::retained_read_bufferfield) when a file op is still operating/canceling at detach; the Box is freed inon_close_complete, strictly after the operation finished, so the buffer is reclaimed exactly rather than leaked.on_file_readnow also consumesDEFER_DONE_CALLBACKwhen the read completes normally, i.e. whenuv_cancellost the race because the threadpool worker had already started. Previously only theUV_ECANCELEDbranch honored the deferred close, so the reader stayed paused forever with itsFilesource attached andon_reader_donenever fired (flagged by review).deinit()/Droprelease the buffer throughrelease_buffer_on_teardown(): freed normally when no read is pending, leaked when one is. Console reads afteruv_read_stophave no completion signal left to observe, so the leak is unavoidable, but it is bounded to one spare-capacity buffer per torn-down reader and replaces heap corruption.take_buffer()copies out the committed bytes (disjoint from the region the worker writes) instead of handing the caller a retained allocation;FileReader::drain/consume_reader_buffernow go through it instead ofmem::take(buffer()).POSIX readers are untouched (
PosixBufferedReaderreads synchronously on the loop thread; no retention exists).How did you verify your code works?
Test
test/js/bun/terminal/terminal-spawn.test.ts: "cancelling a parked console stdin read does not corrupt the heap" (Windows-only, usesBun.Terminal/ConPTY so the child gets a real console). The child arms a cooked-mode stdin read, waits for the libuv worker to park inReadConsoleW, cancels the stream (pre-fix:finish()frees the 8 KiB buffer while cancellation is in flight), immediately adopts the freed block with same-sizeArrayBufferprobes (mimalloc serves freshly freed blocks of a size class first), and fails if the cancelled read's late"\r\n"write mutates a probe.Caveats, stated plainly:
#[cfg(windows)]), so the test skips there and cannot demonstrate fail-before on a Linux runner. Fail-before on Windows follows from the mechanism above (the VK_RETURN-cancelled read always performs the late write; detection depends on the freed block being reused by a probe, which mimalloc's free-list LIFO makes the overwhelmingly likely outcome).Verification done here:
cargo check --workspaceforx86_64-pc-windows-msvcandaarch64-pc-windows-msvcpasses; on Linux,bun bd testpasses forterminal-spawn.test.ts(new test skips),spawn.test.ts(115 pass),spawn-streaming-stdin.test.ts,process-stdin.test.ts, andstreams.test.js(2 failures pre-exist on main, unrelated).Possibly related issues (not claimed as fixed)
#26026 and #26862 match this corruption family's profile (Windows, long-running fetch/spawn/stdin workloads, delayed crashes at garbage addresses) but include no stacks to confirm the mechanism, so they are not auto-closed here. #27692 (flagged by the issue finder) reports kernel BSODs, which a user-mode use-after-free cannot cause; it is not addressed by this PR.