feat(storage): log warning when GCS serves more bytes than requested#16141
feat(storage): log warning when GCS serves more bytes than requested#16141v-pratap wants to merge 17 commits into
Conversation
Implement a thread-safe, one-shot mechanism to detect when a range request is overshot, and print an actionable warning log at the end of the stream. This is implemented for: 1. ObjectReadStreambuf (Single-Stream Readers) 2. ReadRange (Concurrent / Multi-Range Download Managers) The warning log contains rich context: bucket name, object name, and the amount of extra bytes served. Bug: b/455959466
5cfbb21 to
9f95aff
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces overrun logging to detect and warn when more bytes are received from GCS than requested, updating both ReadRange and ObjectReadStreambuf to track received bytes and log warnings upon overrun. The review feedback highlights a critical issue in ObjectReadStreambuf where remain_ is only decremented in xsgetn() but not in underflow(), which will cause inaccurate tracking for character-by-character or line-by-line reads. Additionally, it is recommended to add test coverage for these underflow() scenarios.
| { | ||
| std::lock_guard<std::mutex> lk(mu_); | ||
| if (remain_.has_value()) { | ||
| *remain_ -= read->bytes_received; | ||
| } | ||
| } |
There was a problem hiding this comment.
In ObjectReadStreambuf, bytes can be read from source_ either via xsgetn() (for bulk reads) or via underflow() (when filling the internal buffer for single-character or line-based reads).
Currently, remain_ is only decremented inside xsgetn(). If a user reads from the stream using methods that trigger underflow() (such as std::getline, stream >> var, or stream.get()), those received bytes will not be subtracted from remain_. This will result in inaccurate tracking and cause the overrun warning to be missed or calculated incorrectly.
Please ensure that underflow() also decrements remain_ under the lock when it successfully receives bytes from source_.
…narios Add unit tests to verify overrun logging behavior under: 1. Character-by-character reads (using stream.get()) in ObjectReadStreambuf. 2. ReadLast(N) requests where GCS returns less than N (no logging expected). 3. ReadLast(N) requests where GCS returns more than N (logging expected). These tests cover both synchronous (ObjectReadStreambuf) and asynchronous (ReadRange) implementation paths.
- Skip overrun warning in ObjectReadStreambuf if decompressive transcoding is active (transformation().has_value() is true), matching the behavior of other SDKs (like Go) and avoiding false positives. - Initialize remain_ to 0 for empty range requests (range.end == begin) in ObjectReadStreambuf to ensure that any erroneous data sent by GCS is caught and logged as an overrun. - Add corresponding unit tests to verify these behaviors.
- Expose Remain() in the public ObjectReadStream class to allow users to programmatically query the remaining bytes in a range request, improving consistency with the Go SDK. - Add thread-safety comments (// Must be called while holding mu_.) to ReadRange::CheckOverrun() in read_range.h and read_range.cc to document lock requirements.
…verrun logging - Add MetadataAccessor callback to ReadRange to allow it to dynamically query the object's metadata from ObjectDescriptorImpl. - Update ReadRange::CheckOverrun() to check if the object's content_encoding is "gzip". If so, suppress the overrun warning, matching the behavior of the synchronous path and other SDKs (like Go) to avoid false positives during decompressive transcoding. - Pass the metadata accessor callback from ObjectDescriptorImpl::Read using a weak pointer to avoid circular references. - Add TranscodingSuppressesWarning unit test to read_range_test.cc to verify this behavior.
- Expand the lock scope in xsgetn() to protect all metadata updates (generation_, metageneration_, storage_class_, size_, transformation_, headers_, and source_pos_) under the mu_ lock, preventing data races when these are accessed concurrently. - Refactor all metadata getters in ObjectReadStreambuf and ObjectReadStream to lock mu_ and return by value instead of const& to guarantee thread safety during concurrent reads. - Fix a critical self-deadlock in ObjectReadStreambuf::Close() by accessing the transformation_ member variable directly instead of calling the transformation() getter (which attempts to re-acquire the already-held mu_ lock).
…ructor - Remove default arguments for bucket_name and object_name in the primary ReadRange constructor, making them mandatory. - Delete the fallback constructor (without bucket/object names) to prevent any loophole for empty logs. - Update all unit tests in read_range_test.cc to pass explicit bucket and object names.
…tream - Refactor status(), received_hash(), and computed_hash() in ObjectReadStreambuf and ObjectReadStream to lock mu_ and return by value instead of const&. - This completes the thread-safety synchronization of the synchronous path, preventing data races when these getters are accessed concurrently with active stream reads.
- Refactor requested_length_ in ReadRange to be an absl::optional<std::int64_t>. This allows us to cleanly distinguish between "no limit (read to end)" (absl::nullopt) and a strict limit (e.g. 0 for a zero-length request). - Update CheckOverrun() to check *requested_length_ >= 0, allowing overrun detection on zero-length requests, bringing parity with the Go SDK and the C++ synchronous path. - Update ObjectDescriptorImpl::Read to pass p.length > 0 ? absl::make_optional(p.length) : absl::nullopt to the ReadRange constructor. - Add a new ZeroLengthOverrunLogging test case to read_range_test.cc to verify that an explicit 0-byte limit overrun is correctly caught and logged.
…ngth overrun checks - Wrap source_pos_ access in ObjectReadStreambuf::seekoff() with std::lock_guard<std::mutex> to prevent data races when tellg() is called concurrently with active stream reads. - Add read_to_end boolean flag (defaulting to false) to ObjectDescriptorConnection::ReadParams in a backwards-compatible way. - Update ObjectDescriptor::ReadFromOffset and ReadLast to explicitly set read_to_end = true, while Read sets it to false. - Update ObjectDescriptorImpl::Read to use p.read_to_end to cleanly distinguish "read to end" (passing absl::nullopt) from a strict limit (passing p.length, which can be 0 for a zero-length request). This allows ReadRange to detect overruns on 0-byte async requests without false positives on full downloads.
…rrun checks - Remove the MetadataAccessor callback from ReadRange constructor and replace it with a simple is_transcoded boolean parameter in OnRead(). This completely avoids the ABBA deadlock and self-deadlock risk during OnFinish where we would otherwise try to acquire ObjectDescriptorImpl::mu_ while holding ReadRange::mu_. - Update ObjectDescriptorImpl::OnRead to evaluate the transcoding status and pass it to ReadRange::OnRead(). - Update all unit tests in read_range_test.cc to pass the transcoding status directly to OnRead, dramatically simplifying the tests by removing the verbose metadata mocks. - Update ObjectReadStreambuf constructor to use >= 0 for ReadLast option handling, enabling overrun checks on explicit ReadLast(0) requests.
Latch the is_transcoded_ value to true if it is ever true during the stream's lifetime in ReadRange::OnRead(). This is marginally safer and prevents any edge cases where subsequent chunks might report different metadata.
… races in sync - Move logged_warning_ = true; inside the overrun check if block in ReadRange::CheckOverrun(). This ensures that we don't prematurely suppress overrun checks on subsequent chunks in a multi-chunk stream if the first chunk was within limits. - Add MultiChunkOverrunLogging test case to read_range_test.cc to verify that overrun is correctly caught on a second chunk. - Add std::lock_guard to ObjectReadStreambuf::ReportError(), CheckPreconditions(), and ValidateHashes() to prevent concurrent data races on status_, hash_validator_result_, and metadata fields. - Make ObjectReadStreambuf::IsOpen() thread-safe and nullptr-safe by locking mu_ and checking source_ != nullptr. - Update CheckPreconditions() and ValidateHashes() to access source_ and transformation_ directly under the lock, and update ThrowHashMismatchDelegate() to use member variables directly, preventing lock re-entrancy deadlocks on non-recursive std::mutex.
- Update ObjectDescriptorImpl::Read to calculate the correct limit for ReadLast requests (where p.start < 0). Since ReadLast(limit) requests the last limit bytes, the limit is -p.start. This enables overrun detection on ReadLast requests without changing the gRPC protocol behavior (which still receives read_length = 0). - Add ReadLastOverrunLogging test case to read_range_test.cc to verify that overrun is correctly caught and logged for ReadLast requests.
kalragauri
left a comment
There was a problem hiding this comment.
Do we need similar changes under AsyncClient::ReadObject?
| hash_validator_(CreateHashValidator(request)), | ||
| bucket_name_(request.bucket_name()), | ||
| object_name_(request.object_name()) { | ||
| if (request.HasOption<ReadRange>() && request.HasOption<ReadFromOffset>()) { |
There was a problem hiding this comment.
If a user performs a full object read (no range options) and GCS overshoots, it will not be logged because remain_ will be nullopt.
|
|
||
| offset_ += content.size(); | ||
| received_bytes_ += content.size(); | ||
| if (length_ != 0) length_ -= std::min<std::int64_t>(content.size(), length_); |
There was a problem hiding this comment.
This PR doesn't modify this line but if GCS overshoots, length_ will be set to 0 here. If the connection breaks, ReadRange::RangeForResume will be called to resume the connection, and read_length will be set to 0 in the proto: range.set_read_length(length_);
IIUC, read_length = 0 is interpreted as full object read, which will cause the resumed stream to download the entire remainder of the object.
…ync path - Update ReadRange::OnRead to ignore checksum mismatches if the object is transcoded (is_transcoded_ is true). When GCS dynamically decompresses (transcodes) a gzip-compressed object, the received bytes are uncompressed, but GCS metadata hashes belong to the original compressed file, which would otherwise always trigger a false-positive DataLossError. - Add TranscodingIgnoresChecksumMismatch test case to read_range_test.cc to verify that a checksum mismatch is ignored and the read succeeds when is_transcoded is true.
…overrun checks - Update ReadRange::RangeForResume to return absl::nullopt (refuse to resume) if the requested range has already been fully satisfied or exceeded. This prevents a catastrophic bug where a completed range read would be resumed with read_length = 0 (unlimited), downloading the rest of the file and causing massive unexpected egress costs. - Add NoResumeIfRequestSatisfied and NoResumeIfRequestExceeded tests to read_range_test.cc to verify this behavior. - Move CheckOverrun() in ReadRange::OnRead to be called immediately on every chunk rather than waiting for range_end(), enabling immediate warnings if a rogue backend sends infinite data. - Update ObjectReadStreambuf to initialize remain_ to *size_ as soon as the object's total size is learned in xsgetn, enabling overrun detection for full-object downloads. - Move the remain_ decrement to the end of the lock block in ObjectReadStreambuf::xsgetn to ensure the first chunk's bytes are correctly subtracted from the newly-learned size. - Add FullObjectReadOverrunLogging test to object_read_streambuf_test.cc to verify full object overrun logging. - Remove the redundant std::lock_guard from ObjectReadStreambuf::seekoff() since streams are inherently not thread-safe for concurrent read/seek operations.
Add an explanatory comment in ObjectReadStreambuf::xsgetn to document the dynamic late-initialization strategy for remain_ during full-object reads. Since the expected size is not known upfront for full reads, we initialize remain_ to the object's size the first time we learn it from the GCS response metadata.
Implement a thread-safe, one-shot mechanism to detect when a range request is overshot, and print an actionable warning log at the end of the stream.
This is implemented for all kinds of reads Sync, Async, MRD.
The warning log contains: bucket name, object name, and the amount of extra bytes served.