[Fix] Fix dataset preview args, private streaming auth, and download retries #1700

Merged
wangxingjun778 merged 5 commits into modelscope:master from wangxingjun778:fix/preview_fc
May 8, 2026

Conversation

@wangxingjun778 (Member) commented Apr 28, 2026

🛠️ Fixes

  1. Fix dataset preview argument passing: Filter out unrecognized kwargs (e.g., engine) before passing them to BuilderConfig.__init__(), which otherwise raises a TypeError.
  2. Fix auth failure for private datasets in streaming mode: Monkey-patch HfFileSystem to inject the m_session_id cookie, resolving 404 errors when loading private datasets with streaming=True.
  3. Add retry for ReadTimeout: Include ReadTimeout in the retry logic alongside connection errors to handle transient download failures.
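
A minimal sketch of the retry behaviour described in item 3, not the code merged in this PR. It assumes a generic helper (`call_with_retry`, a hypothetical name) and uses stand-in exception classes so the example runs without third-party packages; in real code the retryable tuple would presumably hold `requests.exceptions.ConnectionError` and `requests.exceptions.ReadTimeout`. The retry count and backoff are illustrative assumptions.

```python
import time


class FakeConnectionError(Exception):
    """Stand-in for requests.exceptions.ConnectionError (assumption)."""


class FakeReadTimeout(Exception):
    """Stand-in for requests.exceptions.ReadTimeout (assumption)."""


# ReadTimeout sits next to the connection error, so both are retried.
RETRYABLE = (FakeConnectionError, FakeReadTimeout)


def call_with_retry(func, max_retries=3, base_delay=0.0, retryable=RETRYABLE):
    """Call func(); on a retryable error, back off exponentially and retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retryable:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            time.sleep(base_delay * (2 ** attempt))


class FlakyDownload:
    """Hypothetical download that fails with the given errors, then succeeds."""

    def __init__(self, failures):
        self.failures = list(failures)
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.failures:
            raise self.failures.pop(0)
        return b"payload"


flaky = FlakyDownload([FakeReadTimeout("read timed out"),
                       FakeConnectionError("connection reset")])
result = call_with_retry(flaky)
```

Errors outside the retryable tuple propagate immediately, so only transient network failures are retried.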

✨ Improvements

  1. Enhance download logging: Add debug-level logs for request details (method, URL, timeout, headers) and response metrics (status, size, elapsed time).
  2. Optimize patch lifecycle: Ensure HfFileSystem patches remain active during streaming mode and are only restored in non-streaming contexts.
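
The logging described in item 1 could look roughly like the sketch below. It is an illustration under stated assumptions, not the merged implementation: `logged_request`, `redact`, and `fake_send` are hypothetical names, the transport is a stub callable rather than a real HTTP client, and masking of `Cookie`/`Authorization` values is an extra precaution that the PR description does not mention.

```python
import logging
import time

logger = logging.getLogger("download_sketch")  # hypothetical logger name

# Header names whose values are masked before logging (assumption).
SENSITIVE_HEADERS = {"cookie", "authorization"}


def redact(headers):
    """Return a copy of headers with sensitive values masked."""
    return {k: ("***" if k.lower() in SENSITIVE_HEADERS else v)
            for k, v in headers.items()}


def logged_request(send, method, url, timeout, headers):
    """Log request details, call send(...), then log response metrics."""
    logger.debug("Request: method=%s url=%s timeout=%s headers=%s",
                 method, url, timeout, redact(headers))
    start = time.perf_counter()  # monotonic clock, suited to durations
    status, body = send(method, url, timeout, headers)
    elapsed = time.perf_counter() - start
    logger.debug("Response: status=%s size=%d bytes elapsed=%.3fs",
                 status, len(body), elapsed)
    return status, body, elapsed


def fake_send(method, url, timeout, headers):
    """Stub transport returning a status code and a 1 KiB body."""
    return 200, b"x" * 1024


status, body, elapsed = logged_request(
    fake_send, "GET", "https://example.invalid/file", 10,
    {"Cookie": "m_session_id=secret", "Accept": "*/*"},
)
```

Because the messages are emitted at debug level, they stay silent unless the logger is configured for `logging.DEBUG`.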

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request enhances the dataset download utility with detailed logging, request timing, and a mechanism to filter configuration arguments based on the builder's dataclass fields. Key feedback includes a recommendation to include ReadTimeout in the retry logic to handle transient network issues, a suggestion to use time.perf_counter() for more precise duration measurements, and a warning that the current filtering of config_kwargs may strip necessary arguments intended for the DatasetBuilder constructor.
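
For illustration, filtering by dataclass fields can be sketched as follows. This is a minimal example under assumptions, not the code in `hf_datasets_util.py`: `DemoBuilderConfig` is a hypothetical stand-in for a real `BuilderConfig` subclass, `split_config_kwargs` is a hypothetical helper, and the `engine` value is made up. Returning the rejected kwargs instead of discarding them is one way to address the concern about arguments meant for the `DatasetBuilder` constructor.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DemoBuilderConfig:
    """Hypothetical stand-in for a BuilderConfig dataclass."""
    name: str = "default"
    version: Optional[str] = None
    data_dir: Optional[str] = None


def split_config_kwargs(config_cls, kwargs):
    """Split kwargs into those the config dataclass accepts and the rest."""
    allowed = {f.name for f in fields(config_cls) if f.init}
    accepted = {k: v for k, v in kwargs.items() if k in allowed}
    rejected = {k: v for k, v in kwargs.items() if k not in allowed}
    return accepted, rejected


accepted, rejected = split_config_kwargs(
    DemoBuilderConfig,
    {"name": "preview", "data_dir": "/tmp/data", "engine": "pyarrow"},
)

# Only recognised fields reach __init__, so no TypeError is raised.
config = DemoBuilderConfig(**accepted)
```

Here `rejected` still holds `engine`, so the caller can decide whether to forward it elsewhere or drop it.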

Comment thread modelscope/msdatasets/utils/hf_file_utils.py Outdated
Comment thread modelscope/msdatasets/utils/hf_file_utils.py Outdated
Comment thread modelscope/msdatasets/utils/hf_datasets_util.py
@wangxingjun778 changed the title from "[Fix] Fix args passing for dataset preview" to "[Fix] Fix dataset preview args, private streaming auth, and download retries" May 7, 2026
@wangxingjun778 merged commit c15c526 into modelscope:master May 8, 2026
1 of 2 checks passed

2 participants