Tags: deepspeedai/DeepSpeed

v0.19.1

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Fix DeepCompile AOT kwargs patching for PyTorch >= v2.11 (#8024)

DeepCompile breaks for PyTorch >= v2.11 because these versions can
construct the AOT Autograd backend without a bw_compiler kwarg, while
DeepSpeed's Inductor patch assumes that key is always present.

This PR fixes DeepCompile's AOT Autograd patch so unrelated AOT backend
registrations can pass through unchanged. `TestDeepCompile` passes with
this fix.
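
The pass-through behavior described above can be sketched roughly as follows. This is not DeepSpeed's actual patch: `patch_backend_factory`, `tag_compiler`, and `fake_aot_factory` are hypothetical names, and the factory is a plain stand-in rather than PyTorch's AOT Autograd entry point. The sketch only shows the idea of wrapping `bw_compiler` when the caller supplied it and leaving every other call untouched.

```python
import functools


def patch_backend_factory(factory, wrap_bw_compiler):
    """Wrap `factory` so that only calls carrying `bw_compiler` are modified."""

    @functools.wraps(factory)
    def patched(*args, **kwargs):
        # Registrations without a bw_compiler kwarg pass through unchanged
        # instead of raising KeyError.
        if "bw_compiler" in kwargs:
            kwargs = dict(kwargs)
            kwargs["bw_compiler"] = wrap_bw_compiler(kwargs["bw_compiler"])
        return factory(*args, **kwargs)

    return patched


def fake_aot_factory(**kwargs):
    """Hypothetical stand-in for a backend factory; returns its kwargs."""
    return kwargs


def tag_compiler(compiler):
    """Hypothetical wrapper that marks the output of the backward compiler."""

    def wrapped(graph):
        return ("patched", compiler(graph))

    return wrapped


patched_factory = patch_backend_factory(fake_aot_factory, tag_compiler)
```

Calling `patched_factory(fw_compiler=...)` without `bw_compiler` returns the kwargs as given, while a call that includes `bw_compiler` receives the wrapped compiler.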

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

v0.19.0

Update version.txt before 0.19.0 release (#7995)

v0.18.9

Update version.txt for latest incoming release 0.18.9 (#7935)

v0.18.8

Update version (#7903)

v0.18.7

Fix hook count performance regression from v0.18.5 (#7886)

Fixes performance regressions reported in #7882 and #7885.

PR #7780 added dynamic hook count computation for reentrant
checkpointing correctness, but placed the call inside every gradient
hook closure. For a model with n parameter tensors, the count is
therefore recomputed once per hook, roughly n times per backward pass,
which adds significant overhead.

Summary:

1. Added `should_refresh_expected_hook_count()` predicate that returns
true only at backward phase boundaries (first hook, or new reentrant
phase), so `count_used_parameters_in_backward()` is called once per
phase instead of once per hook.
2. Applied this predicate in ZeRO-1/2 (stage_1_and_2.py) and both ZeRO-3
hook sites (stage3.py), reusing the `cached_max_expected_hooks_seen`
value when refresh isn't needed.
3. Changed `enter_backward()` to reset hook counters on first real
backward entry, preventing pollution from pre-user-backward autograd
calls (e.g., TiledFusedLogitsLoss).
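
The caching idea in the summary above can be illustrated with a minimal sketch. It is a simplified model, not the code in stage_1_and_2.py or stage3.py: `HookCountTracker`, `on_hook`, and the integer `phase_id` are hypothetical, and `enter_backward()` here resets unconditionally rather than only on the first real backward entry.

```python
class HookCountTracker:
    """Recompute the expected hook count only at backward phase boundaries."""

    def __init__(self, count_fn):
        self._count_fn = count_fn  # stands in for the expensive count
        self.hooks_seen = 0
        self.last_phase = None
        self.cached_expected = 0

    def enter_backward(self):
        # Simplification: always reset counters when a backward pass starts.
        self.hooks_seen = 0
        self.last_phase = None

    def should_refresh(self, phase_id):
        # True for the first hook of a backward pass or for a new phase.
        return self.hooks_seen == 0 or phase_id != self.last_phase

    def on_hook(self, phase_id):
        if self.should_refresh(phase_id):
            self.cached_expected = self._count_fn()
            self.last_phase = phase_id
        self.hooks_seen += 1
        return self.cached_expected


count_calls = []
tracker = HookCountTracker(lambda: count_calls.append(1) or 147)
tracker.enter_backward()
for _ in range(147):
    tracker.on_hook(phase_id=0)
```

In this sketch the 147 hooks of a single phase trigger one call to the counting function; a hook arriving with a different `phase_id` triggers one more.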

With a 24-layer transformer, ~267M params (147 parameter tensors), ZeRO-2,
8×H100 80GB, bf16, batch size 8, 20 warmup + 20 measured iterations:
  - Before fix: 0.1265s/iter
  - After fix: 0.0505s/iter

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Co-authored-by: Ramya Ramineni <rraminen@users.noreply.github.com>

v0.18.6

Replace torch.jit.script with torch.compile (#7835) (#7840)

Fixes #7835.

On torch==2.10.0, importing DeepSpeed emitted deprecation warnings from
import-time JIT-decorated helpers.
This change updates the compatibility path to align with PyTorch
guidance while keeping import clean.
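
One general way to keep an import free of compiler side effects is to defer compilation until a helper is first called. The sketch below shows only that pattern and is an assumption, not the change made in this PR: `compile_on_first_call` and `recording_compiler` are hypothetical names, and the recording compiler is a stand-in for a real one such as `torch.compile`.

```python
import functools


def compile_on_first_call(compiler):
    """Decorator factory: run `compiler(fn)` lazily, on the first call to fn."""

    def decorator(fn):
        compiled = None

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal compiled
            if compiled is None:
                # Compilation happens here, not when the module is imported.
                compiled = compiler(fn)
            return compiled(*args, **kwargs)

        return wrapper

    return decorator


compile_events = []


def recording_compiler(fn):
    """Hypothetical stand-in compiler that records when it is invoked."""
    compile_events.append(fn.__name__)
    return fn


@compile_on_first_call(recording_compiler)
def scaled_add(x, y, scale):
    return x + scale * y
```

Decorating `scaled_add` does not invoke the compiler; it runs once on the first call and the result is reused afterwards.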

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Co-authored-by: Hongwei Chen <33092912+hwchen2017@users.noreply.github.com>

v0.18.5

Update PyTorch to v2.9 for modal tests (#7816)

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

v0.18.4

fix: avoid IndexError in BF16_Optimizer.destroy() when using DummyOptim (#7763)

Short-circuit BF16_Optimizer.destroy() if using_real_optimizer is False.
When initialized with optimizer=None (DummyOptim), bf16_groups remains
empty, causing an IndexError when accessing it in destroy().
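
The short-circuit can be pictured with a toy class. This is an illustrative sketch only: `Bf16OptimizerSketch` is hypothetical and mirrors just the `using_real_optimizer` flag and the `bf16_groups` list named above, not the real `BF16_Optimizer` internals.

```python
class Bf16OptimizerSketch:
    """Toy model of an optimizer wrapper with an early-return destroy()."""

    def __init__(self, optimizer=None):
        # optimizer=None models the DummyOptim case described above.
        self.using_real_optimizer = optimizer is not None
        self.bf16_groups = [[1.0, 2.0]] if self.using_real_optimizer else []
        self.destroyed = False

    def destroy(self):
        # Guard: with no real optimizer, bf16_groups is empty and there is
        # nothing to tear down, so return before indexing into it.
        if not self.using_real_optimizer:
            return
        first_group = self.bf16_groups[0]  # IndexError here without the guard
        first_group.clear()
        self.destroyed = True


dummy = Bf16OptimizerSketch(optimizer=None)
dummy.destroy()  # returns early instead of raising IndexError

real = Bf16OptimizerSketch(optimizer=object())
real.destroy()
```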

Resolves #7752

v0.18.3

Wall clock timers API (#7714)

Make wall clock timers available to clients.

---------

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>

v0.18.2

README refresh (#7668)

Long overdue

---------

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>