[https://nvbugs/6212252][fix] Select CUTLASS MoE backend on non-Blackwell SMs in TestQwen3_5_35B_A3B::test_fp8#15081

Open
xxi-nv wants to merge 2 commits into NVIDIA:main from xxi-nv:fix/bug-6212252-qwen35-deepgemm-sm90

@xxi-nv
Collaborator

@xxi-nv xxi-nv commented Jun 8, 2026

Summary by CodeRabbit

  • Tests
    • Updated Qwen 3.5 35B accuracy test with adaptive MoE backend configuration that automatically selects the optimal backend based on hardware architecture instead of using a fixed configuration.
    • Re-enabled previously skipped test case to expand validation coverage.

Description

TestQwen3_5_35B_A3B::test_fp8 hard-selected MoeConfig(backend='DEEPGEMM').
The DeepGEMM MoE kernels only support datacenter Blackwell (SM100/SM103). On
Hopper (SM90, e.g. H20) and consumer Blackwell (SM120/SM121) the unsupported
kernel runs anyway and trips a scale-factor dtype assertion during autotuner
warmup:

RuntimeError: Assertion error (deepgemm-src/.../utils/layout.hpp:68):
sfa_dtype == torch::kFloat and sfb_dtype == torch::kFloat

The DEEPGEMM dispatch branch in create_moe.py has no SM gate (unlike
DENSEGEMM/CUTEDSL/TRTLLM which fall back to CUTLASS), and
DeepGemmFusedMoE.can_implement (restricted to SM {100, 103}) is never invoked
from the dispatch path, so the test fails on non-Blackwell-datacenter GPUs.
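
To illustrate the missing gate described above, here is a minimal sketch of a dispatch step that consults a `can_implement`-style check and falls back to CUTLASS. It is not the code in create_moe.py: `DeepGemmLikeBackend`, `dispatch_without_gate`, and `dispatch_with_gate` are hypothetical names, and the real `DeepGemmFusedMoE.can_implement` signature is assumed rather than known.

```python
class DeepGemmLikeBackend:
    """Hypothetical stand-in for a backend restricted to specific SM versions."""

    # Supported-SM set taken from the description above (SM100/SM103).
    SUPPORTED_SMS = frozenset({100, 103})

    @classmethod
    def can_implement(cls, sm_version: int) -> bool:
        # Assumed shape of the check; the real method may take other arguments.
        return sm_version in cls.SUPPORTED_SMS


def dispatch_without_gate(requested: str, sm_version: int) -> str:
    # Mirrors the reported behavior: the requested backend is returned
    # unconditionally, so an unsupported SM only fails later, at kernel time.
    return requested


def dispatch_with_gate(requested: str, sm_version: int) -> str:
    # Falls back to CUTLASS when DEEPGEMM is requested on an unsupported SM.
    if requested == "DEEPGEMM" and not DeepGemmLikeBackend.can_implement(sm_version):
        return "CUTLASS"
    return requested


if __name__ == "__main__":
    for sm in (90, 100, 120):
        print(sm, dispatch_without_gate("DEEPGEMM", sm), dispatch_with_gate("DEEPGEMM", sm))
```

This PR does not change the dispatch path; the sketch only shows what such a gate could look like.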

Fix: select the MoE backend by SM version in the test — DEEPGEMM on
SM100/SM103, CUTLASS otherwise (CUTLASS supports FP8 block scales) — and
remove the corresponding waive entry.
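
The selection rule in the fix can be sketched as follows. This is an illustration under stated assumptions, not the diff itself: `select_moe_backend` and `DEEPGEMM_SUPPORTED_SMS` are hypothetical names, and the SM version is passed in as an argument so the snippet runs without a GPU. In the test the value would presumably come from `get_sm_version()` and be passed to `MoeConfig(backend=...)`.

```python
# SM versions on which the DeepGEMM MoE kernels are supported, per the description.
DEEPGEMM_SUPPORTED_SMS = (100, 103)


def select_moe_backend(sm_version: int) -> str:
    """Return the MoE backend name for a given SM version (hypothetical helper)."""
    return "DEEPGEMM" if sm_version in DEEPGEMM_SUPPORTED_SMS else "CUTLASS"


if __name__ == "__main__":
    # Hopper, datacenter Blackwell, and consumer Blackwell SM versions.
    for sm in (90, 100, 103, 120, 121):
        print(f"SM{sm}: {select_moe_backend(sm)}")
```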

The exact gate get_sm_version() in (100, 103) is used rather than the more
common >= 100 on purpose: this test also runs on rtx6k (RTX PRO 6000
Blackwell = SM120) in the QA lists. >= 100 would wrongly pick DEEPGEMM on
SM120/SM121 (also unsupported) and introduce a new failure, whereas
in (100, 103) matches DeepGemmFusedMoE's supported-SM set exactly.
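
The difference between the two gates can be checked with a small comparison. The snippet below is a sketch: `exact_gate`, `threshold_gate`, and `mismatches` are hypothetical helpers, and the supported set is the one stated above for DeepGemmFusedMoE.

```python
# Supported-SM set as stated in the description (SM100/SM103).
SUPPORTED_SMS = {100, 103}

# SM versions mentioned in this PR: Hopper, datacenter and consumer Blackwell.
CANDIDATE_SMS = (90, 100, 103, 120, 121)


def exact_gate(sm_version: int) -> bool:
    """Membership test used by the fix."""
    return sm_version in (100, 103)


def threshold_gate(sm_version: int) -> bool:
    """The more common threshold-style check that the fix avoids."""
    return sm_version >= 100


def mismatches(gate) -> list:
    """SM versions where the gate disagrees with the supported set."""
    return [sm for sm in CANDIDATE_SMS if gate(sm) != (sm in SUPPORTED_SMS)]


if __name__ == "__main__":
    print("exact gate mismatches:", mismatches(exact_gate))
    print("threshold gate mismatches:", mismatches(threshold_gate))
```

Under these assumptions the threshold gate disagrees with the supported set on SM120 and SM121, while the exact gate has no mismatches.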

Test Coverage

  • accuracy/test_llm_api_pytorch.py::TestQwen3_5_35B_A3B::test_fp8[enable_block_reuse=False]
  • accuracy/test_llm_api_pytorch.py::TestQwen3_5_35B_A3B::test_fp8[enable_block_reuse=True]

These run on B200 (SM100, l0_b200.yml) and rtx6k (SM120,
qa/llm_function_rtx6k.txt, qa/llm_function_core.txt), exercising both the
DEEPGEMM and CUTLASS branches. The previously failing case on Hopper
(SM90) now selects the CUTLASS backend.

PR Checklist

  • Please check this after reviewing the above items as appropriate for this PR.

…well SMs in TestQwen3_5_35B_A3B::test_fp8

DeepGEMM MoE kernels only support datacenter Blackwell (SM100/SM103). On
Hopper (SM90) and consumer Blackwell (SM120/SM121) the unsupported kernel
trips a scale-factor dtype assertion during autotuner warmup, so the test
hard-selecting backend='DEEPGEMM' fails on those GPUs.

Pick the backend by SM version (DEEPGEMM on SM100/SM103, CUTLASS otherwise,
which supports FP8 block scales) and drop the corresponding waive entry.

Signed-off-by: xxi <xxi@nvidia.com>
@xxi-nv xxi-nv requested a review from a team as a code owner June 8, 2026 05:49
@coderabbitai
Contributor

coderabbitai Bot commented Jun 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1cd59f76-97c9-48a7-952d-b63a2df23dad

📥 Commits

Reviewing files that changed from the base of the PR and between 2cad6db and ba0e59c.

📒 Files selected for processing (2)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/waives.txt

Walkthrough

This PR modifies the Qwen3.5 35B FP8 accuracy test to support multiple GPU architectures. The test's MoE backend configuration is now determined at runtime based on the detected SM version, replacing a hardcoded DEEPGEMM setting. A corresponding test waiver is removed, enabling the test to run on all supported SM versions.

Changes

Qwen3.5 35B MoE test architecture support

| Layer / File(s) | Summary |
|---|---|
| Architecture-aware MoE backend selection and waiver removal: tests/integration/defs/accuracy/test_llm_api_pytorch.py, tests/integration/test_lists/waives.txt | Test implementation now conditionally selects MoE backend: DEEPGEMM for SM 100/103, CUTLASS otherwise. Previously skipped test waiver is removed. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • NVIDIA/TensorRT-LLM#14854: Both PRs modify test waivers in tests/integration/test_lists/waives.txt to adjust which accuracy tests are skipped based on hardware configurations.

Suggested reviewers

  • yufeiwu-nv
  • leslie-fang25
  • jieli-matrix
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly specifies the exact change: selecting CUTLASS MoE backend on non-Blackwell SMs in a specific test, directly matching the PR's main objective. |
| Description check | ✅ Passed | The description fully addresses the template requirements with clear problem explanation, solution details, comprehensive test coverage, and checklist verification. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@xxi-nv
Collaborator Author

xxi-nv commented Jun 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #52672 [ run ] triggered by Bot. Commit: ba0e59c Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #52672 [ run ] completed with state FAILURE. Commit: ba0e59c
/LLM/main/L0_MergeRequest_PR pipeline #41944 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@xxi-nv xxi-nv requested a review from jieli-matrix June 8, 2026 07:18
@xxi-nv
Collaborator Author

xxi-nv commented Jun 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #52702 [ run ] triggered by Bot. Commit: 8e82f0a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #52702 [ run ] completed with state SUCCESS. Commit: 8e82f0a
/LLM/main/L0_MergeRequest_PR pipeline #41969 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation
