[https://nvbugs/6212252][fix] Select CUTLASS MoE backend on non-Blackwell SMs in TestQwen3_5_35B_A3B::test_fp8 by xxi-nv · Pull Request #15081 · NVIDIA/TensorRT-LLM

xxi-nv · 2026-06-08T05:49:30Z

Summary by CodeRabbit

Tests
- Updated the Qwen 3.5 35B accuracy test to select the MoE backend from the GPU's SM version (DEEPGEMM on SM100/SM103, CUTLASS otherwise) instead of using a fixed configuration.
- Re-enabled previously skipped test case to expand validation coverage.

Description

TestQwen3_5_35B_A3B::test_fp8 hard-selected MoeConfig(backend='DEEPGEMM').
The DeepGEMM MoE kernels only support datacenter Blackwell (SM100/SM103). On
Hopper (SM90, e.g. H20) and consumer Blackwell (SM120/SM121) the unsupported
kernel runs anyway and trips a scale-factor dtype assertion during autotuner
warmup:

RuntimeError: Assertion error (deepgemm-src/.../utils/layout.hpp:68):
sfa_dtype == torch::kFloat and sfb_dtype == torch::kFloat

The DEEPGEMM dispatch branch in create_moe.py has no SM gate (unlike
DENSEGEMM/CUTEDSL/TRTLLM which fall back to CUTLASS), and
DeepGemmFusedMoE.can_implement (restricted to SM {100, 103}) is never invoked
from the dispatch path, so the test fails on non-Blackwell-datacenter GPUs.
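To make the gap concrete, here is a simplified, hypothetical model of the dispatch behavior described above. It is not the code in create_moe.py. The function names are illustrative assumptions, and only the supported-SM set {100, 103} comes from this description (DeepGemmFusedMoE.can_implement). The gated variant only shows what the missing check would do; this PR does not change the dispatch path.

```python
# Hypothetical model of the DEEPGEMM dispatch gap; NOT create_moe.py.
# Only the supported set {100, 103} is taken from the PR description.

DEEPGEMM_SUPPORTED_SMS = frozenset({100, 103})


def deepgemm_can_implement(sm_version: int) -> bool:
    """Stand-in for DeepGemmFusedMoE.can_implement's SM restriction."""
    return sm_version in DEEPGEMM_SUPPORTED_SMS


def dispatch_ungated(backend: str, sm_version: int) -> str:
    """Mirrors the reported behavior: the requested backend is used on any SM."""
    # can_implement is never consulted, so an unsupported SM still gets
    # the DeepGEMM kernels and only fails later, at autotuner warmup.
    return backend


def dispatch_gated(backend: str, sm_version: int) -> str:
    """Hypothetical gate that falls back to CUTLASS, like the other backends."""
    if backend == "DEEPGEMM" and not deepgemm_can_implement(sm_version):
        return "CUTLASS"
    return backend


for sm in (90, 100, 103, 120, 121):
    print(f"SM{sm}: ungated={dispatch_ungated('DEEPGEMM', sm)} "
          f"gated={dispatch_gated('DEEPGEMM', sm)}")
```

In this model the ungated path returns DEEPGEMM for every SM, which is why the failure only surfaces inside the DeepGEMM kernel's own assertion.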

Fix: select the MoE backend by SM version in the test — DEEPGEMM on
SM100/SM103, CUTLASS otherwise (CUTLASS supports FP8 block scales) — and
remove the corresponding waive entry.

The exact gate get_sm_version() in (100, 103) is used rather than the more
common >= 100 on purpose: this test also runs on rtx6k (RTX PRO 6000
Blackwell = SM120) in the QA lists. >= 100 would wrongly pick DEEPGEMM on
SM120/SM121 (also unsupported) and introduce a new failure, whereas
in (100, 103) matches DeepGemmFusedMoE's supported-SM set exactly.
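The test-side selection amounts to a small pure function. The sketch below assumes the SM version is already available as an integer (the real test obtains it via get_sm_version() and passes the result to MoeConfig(backend=...); those calls are not reproduced here). The names select_moe_backend and select_moe_backend_loose are hypothetical, and the loose variant exists only to compare against the rejected >= 100 gate.

```python
# Sketch of the test-side backend selection, assuming an integer SM version.
# Function names are hypothetical; this is not the code in the test file.

def select_moe_backend(sm_version: int) -> str:
    """DEEPGEMM only on datacenter Blackwell (SM100/SM103), else CUTLASS."""
    return "DEEPGEMM" if sm_version in (100, 103) else "CUTLASS"


def select_moe_backend_loose(sm_version: int) -> str:
    """The rejected '>= 100' gate, kept only for comparison."""
    return "DEEPGEMM" if sm_version >= 100 else "CUTLASS"


# SM90 = Hopper (e.g. H20), SM100 = B200, SM120 = RTX PRO 6000 Blackwell.
for sm in (90, 100, 103, 120, 121):
    exact, loose = select_moe_backend(sm), select_moe_backend_loose(sm)
    note = "  <- loose gate picks unsupported DEEPGEMM" if exact != loose else ""
    print(f"SM{sm}: exact={exact} loose={loose}{note}")
```

Only SM120 and SM121 differ between the two gates, which is exactly the consumer-Blackwell case the rtx6k QA lists would hit.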

Test Coverage

accuracy/test_llm_api_pytorch.py::TestQwen3_5_35B_A3B::test_fp8[enable_block_reuse=False]
accuracy/test_llm_api_pytorch.py::TestQwen3_5_35B_A3B::test_fp8[enable_block_reuse=True]

These run on B200 (SM100, l0_b200.yml) and rtx6k (SM120,
qa/llm_function_rtx6k.txt, qa/llm_function_core.txt), exercising both the
DEEPGEMM and CUTLASS branches. The previously failing case on Hopper
(SM90) now selects the CUTLASS backend.

PR Checklist

Please check this after reviewing the above items as appropriate for this PR.

…well SMs in TestQwen3_5_35B_A3B::test_fp8

DeepGEMM MoE kernels only support datacenter Blackwell (SM100/SM103). On Hopper (SM90) and consumer Blackwell (SM120/SM121) the unsupported kernel trips a scale-factor dtype assertion during autotuner warmup, so the test hard-selecting backend='DEEPGEMM' fails on those GPUs.

Pick the backend by SM version (DEEPGEMM on SM100/SM103, CUTLASS otherwise, which supports FP8 block scales) and drop the corresponding waive entry.

Signed-off-by: xxi <xxi@nvidia.com>

coderabbitai · 2026-06-08T05:51:45Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1cd59f76-97c9-48a7-952d-b63a2df23dad

📥 Commits

Reviewing files that changed from the base of the PR and between 2cad6db and ba0e59c.

📒 Files selected for processing (2)

tests/integration/defs/accuracy/test_llm_api_pytorch.py
tests/integration/test_lists/waives.txt

💤 Files with no reviewable changes (1)

tests/integration/test_lists/waives.txt

📝 Walkthrough

This PR modifies the Qwen3.5 35B FP8 accuracy test to support multiple GPU architectures. The test's MoE backend configuration is now determined at runtime based on the detected SM version, replacing a hardcoded DEEPGEMM setting. A corresponding test waiver is removed, enabling the test to run on all supported SM versions.

Changes

Qwen3.5 35B MoE test architecture support

| Layer / File(s) | Summary |
| --- | --- |
| Architecture-aware MoE backend selection and waiver removal (`tests/integration/defs/accuracy/test_llm_api_pytorch.py`, `tests/integration/test_lists/waives.txt`) | The test now conditionally selects the MoE backend: `DEEPGEMM` for SM 100/103, `CUTLASS` otherwise. The waiver that skipped the test is removed. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#14854: Both PRs modify test waivers in tests/integration/test_lists/waives.txt to adjust which accuracy tests are skipped based on hardware configurations.

Suggested reviewers

yufeiwu-nv
leslie-fang25
jieli-matrix

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly specifies the exact change: selecting CUTLASS MoE backend on non-Blackwell SMs in a specific test, directly matching the PR's main objective. |
| Description check | ✅ Passed | The description fully addresses the template requirements with clear problem explanation, solution details, comprehensive test coverage, and checklist verification. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


xxi-nv · 2026-06-08T05:51:58Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-08T05:57:24Z

PR_Github #52672 [ run ] triggered by Bot. Commit: ba0e59c

tensorrt-cicd · 2026-06-08T06:31:03Z

PR_Github #52672 [ run ] completed with state FAILURE. Commit: ba0e59c
/LLM/main/L0_MergeRequest_PR pipeline #41944 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

- Please check the failed tests and fix your PR.
- If you cannot view the failures, ask the CI triggerer to share details.
- Once fixed, request an NVIDIA team member to trigger CI again.

xxi-nv · 2026-06-08T07:23:03Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-08T07:29:11Z

PR_Github #52702 [ run ] triggered by Bot. Commit: 8e82f0a

tensorrt-cicd · 2026-06-08T12:15:40Z

PR_Github #52702 [ run ] completed with state SUCCESS. Commit: 8e82f0a
/LLM/main/L0_MergeRequest_PR pipeline #41969 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

- Please check the failed tests and fix your PR.
- If you cannot view the failures, ask the CI triggerer to share details.
- Once fixed, request an NVIDIA team member to trigger CI again.

xxi-nv requested a review from a team as a code owner June 8, 2026 05:49

github-actions Bot assigned xxi-nv Jun 8, 2026

xxi-nv requested a review from jieli-matrix June 8, 2026 07:18

Merge branch 'main' into fix/bug-6212252-qwen35-deepgemm-sm90 (8e82f0a)

xxi-nv wants to merge 2 commits into NVIDIA:main from xxi-nv:fix/bug-6212252-qwen35-deepgemm-sm90
