fix: preserve fused 3D expert tensors for Qwen3.5 MoE in torch_dist→HF #1904
rouchenzi wants to merge 1 commit
Summary
The current `convert_torch_dist_to_hf.py` has two issues when converting Qwen3.5 MoE checkpoints:

1. Qwen3.5 MoE stores expert weights as fused 3D tensors (`[num_experts, hidden, intermediate]`), but the conversion splits them into per-expert 2D tensors, producing a checkpoint inconsistent with the original HF format: https://huggingface.co/Qwen/Qwen3.5-35B-A3B (the two layouts are illustrated in the sketch below).
2. With `--add-missing-from-origin-hf`, the script produces both the split per-expert 2D tensors AND the fused 3D tensors copied from origin, resulting in duplicate expert weights in two different formats.

This fix preserves the original 3D tensor layout for Qwen3.5 MoE without affecting other models. As a side benefit, loading the correct fused format is ~21x faster.
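For context, a minimal sketch of the two layouts (not code from this PR; sizes and key names are illustrative only):

```python
import torch

# Illustrative sizes only; the real Qwen3.5-35B-A3B dimensions are much larger.
num_experts, hidden, intermediate = 4, 8, 16

# Fused layout kept by the original HF checkpoint (and preserved by this fix):
# a single 3D tensor per expert projection.
fused = torch.randn(num_experts, hidden, intermediate)

# Split layout emitted by the unfixed conversion: one 2D tensor per expert,
# i.e. num_experts separate keys instead of one fused key.
split = {f"experts.{i}.weight": fused[i] for i in range(num_experts)}

assert all(t.shape == (hidden, intermediate) for t in split.values())
```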
Changes
- New `_FUSED_EXPERT_MODELS` registry (currently `["qwen3_5moe"]`)
- New `_use_fused_experts(model_name, key_name)` helper: returns True only for Qwen3.5 MoE non-MTP layers (sketched below)
- Thread `model_name` through `get_expert_param` → `get_layer_param` → `get_named_params` → `save_tensors`
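A rough sketch of how the registry and helper described above could look; the names come from this PR, but the bodies below are assumptions, not the actual diff:

```python
# Registry of model names whose expert weights must stay as fused 3D tensors.
_FUSED_EXPERT_MODELS = ["qwen3_5moe"]


def _use_fused_experts(model_name: str, key_name: str) -> bool:
    """Return True only for Qwen3.5 MoE expert weights outside MTP layers.

    Sketch only: the real non-MTP check lives in the PR diff; treating "mtp"
    in the key name as the marker is an assumed convention here.
    """
    if not any(m in model_name for m in _FUSED_EXPERT_MODELS):
        return False
    return "mtp" not in key_name
```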
Test

1. Fix validation
Round-trip conversion test on Qwen3.5-35B-A3B (HF → torch_dist → HF), comparing output keys against the original HF checkpoint (1811 keys):
Without fix:
- with `--add-missing-from-origin-hf`: 32531 keys (both fused 3D from origin AND split 2D from conversion)

With fix:
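One way to reproduce the key comparison is to diff the `weight_map` entries of the two checkpoints' safetensors index files. This is a sketch, not the test script from this PR: the directory names are placeholders, and a single-file checkpoint would need `safetensors.safe_open(...).keys()` instead of the index file.

```python
import json
from pathlib import Path


def checkpoint_keys(ckpt_dir: str) -> set[str]:
    """Collect tensor names from a sharded HF checkpoint's safetensors index."""
    index_path = Path(ckpt_dir) / "model.safetensors.index.json"
    return set(json.loads(index_path.read_text())["weight_map"])


original = checkpoint_keys("Qwen3.5-35B-A3B")        # reference HF checkpoint (1811 keys)
converted = checkpoint_keys("qwen3_5moe-converted")  # output of the round-trip conversion

print("missing from conversion:", len(original - converted))
print("unexpected extra keys:  ", len(converted - original))
```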
2. No impact on other models
The fix only triggers when `model_name` contains `"qwen3_5moe"`. Checked all models in the `megatron_to_hf` converter; none of their config names (`qwen3moeconfig`, `deepseekv3config`, `chatglmconfig`, `qwen3nextconfig`, `llamaconfig`, etc.) contain this substring.

3. Pre-commit checks pass