
fix: preserve fused 3D expert tensors for Qwen3.5 MoE in torch_dist→H…#1904

Open

rouchenzi wants to merge 1 commit into THUDM:main from rouchenzi:fix/fused-moe-expert-conversion

Conversation

@rouchenzi

Summary

The current convert_torch_dist_to_hf.py has two issues when converting Qwen3.5 MoE checkpoints:

  1. Expert format mismatch: The original HF checkpoint stores non-MTP experts as fused 3D tensors ([num_experts, hidden, intermediate]), but the conversion splits them into per-expert 2D tensors, producing a checkpoint that is inconsistent with the original HF format (https://huggingface.co/Qwen/Qwen3.5-35B-A3B).
  2. Duplicate expert weights: When using --add-missing-from-origin-hf, the script produces both the split per-expert 2D tensors AND the fused 3D tensors copied from the origin checkpoint, resulting in duplicate expert weights in two different formats.

This fix preserves the original 3D tensor layout for Qwen3.5 MoE without affecting other models. As a side benefit, loading the fused format is ~21x faster than loading the split per-expert tensors.
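For context, here is a minimal sketch of the two layouts. The key names and shapes below are illustrative only, not the exact Qwen3.5 MoE naming scheme:

```python
import torch

# Illustrative shapes and key names; the real checkpoint uses Qwen3.5 MoE's own scheme.
num_experts, hidden, intermediate = 4, 8, 16
fused = torch.randn(num_experts, hidden, intermediate)  # original HF layout: one 3D tensor

# What the unfixed conversion effectively emits: one 2D tensor per expert.
split_keys = {
    f"model.layers.0.mlp.experts.{e}.gate_proj.weight": fused[e]
    for e in range(num_experts)
}

# What the original HF checkpoint (and this fix) keeps: a single fused 3D tensor.
fused_keys = {"model.layers.0.mlp.experts.gate_proj.weight": fused}

print(len(split_keys), "split keys vs", len(fused_keys), "fused key")
```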

Changes

  • Add _FUSED_EXPERT_MODELS registry (currently ["qwen3_5moe"])
  • Add _use_fused_experts(model_name, key_name) helper — returns True only for Qwen3.5 MoE non-MTP layers (a minimal sketch follows this list)
  • Pass model_name through get_expert_param, get_layer_param, get_named_params, and save_tensors
  • For fused models: yield the 3D tensor as-is with corrected key name
  • MTP layers excluded (they use per-expert 2D format in HF)
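A minimal sketch of the registry and helper, assuming the MTP exclusion is done by substring matching on the key name (the actual logic in convert_torch_dist_to_hf.py may differ):

```python
# Sketch only; mirrors the names introduced by this PR, not the exact implementation.
_FUSED_EXPERT_MODELS = ["qwen3_5moe"]


def _use_fused_experts(model_name: str, key_name: str) -> bool:
    """Return True only for fused-expert models on non-MTP expert layers."""
    if not any(m in model_name.lower() for m in _FUSED_EXPERT_MODELS):
        return False
    # MTP layers keep the per-expert 2D format in HF, so they are excluded.
    # Matching "mtp" in the key name is an assumption made for this sketch.
    return "mtp" not in key_name


# Hypothetical usage with an invented key name:
# _use_fused_experts("qwen3_5moeconfig", "decoder.layers.3.mlp.experts.weight1") -> True
```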

Test

1. Fix validation

Round-trip conversion test on Qwen3.5-35B-A3B (HF → torch_dist → HF), comparing output keys against the original HF checkpoint (1811 keys; a comparison sketch follows the results below):

Without fix:

  • 31333 output keys (30720 extra per-expert 2D tensors)
  • With --add-missing-from-origin-hf: 32531 keys (both the fused 3D tensors copied from the origin checkpoint AND the split 2D tensors produced by the conversion)

With fix:

  • 1811 output keys — all keys and shapes match the original HF checkpoint ✓
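The comparison itself can be reproduced with a short script along these lines. Paths are placeholders, and safetensors shards are assumed:

```python
from pathlib import Path
from safetensors import safe_open


def load_index(ckpt_dir):
    """Map every tensor key to its shape across all *.safetensors shards."""
    index = {}
    for shard in Path(ckpt_dir).glob("*.safetensors"):
        with safe_open(str(shard), framework="pt") as f:
            for key in f.keys():
                index[key] = tuple(f.get_slice(key).get_shape())
    return index


original = load_index("Qwen3.5-35B-A3B")       # placeholder: original HF checkpoint dir
converted = load_index("converted_hf_output")  # placeholder: torch_dist→HF output dir

missing = original.keys() - converted.keys()
extra = converted.keys() - original.keys()
mismatched = {k for k in original.keys() & converted.keys() if original[k] != converted[k]}

print(f"{len(converted)} output keys; missing={len(missing)} "
      f"extra={len(extra)} shape_mismatch={len(mismatched)}")
```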

2. No impact on other models

The fix only triggers when model_name contains "qwen3_5moe". I checked all models in the megatron_to_hf converter; none of their config names (qwen3moeconfig, deepseekv3config, chatglmconfig, qwen3nextconfig, llamaconfig, etc.) contain this substring.
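This check can be reproduced directly over the converter's config names, e.g.:

```python
# Config names as listed above; the full set lives in the megatron_to_hf converter.
config_names = ["qwen3moeconfig", "deepseekv3config", "chatglmconfig",
                "qwen3nextconfig", "llamaconfig"]
assert not any("qwen3_5moe" in name for name in config_names)
```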

3. Pre-commit checks pass
