fix: restore actor weights after loading OPD teacher checkpoint #1903
Open
canlin03 wants to merge 1 commit into
Conversation
When `offload_train` is disabled, the generic model-state recovery path is skipped after `load_other_checkpoint("teacher")`. This leaves `self.model` in teacher state, causing step-0 rollouts and evals to run with the teacher model instead of the student (actor).

Fix: explicitly call `_switch_model("actor")` after the teacher checkpoint is loaded when `offload_train` is not set.
Problem
When using Megatron-based OPD (`--opd-type megatron`) without `--offload-train`, `load_other_checkpoint("teacher")` switches `self.model` to the teacher weights. The generic model-state recovery path only runs when `offload_train` is enabled, so it is skipped here, leaving `self.model` in teacher state at the end of `__init__`. As a result, step-0 rollouts and evaluations silently run with the teacher model instead of the student (actor).
Reproduction
Full training script
Key flags: the bug triggers when `--offload-train` is NOT set.

**Observed (before fix):** Step-0 eval accuracy on AIME-2024 is abnormally high, matching the teacher model's performance rather than the untrained student. In our run the teacher was a math RL-trained Qwen3-4B checkpoint (Step 500); the student was the base Qwen3-4B. sglang was silently serving teacher weights from the very first step.

**Expected (after fix):** Step-0 eval accuracy drops to ~0.25, consistent with the base student model before any RL training.
If the maintainers need the teacher checkpoint or the dataset to reproduce this, feel free to reach out; I am happy to provide them.
Fix
Explicitly call `_switch_model("actor")` after loading the teacher checkpoint when `offload_train` is not set, ensuring `self.model` is always restored to actor state before `update_weights()` pushes weights to sglang.
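The shape of the change can be sketched as below. This is a minimal, self-contained illustration, not the actual trainer code: the `Trainer` class, its attributes, and the `setup_teacher` method are stand-ins invented for this sketch, while `load_other_checkpoint` and `_switch_model` follow the names used in the PR text.

```python
from types import SimpleNamespace


class Trainer:
    """Illustrative stand-in for the OPD trainer; only the init-time
    checkpoint handling relevant to this fix is sketched."""

    def __init__(self, args):
        self.args = args
        # self.model starts out holding the student (actor) weights.
        self.model = "actor-weights"

    def load_other_checkpoint(self, role):
        # Loading the teacher checkpoint switches self.model to teacher state.
        self.model = f"{role}-weights"

    def _switch_model(self, role):
        self.model = f"{role}-weights"

    def setup_teacher(self):
        self.load_other_checkpoint("teacher")
        if not self.args.offload_train:
            # Without offload_train the generic recovery path is skipped,
            # so restore the actor weights explicitly (the fix); otherwise
            # the recovery path would restore them later on its own.
            self._switch_model("actor")


trainer = Trainer(SimpleNamespace(offload_train=False))
trainer.setup_teacher()
print(trainer.model)  # actor weights are restored before update_weights()
```

Without the explicit `_switch_model("actor")` call, `trainer.model` would still hold `"teacher-weights"` at the end of setup in the `offload_train=False` case, which is exactly the state the step-0 rollouts observed.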