fix: Diverse model seeding across PP ranks #426
Merged
26 commits
5f9f50e fix: Initialize different weights across TP ranks (rrutmann)
8c8c5ab feat: Consider pp rank for model seed (rrutmann)
ab3daa0 fix: Only consider PP rank for seeding (rrutmann)
62a1743 test: Add test for different parameters on tp/pp ranks (rrutmann)
00a595b test: Check for equal parameters across data parallel processes (rrutmann)
bf06da7 feat: Integrate seeding to model initialization (rrutmann)
b137701 refactor: Move seeding logic to model initialization component (rrutmann)
bff99f3 chore: Add seed and device_mesh to ComposedModelInitializationConfig (rrutmann)
98ff9db test: Adapt test to latest changes (rrutmann)
2e248ed chore: Remove old code (rrutmann)
093fa33 chore: Merge branch 'main' into seed (rrutmann)
5a9e89e fix: Use local-generator weight init (rrutmann)
13e7a82 refactor: Do not set seed in NNModel (rrutmann)
dc11bbb docs: Add documentation and warning for topology-dependent weight ini… (rrutmann)
999cb65 fix: Fix transformers version mismatch (rrutmann)
b02275f test: Fix test by removing dependency on global RNG state for seed=None (rrutmann)
ddfbe47 test: Adapt test to latest changes in main (rrutmann)
76762d9 chore: Use consistent typing for optional parameters (rrutmann)
dea2eef chore: Remove outdated seed parameter (rrutmann)
adf11f0 fix: Use correct type for parameter_name_regexes (rrutmann)
4cf0032 test: Add option for reliable vscode debugging (rrutmann)
7541df2 test: Add test for seeded model reproducibility (rrutmann)
ede150e chore: Change order of model initialization (rrutmann)
67bc596 feat: Add multi_device_generator_policy for handling seeding with mul… (rrutmann)
5172fc4 refactor: Use enum for multi_device_generator_policy (rrutmann)
326823e chore: Update model seed initialization (rrutmann)
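The commit messages describe seeding that depends only on the PP rank, uses a local generator rather than the global RNG, and keeps parameters equal across data-parallel processes. The sketch below illustrates that idea and is not the PR's implementation: `derive_rank_seed`, `init_weights`, the `base_seed + pp_rank` formula, and the 2 x 2 mesh are all hypothetical, and Python's `random.Random` stands in for a framework generator such as `torch.Generator` so the example has no dependencies.

```python
import random


def derive_rank_seed(base_seed: int, pp_rank: int) -> int:
    """Hypothetical helper: offset the base seed by the pipeline-parallel rank only."""
    return base_seed + pp_rank


def init_weights(num_weights: int, seed: int, std: float = 0.02) -> list[float]:
    """Draw weights from a local generator so the global RNG state is never touched."""
    generator = random.Random(seed)
    return [generator.gauss(0.0, std) for _ in range(num_weights)]


# Keys are (dp_rank, pp_rank) pairs of a hypothetical 2 x 2 device mesh.
weights = {
    (dp, pp): init_weights(4, derive_rank_seed(base_seed=42, pp_rank=pp))
    for dp in range(2)
    for pp in range(2)
}

# Data-parallel replicas of the same pipeline stage get identical weights ...
assert weights[(0, 0)] == weights[(1, 0)]
# ... while different pipeline stages start from different values.
assert weights[(0, 0)] != weights[(0, 1)]
```

Because the seed is derived from the PP rank, the initial weights depend on the parallel topology, which is presumably why one of the commits adds a warning about topology-dependent weight initialization.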
Was this removed in transformers?
If it is part of a legacy API I think we should also remove this on our end.
What do you think @BlueCrescent? I think you added it, right?
The function was removed in transformers version 5.2. Our pyproject.toml specifies the requirement "transformers>=4.57.4,<5.0.0", so I was using an unsupported transformers version here. Should we remove it just to be on the safe side?
Yes, I think we should tackle transformers 5.0.0+ support soon anyway.
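The mismatch discussed above could be caught early with a version check against the pinned range. This is a minimal sketch, assuming plain "X.Y.Z" release strings; `parse_version` and `is_supported` are hypothetical helpers, not part of the project or of transformers. In practice the installed version could be read with `importlib.metadata.version("transformers")`.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string; pre-release suffixes are not handled."""
    return tuple(int(part) for part in version.split("."))


def is_supported(version: str, lower: str = "4.57.4", upper: str = "5.0.0") -> bool:
    """Return True if lower <= version < upper, mirroring 'transformers>=4.57.4,<5.0.0'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# A 5.x release falls outside the pinned range; the lower bound is inclusive.
assert not is_supported("5.2.0")
assert is_supported("4.57.4")
```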