GitHub - AI-Plans/Model-Diffing

KTO Trainer

Things to be mindful of for KTO training

KTOTrainer does not work with recent versions of torch (2.8.0 onwards). When downgrading torch, downgrade torchvision and torchtext as well so that the versions stay compatible. On Colab, you can instead change the runtime version in the GPU selection section; choose the penultimate version. On Kaggle, no changes are needed.
KTOTrainer takes up a lot of space. See https://discord.com/channels/879548962464493619/879548962464493622/1440675714419523585 (Hugging Face Discord channel).
To fix the above problem, see https://huggingface.co/datasets/John6666/forum2/blob/main/trl_kto_blow_up_memory_1.md
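For the torch version constraint noted above, a quick check before training can save a failed run. The following is a minimal sketch using only the standard library. The helper names (`major_minor`, `is_affected`, `installed_version`) are hypothetical, and the 2.8 cutoff is taken from the note above rather than from any official compatibility table.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Assumption from the note above: torch 2.8.0 and later are the affected releases.
FIRST_AFFECTED = (2, 8)


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.7.1+cu126'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(version: str) -> bool:
    """Return True if the given torch version is at or past the cutoff."""
    return major_minor(version) >= FIRST_AFFECTED


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

In a notebook, calling `installed_version("torch")` and passing the result to `is_affected` (when it is not `None`) indicates whether a downgrade or a runtime change is needed. The same lookup works for torchvision and torchtext.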

If you are facing a zip error, try updating all of the libraries. There was a bug related to the strict parameter of the zip function, which has since been fixed.
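For context, the `strict` parameter of the built-in `zip` was added in Python 3.10. With `strict=True`, `zip` raises `ValueError` when its inputs differ in length. The exact library call that triggered the error is not given here, so the following is only an illustrative sketch, using a hypothetical helper `zip_strict` that behaves the same way on any Python version.

```python
import sys


def zip_strict(*iterables):
    """Pair items like zip(), but raise ValueError on a length mismatch.

    Hypothetical helper for illustration only. On Python 3.10+ it simply
    delegates to the built-in zip(..., strict=True).
    """
    if sys.version_info >= (3, 10):
        return list(zip(*iterables, strict=True))
    # Fallback for older interpreters, where zip() has no `strict` keyword.
    columns = [list(it) for it in iterables]
    if len({len(col) for col in columns}) > 1:
        raise ValueError("zip_strict() arguments have different lengths")
    return list(zip(*columns))
```

If updating the libraries does not remove the error, checking the Python version of the runtime is a reasonable next step, since `strict` is not accepted by `zip` before 3.10.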

A zero loss reported by GRPO is not a problem. Use Wandb to record metrics; if it shows a non-zero gradient, training is fine.
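One common explanation, offered here as an assumption rather than something taken from this repository: GRPO normalises rewards within each group, so the advantages average to zero. When the policy has not yet moved away from the one that generated the samples, the probability ratio is 1 and the surrogate loss is the negative mean advantage, which is zero, while the per-sample gradients are not. The sketch below shows this with a simplified surrogate (no clipping, no KL term) in plain Python. The reward and log-probability values are made up for illustration.

```python
import math


def group_advantages(rewards, eps=1e-4):
    """Normalise rewards within one group: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + eps) for r in rewards]


def surrogate_loss(logps, old_logps, advantages):
    """Simplified GRPO-style surrogate: -mean(ratio * advantage)."""
    ratios = [math.exp(lp - old) for lp, old in zip(logps, old_logps)]
    return -sum(r * a for r, a in zip(ratios, advantages)) / len(advantages)


# Made-up group of five completions for a single prompt.
rewards = [1.0, 0.0, 0.0, 1.0, 1.0]
old_logps = [-2.0, -1.5, -3.0, -2.5, -1.0]

advantages = group_advantages(rewards)

# The policy has not been updated yet, so every ratio is exp(0) = 1.
loss = surrogate_loss(old_logps, old_logps, advantages)

# d(loss)/d(logp_i) = -advantage_i * ratio_i / n, evaluated at ratio_i = 1.
grads = [-a / len(advantages) for a in advantages]

print("loss:", loss)
print("per-sample gradients:", grads)
```

The printed loss is zero up to floating-point error, while the individual gradients are clearly non-zero, which is why the gradient is the more informative signal to watch.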

Refer:

Crosscoder
DPOTrainer
DPRO-Trainer
IPO
KTO-Trainer
ORPOTrainer
PPO code
RedDebate
Reward Models
dpo-qwen0-6b-fft
sft
README.md