This README provides a quick and practical guide for preparing data, configuring training, and running LoRA fine-tuning for Kandinsky models.
## Step 1: Setup

After cloning this repo, don't forget to run:

```bash
git submodule update --init --remote
```

Download all required pretrained models with `kandinsky5/download_models.py` and place them into:

```
kandinsky5/weights
```
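A typical invocation, assuming the script takes no required arguments, might look like:

```bash
# Assumed invocation; check the script for its actual arguments.
python kandinsky5/download_models.py
```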
## Step 2: Prepare the Data

Prepare a directory containing pairs:

- `*.mp4` or `*.png` — the media sample
- `*.txt` — caption for the same sample
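For example, a valid dataset directory might look like this (file names are illustrative; pairing captions by matching file stem is an assumption based on the "pairs" wording):

```
data/
├── clip_0001.mp4
├── clip_0001.txt
├── image_0002.png
└── image_0002.txt
```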
## Step 3: Encode the Data

Then:

- Open `encode/encode.sh`
- Set correct local paths for the input data and output directories
- Run:

```bash
bash encode/encode.sh
```

This will generate:

```
cache/latents_image/
cache/text_embeds/
```
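The layout below is a sketch; per-sample file names are illustrative, and it assumes the encode step also writes the unconditional embedding `null.pt` referenced in Step 5:

```
cache/
├── latents_image/
│   ├── clip_0001.pt
│   └── ...
└── text_embeds/
    ├── clip_0001.pt
    ├── ...
    └── null.pt
```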
## Step 4: Choose the Training Config

- T2I → `configs/lora_image.yaml`
- T2V / I2V → `configs/lora_video.yaml`

Update in the selected config:

- `experiment_dir`
- `log_dir`
- `checkpoint_dir`
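For example (paths are illustrative, and the keys are assumed to sit at the top level of the YAML):

```yaml
experiment_dir: /path/to/experiments/my_lora_run
log_dir: /path/to/experiments/my_lora_run/logs
checkpoint_dir: /path/to/experiments/my_lora_run/checkpoints
```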
## Step 5: Edit the Dataloader Config

Then edit the dataloader configs: `configs/data/lora_*_dataloader.yaml`.

Set:

- `latents_dir` → path to the latents from Step 3
- `text_embeds_dir` → path to the text embeds from Step 3
- `uncond_embed` → `text_embeds_dir` + `/null.pt`
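A minimal sketch, assuming these keys sit at the top level of the dataloader YAML:

```yaml
latents_dir: /path/to/cache/latents_image
text_embeds_dir: /path/to/cache/text_embeds
uncond_embed: /path/to/cache/text_embeds/null.pt
```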
## Step 6: Edit the Trainer Config

Edit:

`configs/trainer/lora*.yaml`

Configure:

- `devices` → number of GPUs
- Optional: LoRA architecture parameters
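A sketch of the relevant settings; `devices` comes from this README, while the LoRA parameter names below (`rank`, `alpha`) are hypothetical placeholders, so check the shipped config for the real keys:

```yaml
devices: 4   # number of GPUs to train on

# Optional LoRA architecture parameters (key names are illustrative):
lora:
  rank: 32
  alpha: 32
```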
## Step 7: Run Training

Choose the correct config inside `train.sh`:

- `configs/lora_video.yaml` for T2V / I2V
- `configs/lora_image.yaml` for T2I

Set `--nproc_per_node` to your number of GPUs, then run:

```bash
bash train.sh
```

Note: FSDP is enabled by default.
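For reference, assuming `train.sh` wraps `torchrun` (the entry-point file name and `--config` flag below are assumptions), the launch line inside it might resemble:

```bash
# Hypothetical launch line; adjust --nproc_per_node to your GPU count.
torchrun --nproc_per_node=4 train.py --config configs/lora_video.yaml
```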