Paper • Code • Dataset • Model (4B) • Leaderboard • Project Page
VEFX-Bench is a comprehensive benchmark for evaluating text-driven video editing and visual effects. It includes 5,049 annotated examples spanning 9 categories and 32 subcategories, evaluated by VEFX-Reward, a VLM-based reward model that scores edits along three dimensions on a 1–4 scale:
| Dimension | What it measures |
|---|---|
| Instructional Following (IF) | Does the edit accurately reflect the editing instruction? |
| Render Quality (RQ) | Visual clarity, temporal consistency, and physical plausibility |
| Edit Exclusivity (EE) | Were only the intended regions modified, without side-effects? |
VEFX-Reward scores are on a 1–4 scale. Models are ranked by GeoAgg (α=2 for IF, β=1 for RQ, γ=1 for EE). Higher is better.
Updated: May 2, 2026. For the latest results & submissions, visit the live leaderboard →
| Rank | Model | Type | IF ↑ | RQ ↑ | EE ↑ | GeoAgg ↑ |
|---|---|---|---|---|---|---|
| 🥇 | Kling o3 Omni | Commercial | 3.033 | 3.588 | 3.043 | 3.057 |
| 🥈 | Kling o1 | Commercial | 3.040 | 3.534 | 2.976 | 2.985 |
| 🥉 | Runway Gen-4.5 | Commercial | 2.817 | 3.319 | 2.923 | 2.912 |
| 4 | Seedance 2.0 | Commercial | 2.811 | 3.421 | 3.088 | 2.766 |
| 5 | Grok Imagine | Commercial | 2.606 | 3.346 | 3.376 | 2.723 |
| 6 | Luma Ray 3 | Commercial | 2.702 | 3.403 | 2.705 | 2.717 |
| 7 | UniVideo | Open-source | 2.294 | 3.266 | 3.091 | 2.516 |
| 8 | Wan 2.6 | Commercial | 2.012 | 3.317 | 2.446 | 2.146 |
| 9 | Luma Ray 2 | Commercial | 2.038 | 2.532 | 1.363 | 1.804 |
| 10 | VACE | Open-source | 2.027 | 3.172 | 1.180 | 1.775 |
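The GeoAgg column can be read as a weighted geometric mean of the three dimension scores, with IF weighted twice as heavily (α=2). The exact formula is defined by the benchmark, so treat the sketch below as one plausible reading rather than the published implementation; the leaderboard numbers may be computed differently.

```python
def geo_agg(if_score, rq, ee, alpha=2, beta=1, gamma=1):
    """Weighted geometric mean with per-dimension exponents (assumed GeoAgg form)."""
    total = alpha + beta + gamma
    return (if_score**alpha * rq**beta * ee**gamma) ** (1.0 / total)

# Because IF carries the largest exponent, GeoAgg tracks instruction
# following more closely than render quality or edit exclusivity.
print(geo_agg(3.033, 3.588, 3.043))
```

A geometric mean also penalizes a near-zero score on any single dimension more sharply than an arithmetic average would, which matches the intent of ranking by all three dimensions jointly.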
Each demo shows the original video (left) alongside the edited video (right).
| 5,049 Annotated Examples | 1,419 Source Videos |
| 9 Categories / 32 Subcategories | 10 Editing Systems |
| 3 Quality Dimensions (IF, RQ, EE) | 300 Benchmark Test Pairs |
| Model | Backbone | Params | HuggingFace | Status |
|---|---|---|---|---|
| VEFX-Reward-4B | Qwen3-VL-4B-Instruct | 4B | xiangbog/VEFX-Reward-4B | Available |
| VEFX-Reward-32B | Qwen3-VL-32B-Instruct | 32B | TBD | Coming soon |
```bash
conda create -n vefx-bench python=3.10 -y
conda activate vefx-bench

# Install PyTorch first (match your CUDA version)
# See https://pytorch.org/get-started/locally/ for the right command
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124

# Install remaining dependencies
pip install -r requirements.txt

# Install the package
pip install -e .
```

Requirements: Python ≥ 3.10, a CUDA GPU with ~10 GB VRAM (bfloat16). Make sure your PyTorch CUDA version matches your driver.
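A quick, dependency-free sanity check of the requirements above. This only inspects the local environment (Python version and whether the NVIDIA driver tools are on `PATH`); it does not verify that your PyTorch build matches your CUDA driver.

```python
import shutil
import sys

# Python >= 3.10 is required by the instructions above.
print("python >= 3.10:", sys.version_info >= (3, 10))

# nvidia-smi on PATH is a rough proxy for a usable NVIDIA GPU + driver.
print("nvidia-smi on PATH:", shutil.which("nvidia-smi") is not None)
```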
```python
from vefx_reward import VEFXReward

model = VEFXReward("xiangbog/VEFX-Reward-4B", device="cuda")
scores = model.score(
    original_video="examples/sample_videos/object_removal_original.mp4",
    edited_video="examples/sample_videos/object_removal_edited.mp4",
    instruction="Remove the woman with the grey backpack walking on the right side of the frame.",
)
print(scores)
# {'IF': 2.34, 'RQ': 1.93, 'EE': 1.82, 'Overall': 6.09}
```

Or run the bundled quick-start script:

```bash
python examples/quick_start.py \
    --original examples/sample_videos/object_removal_original.mp4 \
    --edited examples/sample_videos/object_removal_edited.mp4 \
    --instruction "Remove the woman with the grey backpack walking on the right side of the frame."
```

The repo includes 4 sample video pairs with prompts. Score them all:
```python
import json

from vefx_reward import VEFXReward

model = VEFXReward("xiangbog/VEFX-Reward-4B", device="cuda")

with open("examples/sample_videos/prompts.json") as f:
    samples = json.load(f)

for sample in samples:
    scores = model.score(
        original_video=f"examples/sample_videos/{sample['original']}",
        edited_video=f"examples/sample_videos/{sample['edited']}",
        instruction=sample["instruction"],
    )
    print(f"[{sample['category']}] IF={scores['IF']:.2f} RQ={scores['RQ']:.2f} EE={scores['EE']:.2f}")
```

Prepare a CSV with columns `original_video`, `edited_video`, `instruction`:
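For example, the CSV can be generated with the standard library. The paths and instruction below are placeholders; substitute your own video pairs.

```python
import csv

# Illustrative rows only; replace with your own video paths and edit prompts.
rows = [
    {
        "original_video": "videos/clip1_original.mp4",
        "edited_video": "videos/clip1_edited.mp4",
        "instruction": "Remove the red car parked in the background.",
    },
]

with open("edits.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["original_video", "edited_video", "instruction"]
    )
    writer.writeheader()
    writer.writerows(rows)
```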
```bash
python examples/batch_scoring.py --csv edits.csv --output results.csv
```

For large-scale evaluation across multiple GPUs:
```bash
python examples/multi_gpu_scoring.py --csv edits.csv --num_gpus 4 --output results.csv
```

```python
VEFXReward(
    model_path="xiangbog/VEFX-Reward-4B",  # HuggingFace ID or local path
    device="cuda",                         # "cuda", "cuda:0", "cpu"
    dtype=torch.bfloat16,                  # torch.bfloat16 or torch.float16
    fps=4.0,                               # Video sampling rate
    max_frame_pixels=399360,               # Max pixels per frame
)
```

`score(...)` scores a single video edit and returns `{'IF': float, 'RQ': float, 'EE': float, 'Overall': float}`.
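A back-of-envelope sketch of what the `fps` and `max_frame_pixels` defaults imply. The sampling semantics assumed here (frames drawn at `fps`, each frame capped at `max_frame_pixels`) are an interpretation of the parameter names, not documented behavior.

```python
fps = 4.0
max_frame_pixels = 399360  # equals 832 * 480, i.e. roughly a 480p frame
clip_seconds = 10

# At 4 frames/second, a 10 s clip contributes about 40 sampled frames.
sampled_frames = int(fps * clip_seconds)
print("frames for a 10 s clip:", sampled_frames)
print("pixel cap matches 832x480:", 832 * 480 == max_frame_pixels)
```

Longer clips therefore grow the visual context linearly, which is why the per-frame pixel cap matters for staying within the ~10 GB VRAM budget noted above.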
Score multiple edits sequentially. Each sample is processed independently to avoid OOM.
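That sequential pattern can be sketched generically. `score_all` and `fake_score` below are illustrative names only, with the stub standing in for `model.score` so the control flow runs without a GPU or the package installed.

```python
def score_all(samples, score_fn):
    """Score samples one at a time so a failure on one clip cannot sink the batch."""
    results = []
    for sample in samples:
        try:
            # One sample at a time: an oversized clip or runtime error only
            # affects this entry, not the whole run.
            results.append(score_fn(**sample))
        except RuntimeError as err:
            results.append({"error": str(err)})
    return results


def fake_score(original_video, edited_video, instruction):
    """Stub scorer used only to demonstrate the control flow."""
    return {"IF": 3.0, "RQ": 3.0, "EE": 3.0}


print(score_all(
    [{"original_video": "a.mp4", "edited_video": "b.mp4", "instruction": "x"}],
    fake_score,
))
```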
```bibtex
@article{gao2025vefxbench,
  title={VEFX-Bench: Benchmarking Generic Video Editing and Visual Effects},
  author={Xiangbo Gao and Sicong Jiang and Bangya Liu and Xinghao Chen and Minglai Yang and Siyuan Yang and Mingyang Wu and Jiongze Yu and Qi Zheng and Haozhi Wang and Jiayi Zhang and Jared Yang and Jie Yang and Zihan Wang and Qing Yin and Zhengzhong Tu},
  journal={arXiv preprint arXiv:2604.16272},
  year={2026}
}
```

This project is licensed under the Apache License 2.0. See LICENSE for details.



