amazon-science/multilingual-faithfulness
Multilingual Faithfulness

A framework for generating synthetic multilingual data to train faithfulness judges for text summarization.

Overview

This repository provides tools to:

  • Generate faithful and unfaithful summaries from multilingual datasets (WikiLingua)
  • Generate labeled training data for faithfulness judges using LLM-as-a-judge
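To give a feel for the LLM-as-a-judge step, a judge prompt pairs a source document with a candidate summary and asks for a binary faithfulness verdict, which is then parsed into a label. The template wording and parsing below are a hypothetical sketch, not this repository's actual prompts (those live under `src/utils/`):

```python
# Hypothetical sketch of LLM-as-a-judge labeling. The repository's real
# prompts live in src/utils/ and inference runs through vLLM
# (src/llm_inference/); only the overall shape is illustrated here.

JUDGE_TEMPLATE = (
    "Article:\n{document}\n\n"
    "Summary:\n{summary}\n\n"
    "Is every claim in the summary supported by the article? "
    "Answer with exactly one word: Faithful or Unfaithful."
)

def build_judge_prompt(document: str, summary: str) -> str:
    """Fill the judge template with one (document, summary) pair."""
    return JUDGE_TEMPLATE.format(document=document, summary=summary)

def parse_verdict(llm_output: str) -> int:
    """Map the judge's free-text answer to a binary label (1 = faithful)."""
    first_word = llm_output.strip().split()[0].lower().rstrip(".,")
    return 1 if first_word == "faithful" else 0
```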

Installation

Scripts run inside the official vLLM Docker container, which bundles compatible versions of vLLM, PyTorch, and Transformers.

docker pull vllm/vllm-openai:latest

Additional Python dependencies (installed inside the container):

pip install hydra-core omegaconf datasets

Project Structure

multilingual-faithfulness/
├── conf/                    # Hydra configuration files
│   ├── config.yaml          # Main configuration
│   └── task/                # Task-specific configs
│       ├── gen_data.yaml    # Training data generation
│       └── gen_summs.yaml   # Summary generation
├── data/                    # Benchmark datasets (CSV)
│   ├── llm_aggrefact.csv
│   ├── mface.csv
│   └── memerag.csv
├── scripts/                 # Executable scripts
│   ├── gen_data.py          # Training data generation
│   └── gen_summs.py         # Summary generation
├── src/                     # Library modules
│   ├── data_loader.py       # WikiLingua dataset loader
│   ├── gen_data.py          # Data generation functions
│   ├── gen_summs.py         # Summary generation functions
│   ├── corrupt.py           # Summary corruption strategies
│   ├── llm_inference/       # LLM inference utilities (vLLM)
│   └── utils/               # Helper functions and prompts
├── bash_files/              # Example shell scripts
└── requirements.txt
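As a rough illustration of what a corruption strategy in `src/corrupt.py` might look like, here is one minimal, hypothetical example: swapping an entity mentioned in the summary for a different entity from the source document, so the result stays fluent but becomes unfaithful. This is illustrative only; the repository's actual strategies may be richer and LLM-driven.

```python
import random

def swap_entity(summary: str, doc_entities: list[str], rng: random.Random) -> str:
    """Hypothetical corruption strategy (not the repo's actual code):
    replace one entity that appears in the summary with a different
    entity from the source document, yielding a fluent but unfaithful
    summary. Returns the summary unchanged if no swap is possible."""
    present = [e for e in doc_entities if e in summary]
    if not present or len(doc_entities) < 2:
        return summary  # nothing to corrupt
    target = rng.choice(present)
    replacement = rng.choice([e for e in doc_entities if e != target])
    return summary.replace(target, replacement, 1)
```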

Usage

All scripts should be run inside the vLLM Docker container:

docker run --gpus all --rm \
  -v /path/to/repo:/workspace \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --ipc=host --entrypoint bash \
  vllm/vllm-openai:latest -c \
  "pip install hydra-core omegaconf datasets && \
   cd /workspace && \
   python3 scripts/<script>.py <args>"

1. Generate Summaries

Generate faithful and corrupted summaries from WikiLingua:

python3 scripts/gen_summs.py task=gen_summs \
    model.base_llm=Qwen/Qwen3-4B-Instruct-2507 \
    task.gen_summs.total_datapoints=14000 \
    vllm.num_gpus=4 \
    vllm.max_model_len=8192

2. Generate Training Data

Create labeled training data for the faithfulness judge:

python3 scripts/gen_data.py task=gen_data \
    model.base_llm=Qwen/Qwen3-4B-Instruct-2507 \
    task.data_gen.n_samples=1000 \
    task.data_gen.summaries_path=./output/data/corrupt_v2 \
    vllm.num_gpus=4 \
    vllm.max_model_len=8192

Citation

If you use this work, please cite:

@inproceedings{alfano2026multilingual,
  title     = {Multilingual Self-Taught Faithfulness Evaluators},
  author    = {Carlo Alfano and Aymen Al Marjani and Zeno Jonke and Amin Mantrach and Saab Mansour and Marcello Federico},
  year      = {2026},
  booktitle = {Findings of the Association for Computational Linguistics: EACL 2026}
}

Security

See CONTRIBUTING for more information.

License

This library is licensed under the CC-BY-4.0 License. See the LICENSE file.
