Figure: (a) A weight matrix is viewed as a set of neurons (red dots) on a hypersphere. (b) Current SOTA methods introduce perturbations (blue triangles) that interfere with the principal hyperspherical directions of pre-edit weights. (c) SPHERE projects new knowledge onto a sparse space complementary to the principal hyperspherical directions.
- 🔥 [2026.03] We release the pre-computed covariance matrices for quick reproduction. See Download.
- 🔥 [2026.02] SPHERE is supported in EasyEdit.
- 🎉 [2026.01] SPHERE is accepted by ICLR 2026 (Score: 8884, Top-1.1% in Transfer/Meta/Lifelong Learning track).
- 🚀 [2025.09] SPHERE is released.
```shell
pip install torch==1.12.1
pip install einops==0.4.0 higher==0.2.1 hydra-core==1.2.0
pip install transformers==4.30.1 datasets==1.18.3
pip install matplotlib==3.6.1 spacy==3.4.1
pip install scipy==1.9.2 scikit-learn==1.0.2 nltk==3.7
```

📋 Full dependency list
| Package | Version |
|---|---|
| pytorch | 1.12.1 |
| einops | 0.4.0 |
| higher | 0.2.1 |
| hydra-core | 1.2.0 |
| transformers | 4.30.1 |
| datasets | 1.18.3 |
| matplotlib | 3.6.1 |
| spacy | 3.4.1 |
| scipy | 1.9.2 |
| scikit-learn | 1.0.2 |
| nltk | 3.7 |
We provide pre-computed covariance matrices for both Llama3-8B-Instruct and Qwen2.5-7B-Instruct via Google Drive.
After downloading, decompress the file and place it under the ./data/stats directory.
Example: Editing Qwen2.5 (7B) on the CounterFact dataset using SPHERE:

```shell
python3 -m experiments.evaluate \
    --alg_name=AlphaEdit \
    --model_name=./Qwen2.5-7B-Instruct \
    --hparams_fname=Qwen2.5-7B.json \
    --ds_name=mcf \
    --dataset_size_limit=5000 \
    --num_edits=100 \
    --beta_hse=0.5 \
    --alpha=0.5
```

🔧 Argument details
| Argument | Description |
|---|---|
| `--alg_name` | Algorithm name (e.g., AlphaEdit) |
| `--model_name` | Path to the model (e.g., ./Qwen2.5-7B-Instruct) |
| `--hparams_fname` | Hyperparameter JSON file (e.g., Qwen2.5-7B.json) |
| `--ds_name` | Dataset name (e.g., mcf) |
| `--dataset_size_limit` | Total number of editing samples |
| `--num_edits` | Batch size for each round of editing |
| `--beta_hse` | Cumulative Ratio: top percentage of principal directions to suppress (e.g., 0.5 = top 50%) |
| `--alpha` | Suppression Strength: controls the extent of perturbation removal along principal directions |
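To build intuition for what `beta_hse` and `alpha` control, here is a minimal numerical sketch of suppressing an update along the principal directions of the pre-edit weights. This is an illustration, not the released SPHERE implementation: the function name is hypothetical, and the use of an SVD with a cumulative singular-value ratio is our simplifying assumption about how "top `beta_hse` of principal directions" could be selected.

```python
import numpy as np

def suppress_principal_directions(W, delta, beta_hse=0.5, alpha=0.5):
    """Illustrative sketch (NOT the official SPHERE code).

    Removes a fraction `alpha` of the update `delta`'s component along the
    top principal directions of the pre-edit weights `W`, where "top" means
    the smallest set of left-singular vectors whose cumulative singular-value
    ratio reaches `beta_hse`.
    """
    U, S, _ = np.linalg.svd(W, full_matrices=False)
    cum = np.cumsum(S) / S.sum()                  # cumulative ratio of singular values
    k = int(np.searchsorted(cum, beta_hse)) + 1   # number of principal directions kept
    U_k = U[:, :k]                                # top-k principal directions
    proj = U_k @ (U_k.T @ delta)                  # component of delta in that subspace
    return delta - alpha * proj                   # damp it by alpha

# Toy usage on random matrices
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
delta = rng.standard_normal((8, 8))
new_delta = suppress_principal_directions(W, delta, beta_hse=0.5, alpha=0.5)
```

With `alpha=0` the update passes through unchanged (the baseline setting `beta_hse=0` in the script has the same effect), while larger `alpha` removes more of the interference with the principal hyperspherical directions.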
Tip

- To run the baseline, set `beta_hse=0`.
- To use SPHERE on MEMIT / PRUNE / RECT, set `beta_hse=0.5, alpha=0.8` to reproduce the paper results.
The edited weights from each run are stored as:

```
📂 Edited_Weight/
└── 📂 <alg_name>/
    └── 📂 <model_name>/
        ├── 📁 <dataset>_weight_data_batch_<batch_size>_<beta_hse>_<alpha>/
        ├── 📁 <dataset>_weight_data_batch_<batch_size>_<beta_hse>_<alpha>/
        └── ...
```
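When scripting over many runs, the run folders above can be located by reproducing the naming pattern. The helper below is hypothetical (it is not a utility shipped with the repo); it simply mirrors the `<dataset>_weight_data_batch_<batch_size>_<beta_hse>_<alpha>` layout shown in the tree:

```python
import os

def weight_folder(alg_name, model_name, dataset, batch_size, beta_hse, alpha):
    # Mirrors the directory layout shown above (an assumption, not a repo helper).
    run = f"{dataset}_weight_data_batch_{batch_size}_{beta_hse}_{alpha}"
    return os.path.join("Edited_Weight", alg_name, model_name, run)

# Example: the run produced by the editing command above
path = weight_folder("AlphaEdit", "Qwen2.5-7B-Instruct", "mcf", 100, 0.5, 0.5)
```

The resulting path can be passed directly as `--weight_folder` to the evaluation script below.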
```shell
python3 -m scripts.evaluate_each_epoch \
    --model_name=./Qwen2.5-7B-Instruct \
    --weight_folder=./Edited_Weight/<alg_name>/<model_name>/<dataset>_weight_data_batch_<batch_size>_<beta_hse>_<alpha>/ \
    --ds_name=mcf \
    --dataset_size_limit=5000 \
    --generation_test_interval=100
```

🔧 Argument details
| Argument | Description |
|---|---|
| `--model_name` | Path to the model being evaluated |
| `--weight_folder` | Path to the weights saved during the previous editing run |
| `--ds_name` | Dataset name (e.g., mcf) |
| `--dataset_size_limit` | Total number of evaluation samples |
| `--generation_test_interval` | Run a generation test every N evaluation rounds |
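For clarity on how `--generation_test_interval` spaces out the (slower) generation tests, here is a tiny sketch; the actual script's loop structure may differ, and the function name is hypothetical:

```python
def should_run_generation_test(step, interval=100):
    """True on every `interval`-th evaluation round (a sketch, not the script's code)."""
    return step % interval == 0

# With interval=100, generation tests fire at rounds 100, 200, 300, ...
hits = [s for s in range(1, 501) if should_run_generation_test(s, 100)]
```

Smaller intervals give finer-grained generation metrics at the cost of a longer evaluation.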
📊 Results are saved to:

```
./Edited_Weight/<alg_name>/<model_name>/<dataset>_weight_data_batch_<...>/summary/summary.json
```
```shell
python3 -m scripts.evaluate_each_epoch \
    --model_name=./Qwen2.5-7B-Instruct \
    --weight_folder=./Edited_Weight/<alg_name>/<model_name>/<dataset>_weight_data_batch_<batch_size>_<beta_hse>_<alpha>/
```

📊 Results are saved to:

```
./Edited_Weight/<alg_name>/<model_name>/<dataset>_weight_data_batch_<...>/rect_eval/
```
If you find this work useful, please cite our paper:

```bibtex
@inproceedings{liu2026energy,
  title     = {Energy-Regularized Sequential Model Editing on Hyperspheres},
  author    = {Liu, Qingyuan and Gu, Jia-Chen and Yao, Yunzhi and Wang, Hong and Peng, Nanyun},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year      = {2026}
}
```

Our code is built upon MEMIT, EMMET, and AlphaEdit. If you have any questions, feel free to reach out at ql2505(at)columbia.edu.
