Links: 🌐 Website | 📄 Paper | 💾 Data | ⚖️ Model Weights
CleverBirds is a large-scale knowledge tracing benchmark for fine-grained bird species recognition, collected via the eBird citizen-science platform, containing over 17 million multiple-choice questions from 40,000+ participants across 10,000+ bird species, with an average of 400 questions per participant.
- code/ - Main experiment code
  - configs/ - Training and evaluation configuration files
  - notebooks/ - Analysis and plotting notebooks
  - scripts/ - Training and evaluation scripts
  - src/ - Core implementation code
- mappings/ - Species-to-token mappings and eBird taxonomy files
- pykt_fork/ - Modified pyKT library for knowledge tracing baselines
Set up a conda environment and install dependencies:
cd code
conda create -n machine_teaching python=3.11
conda activate machine_teaching
conda install pip
pip install uv
uv pip install -r requirements.txt

Download the dataset from HuggingFace and preprocess it for training:
cd code/scripts/preprocessing/processing_cleaned/
# Step 1: Download dataset
python3 step_0_load_from_hf.py --save_path PATH_TO_DATASET
# Step 2: Reorder images
python3 step_1_reorder_imgs.py --local_dir PATH_TO_DATASET --testing False
# Step 3: Add feature columns
python3 step_2_add_feature_columns.py \
--local_dir PATH_TO_DATASET \
--logging_dir ./logs \
--mappings_path ../../mappings \
--testing False

Replace PATH_TO_DATASET with your desired output directory. The preprocessing creates:
- participant_data/ - User interaction data (train/val/test splits)
- image_features/ - Pre-extracted image features
- reordered_imgs/ - Reordered image files
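The three preprocessing steps can also be driven from a single Python script. The following is a minimal sketch and not part of the repository: the helper names `build_preprocessing_commands` and `run_preprocessing` are hypothetical, while the script names and flags are copied from the commands above. It assumes the working directory is code/scripts/preprocessing/processing_cleaned/.

```python
import subprocess
import sys


def build_preprocessing_commands(dataset_path, logging_dir="./logs",
                                 mappings_path="../../mappings"):
    """Return the three preprocessing commands as argument lists.

    Hypothetical helper: only the script names and flags come from the
    README; the function itself is illustrative.
    """
    python = sys.executable or "python3"
    return [
        [python, "step_0_load_from_hf.py", "--save_path", dataset_path],
        [python, "step_1_reorder_imgs.py",
         "--local_dir", dataset_path, "--testing", "False"],
        [python, "step_2_add_feature_columns.py",
         "--local_dir", dataset_path,
         "--logging_dir", logging_dir,
         "--mappings_path", mappings_path,
         "--testing", "False"],
    ]


def run_preprocessing(dataset_path):
    """Run each step in order and stop at the first failing step."""
    for command in build_preprocessing_commands(dataset_path):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for command in build_preprocessing_commands("PATH_TO_DATASET"):
        print(" ".join(command))
```

Calling `run_preprocessing("PATH_TO_DATASET")` would execute the steps sequentially; `check=True` makes a failing step raise instead of silently continuing to the next one.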
Participant Data (participant_data/train.parquet, val.parquet, test.parquet):
- question_id - Question identifier
- quiz_id - Quiz identifier (20 questions per quiz)
- asset_id - Image identifier
- user_id - Anonymized user identifier
- user_interact_count - Chronological question index per user
- species_code - Correct bird species code
- choices - List of candidate choices (last entry is "None of the Above")
- user_answer - User's selected answer
- correct - Binary correctness indicator
- labels - Answer index (4 for NOTA)
- hex3 - Hex3 aggregated location
- week - Calendar week
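Knowledge tracing models typically consume one chronological answer sequence per participant. The sketch below shows one way the columns above could be turned into such sequences, using only `user_id`, `user_interact_count`, `species_code`, and `correct`. The function names and the toy rows are illustrative assumptions, not code or data from the repository.

```python
from collections import defaultdict


def build_user_sequences(rows):
    """Group interaction rows by user and sort them chronologically.

    `rows` is an iterable of dicts with the keys `user_id`,
    `user_interact_count`, `species_code`, and `correct`.
    Returns {user_id: [(species_code, correct), ...]}.
    """
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["user_id"]].append(row)

    sequences = {}
    for user_id, user_rows in by_user.items():
        # user_interact_count is the per-user chronological question index.
        user_rows.sort(key=lambda r: r["user_interact_count"])
        sequences[user_id] = [
            (r["species_code"], int(r["correct"])) for r in user_rows
        ]
    return sequences


def running_accuracy(sequence):
    """Cumulative fraction of correct answers after each question."""
    accuracies, n_correct = [], 0
    for i, (_, correct) in enumerate(sequence, start=1):
        n_correct += correct
        accuracies.append(n_correct / i)
    return accuracies


# Toy rows with made-up values, standing in for rows of train.parquet.
rows = [
    {"user_id": "u1", "user_interact_count": 1, "species_code": "amerob", "correct": 0},
    {"user_id": "u1", "user_interact_count": 0, "species_code": "norcar", "correct": 1},
    {"user_id": "u2", "user_interact_count": 0, "species_code": "amerob", "correct": 1},
]
sequences = build_user_sequences(rows)
print(sequences["u1"])                    # [('norcar', 1), ('amerob', 0)]
print(running_accuracy(sequences["u1"]))  # [1.0, 0.5]
```

With pandas installed, `pd.read_parquet("participant_data/train.parquet").to_dict("records")` should produce rows in this shape, although loading the full split into Python dicts may be memory-hungry.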
Image Features (image_features/chunk_IDX.npz):
- NPZ files containing pre-extracted image features
- Match to participant data using asset_id
Training configurations are in code/configs/. To train a model:
cd code/scripts
python3 train.py \
--base_config ../configs/base/base.yaml \
--experiment_config ../configs/baselines_dino.yaml

We provide sample configurations; the exact model configs can be found in the model weights files. Example configuration files:
- base/base.yaml - Base configuration
- base/base_species_pred.yaml - Species prediction baseline
- incontext_deepseek_readable.yaml - In-context learning with DeepSeek
- incontext_openai_readable_stats.yaml - In-context learning with OpenAI
- baselines_dino.yaml - DINO baseline
- s2s/t5_tiny_eff.yaml - Sequence-to-sequence T5 model
- mc/tinybert.yaml - Multiple choice BERT model
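The training and evaluation scripts take both a `--base_config` and an `--experiment_config`. A common pattern for such pairs is that the experiment file overrides selected keys of the base file. Whether the repository merges its configs exactly this way is an assumption; the sketch below only illustrates the general idea of a recursive override, and the config keys in it are hypothetical.

```python
import copy


def merge_configs(base, override):
    """Return a new dict in which `override` recursively overrides `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_configs(merged[key], value)
        else:
            # Scalars, lists, and new keys replace the base value outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical config contents; the real keys live in the YAML files.
base = {"model": {"name": "base", "hidden_size": 128}, "logging": {"use_wandb": True}}
experiment = {"model": {"name": "tinybert"}, "logging": {"use_wandb": False}}

config = merge_configs(base, experiment)
print(config)
# {'model': {'name': 'tinybert', 'hidden_size': 128}, 'logging': {'use_wandb': False}}
```

Under this scheme an experiment file only needs to list the values that differ from the base configuration, and neither input dict is modified.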
Evaluate trained models:
cd code/scripts
python3 eval.py \
--base_config ../configs/base/base.yaml \
--experiment_config ../configs/incontext_deepseek_readable.yaml

Note: Scripts use wandb for experiment tracking. Ensure wandb is configured or disable it in the config files.
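Besides editing the config files, wandb itself can be switched off through its `WANDB_MODE` environment variable. The following is a minimal sketch of launching the evaluation script that way. The helper names are hypothetical, and it is an assumption that the scripts do not override this setting internally.

```python
import os
import subprocess


def wandb_disabled_env(base_env=None):
    """Copy an environment mapping and set WANDB_MODE=disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["WANDB_MODE"] = "disabled"
    return env


def run_eval(experiment_config, base_config="../configs/base/base.yaml"):
    """Run eval.py (from code/scripts) with wandb logging turned off."""
    command = [
        "python3", "eval.py",
        "--base_config", base_config,
        "--experiment_config", experiment_config,
    ]
    return subprocess.run(command, env=wandb_disabled_env(), check=True)
```

For example, `run_eval("../configs/incontext_deepseek_readable.yaml")` would mirror the command above without sending anything to wandb. Setting `WANDB_MODE` to `offline` instead keeps the logs locally.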
Baseline knowledge tracing models (DKT, simpleKT, AKT, etc.) are implemented using a modified version of pyKT.
Pre-trained model weights are available on HuggingFace.
If you use this code or dataset, please cite the CleverBirds paper:
@inproceedings{bossemeyercleverbirds,
title={CleverBirds: A Multiple-Choice Benchmark for Fine-grained Human Knowledge Tracing},
author={Bossemeyer, Leonie and Heinrich, Samuel and Van Horn, Grant and Mac Aodha, Oisin},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track}
}