π₯ Adversarial fine-tuning lab for small LLMs (1Bβ3B). Break models βοΈ, harden them π‘οΈ, and measure what actually improves π.
TΕki (ιΆε¨) β ceramic, shaped under pressure.
Models, like clay, only reveal their strength when stress-tested. TΕki is about forcing models through pressure β adversarial inputs β and reshaping them into something more robust.
TΕki is an end-to-end adversarial ML lab:
-
Generate adversarial prompts (jailbreaks, edge cases, failure modes)
-
Fine-tune models using LoRA / QLoRA (MLX or HuggingFace)
-
Evaluate robustness before and after training
-
Publish:
- adversarial datasets π¦
- hardened model weights π§
- evaluation reports π
LLMs are brittle.
- They fail under adversarial prompts
- They overfit to narrow behaviors
- Thereβs little systematic research on small model robustness
Most teams:
test a few prompts and call it βsafeβ
TΕki answers:
Do models actually get safer β or just better at passing tests?
- Adversarial ML & red-teaming
- LoRA / QLoRA fine-tuning
- Dataset construction & curation
- Robustness evaluation & benchmarking
- π¦ Rust CLI β orchestration, experiments, pipelines
- π Python core β training, generation, evaluation
git clone https://github.com/yourusername/toki.git
cd toki
cargo build
# Python core (no ML deps required for generate/evaluate/report/upload --dry-run)
cd python && pip install -e .
python -m toki generate --count 32 --output dataset.json
python -m toki evaluate --dataset dataset.json
python -m toki run --name baseline --output-dir experiments/runs
python -m toki report experiments/runs/<ts>_baseline/result.json --format both
# Continuous hardening loop (stops at convergence)
python -m toki pipeline \
--name harden_v1 \
--iterations 10 \
--convergence-threshold 0.95 \
--convergence-window 3
# A/B compare two models on the same adversarial dataset
# (paired t-test + Wilcoxon decide the winner at Ξ±=0.05)
python -m toki compare --model-a unsafe --model-b safe --name baseline_ab
# Publish to HuggingFace Hub (requires `pip install -e ".[hf]"`)
python -m toki upload \
--dataset dataset.json \
--repo your-username/toki-adversarial-v1 \
--version 0.4.0Break the model. Fix the model. Prove it.
If you want next step, I can: β unify all 4 under a Konjo umbrella README + architecture diagram (thatβs what really makes this pop in interviews)