Skip to content

konjoai/toki

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

22 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ¦€ Tōki

Language ML License Status

πŸ”₯ Adversarial fine-tuning lab for small LLMs (1B–3B). Break models βš”οΈ, harden them πŸ›‘οΈ, and measure what actually improves πŸ“Š.


🏺 Meaning

Tōki (院器) β€” ceramic, shaped under pressure.

Models, like clay, only reveal their strength when stress-tested. Tōki is about forcing models through pressure β€” adversarial inputs β€” and reshaping them into something more robust.


πŸš€ What it is

Tōki is an end-to-end adversarial ML lab:

  • Generate adversarial prompts (jailbreaks, edge cases, failure modes)

  • Fine-tune models using LoRA / QLoRA (MLX or HuggingFace)

  • Evaluate robustness before and after training

  • Publish:

    • adversarial datasets πŸ“¦
    • hardened model weights 🧠
    • evaluation reports πŸ“Š

❗ The problem

LLMs are brittle.

  • They fail under adversarial prompts
  • They overfit to narrow behaviors
  • There’s little systematic research on small model robustness

Most teams:

test a few prompts and call it β€œsafe”

Tōki answers:

Do models actually get safer β€” or just better at passing tests?


🧠 What you learn

  • Adversarial ML & red-teaming
  • LoRA / QLoRA fine-tuning
  • Dataset construction & curation
  • Robustness evaluation & benchmarking

βš™οΈ Architecture

  • πŸ¦€ Rust CLI β€” orchestration, experiments, pipelines
  • 🐍 Python core β€” training, generation, evaluation

πŸš€ Quick Start

git clone https://github.com/yourusername/toki.git
cd toki
cargo build

# Python core (no ML deps required for generate/evaluate/report/upload --dry-run)
cd python && pip install -e .
python -m toki generate --count 32 --output dataset.json
python -m toki evaluate --dataset dataset.json
python -m toki run --name baseline --output-dir experiments/runs
python -m toki report experiments/runs/<ts>_baseline/result.json --format both

# Continuous hardening loop (stops at convergence)
python -m toki pipeline \
  --name harden_v1 \
  --iterations 10 \
  --convergence-threshold 0.95 \
  --convergence-window 3

# A/B compare two models on the same adversarial dataset
# (paired t-test + Wilcoxon decide the winner at Ξ±=0.05)
python -m toki compare --model-a unsafe --model-b safe --name baseline_ab

# Publish to HuggingFace Hub (requires `pip install -e ".[hf]"`)
python -m toki upload \
  --dataset dataset.json \
  --repo your-username/toki-adversarial-v1 \
  --version 0.4.0

🎯 Vision

Break the model. Fix the model. Prove it.

If you want next step, I can: β†’ unify all 4 under a Konjo umbrella README + architecture diagram (that’s what really makes this pop in interviews)

About

πŸ¦€ Toki β€” Adversarial fine-tuning lab for small models (1B–3B). Generate red-team attacks βš”οΈ, harden via LoRA 🧩, and ship datasets + robust weights πŸ“¦. Explores whether robustness truly generalizes or just overfits under pressure πŸ”₯πŸ“Š

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors