Formal specification and verification capabilities for rlm-core
Historical design/planning artifact:
- Checklist lines marked
[historical target]are archival, not active implementation backlog.- For live execution status and priorities, use
bd statusand:
docs/execution-plan/STATUS.mddocs/execution-plan/TASK-REGISTRY.mddocs/execution-plan/WORKBOARD.md
This document proposes two interconnected capabilities:
- Lean REPL - A Lean 4 REPL integrated with rlm-core for interactive theorem proving, spec validation, and implementation verification
- Spec Agent - An AI agent specialized in creating, refining, and formalizing specifications, with first-class Topos integration
Together, these enable a dual-track specification workflow:
- Topos (semantic contracts) - Human-readable intent, traceability, evidence
- Lean (formal specs) - Machine-verifiable types, invariants, proofs
Current specification approaches have a gap:
- Natural language specs are ambiguous, can't be verified
- Formal specs are precise but hard to write and maintain
- AI-generated code may not match intent without verification
┌─────────────────────────────────────────────────────────────────────────────┐
│ DUAL-TRACK SPECIFICATION │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Natural Language Spec Agent Topos Spec Lean Formalization │
│ Intent ───────────────► (.tps) ◄────► (.lean) │
│ (bidirectional) │
│ │ │ │
│ │ │ │
│ ▼ ▼ │
│ Implementation Proof Checking │
│ (Code) (Type Checker) │
│ │ │ │
│ ▼ ▼ │
│ Evidence ◄──────────► Verified │
│ (git, tests) Properties │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Key insight: LLMs are well-suited for proof generation because invalid proofs are rejected by the type checker. The challenge shifts to writing good specifications, which is exactly what the Spec Agent addresses.
[REQ-1] User requirement (Topos)
↓
[BEH-1] Behavior contract (Topos)
↓ implements
[SPEC-01.01] Lean type/theorem reference
↓ formalizes
theorem create_order_preserves_inventory : ∀ order, valid_order order → ...
↓ proves
[TASK-1] Implementation task (Topos)
↓ evidence
PR #123, commit abc123, 95% coverage
The Lean REPL follows the same subprocess + JSON-RPC pattern as the Python REPL:
┌─────────────┐ JSON-RPC ┌─────────────────────────┐
│ rlm-core │◄───────────►│ Lean 4 REPL │
│ (Rust) │ stdin/out │ (leanprover-community) │
└─────────────┘ └─────────────────────────┘
│
▼
┌─────────────────────────┐
│ Mathlib / Custom Libs │
│ (lake dependencies) │
└─────────────────────────┘
Backend: leanprover-community/repl
- JSON protocol over stdin/stdout
- Environment pickling (save/restore proof states)
- Both command mode and tactic mode
// Extension of existing ReplEnvironment trait
pub struct LeanRepl {
process: Child,
reader: BufReader<ChildStdout>,
writer: BufWriter<ChildStdin>,
env_counter: u64, // Track Lean environments
project_root: PathBuf,
}
impl ReplEnvironment for LeanRepl {
fn execute(&mut self, code: &str) -> Result<ExecuteResult> {
// Send JSON command to Lean REPL
let request = LeanCommand::Command {
cmd: code.to_string(),
env: self.current_env,
};
self.send_json(&request)?;
// Parse response
let response: LeanResponse = self.read_json()?;
Ok(ExecuteResult {
success: response.sorries.is_empty() && response.messages.iter()
.all(|m| m.severity != "error"),
stdout: response.format_output(),
stderr: response.format_errors(),
metadata: Some(json!({
"env": response.env,
"sorries": response.sorries,
"goals": response.goals,
})),
})
}
fn get_pending_operations(&self) -> Vec<String> {
// Sorries and holes become pending operations
self.sorries.iter()
.map(|s| format!("sorry:{}", s.goal))
.collect()
}
fn resolve_operation(&mut self, id: &str, result: Value) -> Result<()> {
// Apply tactic or proof provided by AI
if id.starts_with("sorry:") {
let tactic = result.as_str().ok_or(Error::InvalidTactic)?;
self.apply_tactic(tactic)?;
}
Ok(())
}
}Commands (from leanprover-community/repl):
| Command Type | Fields | Purpose |
|---|---|---|
command |
cmd, env? |
Execute Lean command |
tactic |
tactic, proofState |
Apply tactic in proof mode |
pickle |
path, env |
Save environment to file |
unpickle |
path |
Restore environment |
Response Fields:
| Field | Type | Description |
|---|---|---|
env |
number |
Environment ID (for backtracking) |
messages |
array |
Compiler messages (info, warning, error) |
sorries |
array |
Unfinished goals with positions |
goals |
array? |
Proof goals in tactic mode |
Lean requires project context (mathlib, dependencies). The REPL manager handles this:
pub struct LeanProjectManager {
/// Active project configurations
projects: HashMap<PathBuf, LeanProject>,
}
pub struct LeanProject {
root: PathBuf,
lakefile: LakefileConfig,
lean_version: String,
/// Built packages available for import
packages: Vec<String>,
}
impl LeanProjectManager {
/// Initialize or get existing project
pub async fn get_or_create(&mut self, path: &Path) -> Result<&LeanProject> {
if !self.projects.contains_key(path) {
// Check for existing lakefile.lean/lakefile.toml
let project = if path.join("lakefile.lean").exists() {
LeanProject::from_existing(path)?
} else {
// Create new project with defaults
LeanProject::create_new(path, LeanProjectTemplate::Mathlib)?
};
// Build dependencies
project.lake_build().await?;
self.projects.insert(path.to_path_buf(), project);
}
Ok(self.projects.get(path).unwrap())
}
}/// Proof state that can be saved/restored
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ProofState {
/// Environment ID from Lean REPL
env: u64,
/// Current proof goals
goals: Vec<Goal>,
/// Proof steps taken
history: Vec<ProofStep>,
/// Checkpoint file path (for persistence)
checkpoint: Option<PathBuf>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Goal {
/// Goal type signature
target: String,
/// Local hypotheses
hyps: Vec<Hypothesis>,
/// Suggested tactics (AI-generated)
suggestions: Vec<TacticSuggestion>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ProofStep {
tactic: String,
/// Goals before
pre_goals: Vec<String>,
/// Goals after
post_goals: Vec<String>,
/// Time taken
elapsed_ms: u64,
}pub enum TrajectoryEventType {
// ... existing events ...
// Lean-specific events
LeanExec, // Lean command execution
LeanResult, // Lean execution result
ProofStart, // Begin proof attempt
TacticApply, // Tactic application
GoalChange, // Proof goals changed
SorryFound, // Unfinished proof detected
ProofComplete, // All goals discharged
TypeCheck, // Type checking result
LemmaExtract, // Lemma extracted from proof
}The Spec Agent is a specialized agent for specification creation and refinement:
┌─────────────────────────────────────────────────────────────────────────────┐
│ SPEC AGENT │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Intake │───►│ Refine │───►│ Formalize │ │
│ │ │ │ │ │ │ │
│ │ • NL parsing │ │ • Clarify │ │ • Topos gen │ │
│ │ • Intent │ │ • Decompose │ │ • Lean gen │ │
│ │ extraction │ │ • Validate │ │ • Sync │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ └──────────────────┼───────────────────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Verify │ │
│ │ │ │
│ │ • Type check │ │
│ │ • Prove │ │
│ │ • Validate │ │
│ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
| Capability | Description | Tools |
|---|---|---|
| Intake | Parse natural language requirements into structured intent | NLP, LLM |
| Refine | Iterative clarification with human feedback | AskUserQuestion, Memory |
| Formalize (Topos) | Generate Topos specs (.tps files) | Topos MCP tools |
| Formalize (Lean) | Generate Lean types, theorems, proofs | Lean REPL |
| Verify | Type-check Lean specs, attempt proofs | Lean REPL |
| Sync | Keep Topos ↔ Lean in sync | Custom sync engine |
pub struct SpecAgentConfig {
/// Default formalization level
pub default_level: FormalizationLevel,
/// Domains to prioritize (affects library imports)
pub domains: Vec<Domain>,
/// Proof strategy
pub proof_strategy: ProofStrategy,
/// Topos integration
pub topos_enabled: bool,
pub topos_project_root: Option<PathBuf>,
/// Lean project settings
pub lean_project_root: Option<PathBuf>,
pub lean_libraries: Vec<String>,
}
#[derive(Debug, Clone, Copy)]
pub enum FormalizationLevel {
/// Types only - domain models, enums, structs
Types,
/// Types + invariants - type-level constraints
TypesWithInvariants,
/// Pre/post conditions - function contracts
Contracts,
/// Full proofs - complete correctness proofs
FullProofs,
}
#[derive(Debug, Clone, Copy)]
pub enum ProofStrategy {
/// decide, simp, omega, then AI, then human
Progressive,
/// Always try AI proof search first
AIFirst,
/// Draft outline, human completes
HumanInLoop,
}
#[derive(Debug, Clone)]
pub enum Domain {
AlgorithmsDataStructures,
DistributedSystems,
APIsProtocols,
SecurityProperties,
ApplicationFlow,
}┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 1: INTAKE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ User: "I need an order management system that ensures inventory │
│ is never oversold, supports partial fulfillment, and │
│ maintains audit trails for compliance." │
│ │
│ ▼ │
│ │
│ Spec Agent extracts: │
│ • Domain: E-commerce / Inventory │
│ • Key entities: Order, Inventory, AuditLog │
│ • Invariants: inventory >= 0, no overselling │
│ • Behaviors: create_order, fulfill_order, partial_fulfill │
│ • Properties: audit_complete (all changes logged) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 2: REFINE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Spec Agent asks clarifying questions: │
│ │
│ Q1: "How should partial fulfillment work?" │
│ • Split into multiple shipments │
│ • Backorder remaining items │
│ • Cancel unfulfillable items │
│ │
│ Q2: "What audit events need to be captured?" │
│ • All state changes │
│ • Only fulfillment events │
│ • Configurable per order type │
│ │
│ Q3: "Are there concurrency requirements?" │
│ • Single-writer (queue-based) │
│ • Optimistic locking │
│ • Serializable transactions │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 3: FORMALIZE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Generates Topos spec (human-readable): │
│ │
│ ```topos │
│ spec OrderManagement │
│ │
│ ## Concepts │
│ │
│ Order: │
│ id: `OrderId` │
│ items: list of `OrderItem` │
│ status: `pending` | `partial` | `fulfilled` | `cancelled` │
│ invariant: items.all(i => i.quantity > 0) │
│ @lean: Order.lean#Order │
│ │
│ ## Behaviors │
│ │
│ create_order: │
│ implements: REQ-1 │
│ given: valid order request │
│ returns: `Order` with status `pending` │
│ ensures: inventory reserved for all items │
│ @lean: Order.lean#create_order_spec │
│ ``` │
│ │
│ AND generates Lean formalization (machine-verifiable): │
│ │
│ ```lean │
│ -- Order.lean │
│ import Mathlib.Data.List.Basic │
│ │
│ structure OrderItem where │
│ productId : Nat │
│ quantity : Nat │
│ quantity_pos : quantity > 0 │
│ │
│ structure Order where │
│ id : Nat │
│ items : List OrderItem │
│ status : OrderStatus │
│ │
│ -- Theorem: creating an order reserves inventory │
│ theorem create_order_reserves_inventory │
│ (inv : Inventory) (req : OrderRequest) │
│ (h_valid : valid_request inv req) : │
│ let (order, inv') := create_order inv req │
│ ∀ item ∈ order.items, │
│ inv'.reserved item.productId ≥ item.quantity := by │
│ sorry -- [?] AI will attempt proof │
│ ``` │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Topos specs reference Lean artifacts via @lean annotations:
## Concepts
Order:
id: `OrderId`
items: list of `OrderItem`
status: `pending` | `partial` | `fulfilled` | `cancelled`
invariant: items.all(i => i.quantity > 0)
@lean: specs/Order.lean#Order # Links to Lean structure
@lean.invariant: specs/Order.lean#Order.items_nonempty # Links to theorem
Lean files include reverse links via comments:
/--
@topos: OrderManagement.tps#Order
Order represents a customer order with line items.
-/
structure Order where
id : Nat
items : List OrderItem
status : OrderStatus
items_pos : ∀ item ∈ items, item.quantity > 0 -- @topos: Order.invariantpub struct DualTrackSync {
topos_project: PathBuf,
lean_project: PathBuf,
link_index: LinkIndex,
}
#[derive(Debug)]
pub struct LinkIndex {
/// Topos artifact → Lean artifact
topos_to_lean: HashMap<ToposRef, LeanRef>,
/// Lean artifact → Topos artifact
lean_to_topos: HashMap<LeanRef, ToposRef>,
}
impl DualTrackSync {
/// Check for drift between Topos and Lean
pub async fn check_drift(&self) -> Result<DriftReport> {
let mut report = DriftReport::new();
for (topos_ref, lean_ref) in &self.link_index.topos_to_lean {
// Parse both artifacts
let topos_artifact = self.parse_topos(topos_ref)?;
let lean_artifact = self.parse_lean(lean_ref)?;
// Compare semantically
if let Some(drift) = self.compare_artifacts(&topos_artifact, &lean_artifact)? {
report.add_drift(topos_ref.clone(), lean_ref.clone(), drift);
}
}
// Check for unlinked artifacts
report.unlinked_topos = self.find_unlinked_topos()?;
report.unlinked_lean = self.find_unlinked_lean()?;
Ok(report)
}
/// Propagate changes from Topos to Lean
pub async fn sync_topos_to_lean(&self, changes: &[ToposChange]) -> Result<Vec<LeanChange>> {
let mut lean_changes = Vec::new();
for change in changes {
match change {
ToposChange::ConceptAdded(concept) => {
// Generate Lean structure
let lean_struct = self.generate_lean_structure(concept)?;
lean_changes.push(LeanChange::StructureAdded(lean_struct));
}
ToposChange::InvariantAdded(inv) => {
// Generate Lean theorem
let lean_theorem = self.generate_lean_theorem(inv)?;
lean_changes.push(LeanChange::TheoremAdded(lean_theorem));
}
ToposChange::BehaviorAdded(behavior) => {
// Generate Lean function spec
let lean_spec = self.generate_lean_function_spec(behavior)?;
lean_changes.push(LeanChange::FunctionSpecAdded(lean_spec));
}
// ... other change types
}
}
Ok(lean_changes)
}
}The sync engine supports progressive formalization:
| Level | Topos | Lean |
|---|---|---|
| Types | Concepts with fields | structure definitions |
| Types + Invariants | Concepts with invariants | Structures with proofs in fields |
| Contracts | Behaviors with pre/post | Function specs with requires/ensures |
| Full Proofs | All of above | Complete theorems with proofs |
impl DualTrackSync {
pub fn set_formalization_level(&mut self, level: FormalizationLevel) {
self.level = level;
}
fn generate_lean_for_concept(&self, concept: &Concept) -> Result<String> {
match self.level {
FormalizationLevel::Types => {
// Just structure definition
self.generate_basic_structure(concept)
}
FormalizationLevel::TypesWithInvariants => {
// Structure with proof fields
self.generate_structure_with_proofs(concept)
}
FormalizationLevel::Contracts | FormalizationLevel::FullProofs => {
// Full specification
self.generate_full_spec(concept)
}
}
}
}┌─────────────────────────────────────────────────────────────────────────────┐
│ PROOF AUTOMATION PIPELINE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ TIER 1: Decidable (instant) │
│ ───────────────────────── │
│ • decide - boolean decidability │
│ • native_decide - native computation │
│ • omega - linear arithmetic │
│ • simp - simplification with lemmas │
│ │
│ │ if fails │
│ ▼ │
│ │
│ TIER 2: Automation (seconds) │
│ ──────────────────────────── │
│ • aesop - extensible automation │
│ • linarith - linear arithmetic reasoning │
│ • ring - ring algebra │
│ • polyrith - polynomial arithmetic (with LLM hints) │
│ │
│ │ if fails │
│ ▼ │
│ │
│ TIER 3: AI-Assisted (seconds-minutes) │
│ ───────────────────────────────────── │
│ • LLM generates tactic sequence │
│ • Validates against type checker │
│ • Retries with different strategies │
│ • Uses LeanDojo / DeepSeek-Prover patterns │
│ │
│ │ if fails │
│ ▼ │
│ │
│ TIER 4: Human-in-Loop │
│ ───────────────────── │
│ • Mark with sorry + TODO │
│ • Create task in beads │
│ • Provide proof outline for human │
│ • Store in memory for future reference │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
pub struct AIProofAssistant {
llm_client: Arc<dyn LLMClient>,
lean_repl: Arc<Mutex<LeanRepl>>,
strategy_memory: Arc<MemoryStore>,
}
impl AIProofAssistant {
/// Attempt to prove a theorem
pub async fn prove(&self, theorem: &str, context: &ProofContext) -> Result<ProofResult> {
// Step 1: Try decidable tactics
if let Some(proof) = self.try_decidable(theorem).await? {
return Ok(ProofResult::Proved(proof));
}
// Step 2: Try automation tactics
if let Some(proof) = self.try_automation(theorem, context).await? {
return Ok(ProofResult::Proved(proof));
}
// Step 3: AI-assisted proof search
for attempt in 0..self.config.max_ai_attempts {
let tactics = self.generate_tactics(theorem, context, attempt).await?;
match self.validate_tactics(&tactics).await? {
ValidationResult::Valid(proof) => {
// Store successful strategy
self.strategy_memory.store_strategy(theorem, &proof).await?;
return Ok(ProofResult::Proved(proof));
}
ValidationResult::Partial { progress, remaining_goals } => {
// Update context with progress
context.add_progress(progress);
context.set_goals(remaining_goals);
}
ValidationResult::Invalid(errors) => {
// Learn from failure
context.add_failed_attempt(&tactics, &errors);
}
}
}
// Step 4: Give up, create TODO
Ok(ProofResult::Sorry {
outline: self.generate_proof_outline(theorem, context).await?,
task_id: self.create_proof_task(theorem, context).await?,
})
}
async fn generate_tactics(&self, theorem: &str, context: &ProofContext, attempt: u32) -> Result<Vec<String>> {
let prompt = format!(
r#"You are a Lean 4 theorem prover. Generate tactics to prove:
```lean
{theorem}Context:
- Available lemmas: {lemmas}
- Similar proved theorems: {similar}
- Previous failed attempts: {failed}
Attempt {attempt}. Generate a sequence of tactics, one per line.
If you need a helper lemma, use have to introduce it.
"#,
theorem = theorem,
lemmas = context.available_lemmas.join(", "),
similar = context.similar_proofs.join("\n"),
failed = context.failed_attempts.join("\n"),
attempt = attempt,
);
let response = self.llm_client.complete(&prompt).await?;
Ok(self.parse_tactics(&response))
}
}
### 5.3 Domain-Specific Strategies
```rust
/// Domain-specific proof strategies
pub fn get_domain_strategies(domain: Domain) -> Vec<ProofStrategy> {
match domain {
Domain::AlgorithmsDataStructures => vec![
ProofStrategy::Induction, // Structural induction
ProofStrategy::WellFoundedRecursion,
ProofStrategy::CaseAnalysis,
],
Domain::DistributedSystems => vec![
ProofStrategy::Bisimulation, // State machine equivalence
ProofStrategy::Invariant, // Inductive invariants
ProofStrategy::Refinement, // Refinement mappings
],
Domain::APIsProtocols => vec![
ProofStrategy::StateTransition, // Pre/post conditions
ProofStrategy::SessionTypes, // Protocol compliance
ProofStrategy::ResourceAccounting, // Linear types
],
Domain::SecurityProperties => vec![
ProofStrategy::InformationFlow, // Non-interference
ProofStrategy::AccessControl, // Authorization
ProofStrategy::Cryptographic, // Game-based proofs
],
Domain::ApplicationFlow => vec![
ProofStrategy::Termination, // Liveness
ProofStrategy::Safety, // Invariant preservation
ProofStrategy::Fairness, // Progress properties
],
}
}
Every Lean theorem/lemma traces to a [SPEC-XX.YY] identifier:
## Requirements
REQ-1: Order Creation
when: valid order request received
the system shall: create order with reserved inventory
@spec: SPEC-01.01
@lean: specs/Order.lean#create_order_reserves_inventory
/--
@spec: SPEC-01.01
@topos: OrderManagement.tps#REQ-1
-/
theorem create_order_reserves_inventory
(inv : Inventory) (req : OrderRequest)
(h_valid : valid_request inv req) :
let (order, inv') := create_order inv req
∀ item ∈ order.items,
inv'.reserved item.productId ≥ item.quantity := by
...| DP Phase | Spec Agent Role |
|---|---|
| Orient | Query spec coverage, find unformalized requirements |
| Specify | Generate/refine Topos specs, create Lean formalizations |
| Decide | ADRs reference formal properties proved |
| Test | Tests annotated with @trace SPEC-XX.YY, Lean proofs validate |
| Implement | Implementation guided by proved properties |
| Review | Review checks: are all specs formalized? proofs complete? |
| Close | Evidence includes proof status |
# Check spec coverage
/dp:spec coverage --with-lean
# List unproved theorems
/dp:spec list --unproved
# Generate Lean formalization for a requirement
/dp:spec formalize REQ-1 --level full-proofs
# Verify Lean specs type-check
/dp:spec verify --lean
# Sync Topos ↔ Lean
/dp:spec sync --check-drift// New node types for formal verification
pub enum NodeType {
// ... existing types ...
/// Lean theorem statement
Theorem,
/// Completed proof
Proof,
/// Proof strategy that worked
ProofStrategy,
/// Lemma extracted during proving
Lemma,
/// Failed proof attempt (for learning)
FailedProof,
}
// Store successful proof strategies
pub struct ProofStrategyNode {
/// Theorem pattern (generalized)
pattern: String,
/// Tactic sequence that worked
tactics: Vec<String>,
/// Domain
domain: Domain,
/// Success rate
success_rate: f64,
/// Contexts where it worked
successful_contexts: Vec<String>,
}The memory system enables learning from proof attempts:
- Store successful strategies → Reuse for similar theorems
- Store failed attempts → Avoid repeating mistakes
- Extract lemmas → Build domain-specific lemma library
- Track patterns → Identify common proof structures
| Task | Description |
|---|---|
| LeanRepl implementation | Subprocess management, JSON-RPC |
| Project management | Lake integration, dependency handling |
| Basic commands | Execute, type-check, tactic mode |
| Trajectory events | Lean-specific event types |
| Tests | REPL lifecycle, error handling |
Deliverables:
LeanReplimplementingReplEnvironmentLeanProjectManagerfor project/dependency handling- Integration with rlm-core orchestrator
| Task | Description |
|---|---|
| Topos MCP client | Connect to topos MCP server |
| Link annotations | Parse @lean annotations in Topos |
| Reverse annotations | Parse @topos comments in Lean |
| Link index | Build and maintain bidirectional index |
| Tests | Parsing, linking, index updates |
Deliverables:
- Topos MCP integration in rlm-core
- Link index with persistence
| Task | Description |
|---|---|
| Drift detection | Compare Topos ↔ Lean semantically |
| Topos → Lean generation | Generate Lean from Topos changes |
| Lean → Topos generation | Update Topos from Lean changes |
| Sync CLI commands | sync, drift, formalize |
| Tests | Sync scenarios, conflict resolution |
Deliverables:
DualTrackSyncengine- CLI commands for sync operations
| Task | Description |
|---|---|
| Agent scaffolding | Spec agent configuration, lifecycle |
| Intake phase | NL parsing, intent extraction |
| Refine phase | Question generation, clarification |
| Formalize phase | Topos + Lean generation |
| Tests | End-to-end spec creation |
Deliverables:
- Spec agent implementation
- Integration with RLM orchestrator
| Task | Description |
|---|---|
| Decidable tier | Basic tactic automation |
| Automation tier | aesop, linarith, ring integration |
| AI-assisted tier | LLM proof generation |
| Strategy memory | Store/retrieve successful strategies |
| Tests | Proof benchmarks per domain |
Deliverables:
AIProofAssistantwith progressive automation- Strategy memory integration
| Task | Description |
|---|---|
| Spec coverage | Track formalization coverage |
| Proof status | Track proof completion |
| CLI commands | DP spec commands with Lean |
| Evidence gathering | Proof evidence in tasks |
| Tests | DP workflow integration |
Deliverables:
- Full DP integration
- Documentation
- [historical target] Lean REPL executes commands and tactics correctly
- [historical target] Lean projects with mathlib dependencies work
- [historical target] Topos ↔ Lean bidirectional linking works
- [historical target] Drift detection catches spec divergence
- [historical target] Spec agent generates valid Topos and Lean specs
- [historical target] Progressive proof automation achieves >70% auto-proof rate on simple theorems
- [historical target] DP integration tracks spec coverage and proof status
| Metric | Target |
|---|---|
| Lean REPL startup | < 5s (with cached build) |
| Simple tactic execution | < 500ms |
| Type checking | < 2s for typical module |
| AI proof attempt | < 30s per attempt |
| Drift detection | < 10s for 100 linked artifacts |
- [historical target] >80% test coverage on core components
- [historical target] All public APIs documented
- [historical target] Proof strategies stored for reuse
- [historical target] Graceful degradation when Lean unavailable
-
Lean version management: How to handle projects requiring different Lean versions?
- Proposal: Per-project Lean version via elan, similar to rustup
-
Mathlib dependency: Full mathlib is large (~2GB). Cache strategy?
- Proposal: Lazy download, shared cache across projects
-
Proof persistence: Store proofs in memory or external files?
- Proposal: Lean files are source of truth, memory stores metadata
-
Human review of AI proofs: How to surface AI-generated proofs for review?
- Proposal: Mark AI proofs with attribute, review during DP review phase
-
Proof maintenance: How to handle proofs that break when specs change?
- Proposal: Drift detection flags broken proofs, prioritize in tasks
-
Partial formalization: How to handle specs that can't be fully formalized?
- Proposal:
sorrywith confidence level, track as technical debt
- Proposal:
- leanprover-community/repl - Official Lean REPL
- LeanInteract - Python interface
- lean-spec - Program specification
- LeanDojo - AI-driven theorem proving
- Mathlib - Mathematics library
- AWS Cedar verification - Industrial Lean usage
- Martin Kleppmann on AI + FV
- DeepSeek-Prover-V2 - AI theorem proving
- Topos GitHub - Semantic contract language
- Architecture, language spec, and MCP integration docs
- Unified RLM Library Design - Core architecture
- ADR-001 - Architectural decisions