Lean Formal Verification System Design

Formal specification and verification capabilities for rlm-core

Historical design/planning artifact:

Checklist lines marked [historical target] are archival, not active implementation backlog.

For live execution status and priorities, use bd status and:

docs/execution-plan/STATUS.md

docs/execution-plan/TASK-REGISTRY.md

docs/execution-plan/WORKBOARD.md

Executive Summary

This document proposes two interconnected capabilities:

Lean REPL - A Lean 4 REPL integrated with rlm-core for interactive theorem proving, spec validation, and implementation verification
Spec Agent - An AI agent specialized in creating, refining, and formalizing specifications, with first-class Topos integration

Together, these enable a dual-track specification workflow:

Topos (semantic contracts) - Human-readable intent, traceability, evidence
Lean (formal specs) - Machine-verifiable types, invariants, proofs

1. Vision: Dual-Track Formal Specifications

1.1 The Problem

Current specification approaches have a gap:

Natural language specs are ambiguous, can't be verified
Formal specs are precise but hard to write and maintain
AI-generated code may not match intent without verification

1.2 The Solution

┌─────────────────────────────────────────────────────────────────────────────┐
│                         DUAL-TRACK SPECIFICATION                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Natural Language    Spec Agent      Topos Spec        Lean Formalization   │
│      Intent      ───────────────►    (.tps)     ◄────►     (.lean)          │
│                      (bidirectional)                                         │
│                                         │                    │               │
│                                         │                    │               │
│                                         ▼                    ▼               │
│                                   Implementation      Proof Checking         │
│                                     (Code)           (Type Checker)          │
│                                         │                    │               │
│                                         ▼                    ▼               │
│                                    Evidence ◄──────────► Verified            │
│                                  (git, tests)          Properties            │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Key insight: LLMs are well-suited for proof generation because invalid proofs are rejected by the type checker. The challenge shifts to writing good specifications, which is exactly what the Spec Agent addresses.

1.3 Traceability Chain

[REQ-1] User requirement (Topos)
    ↓
[BEH-1] Behavior contract (Topos)
    ↓ implements
[SPEC-01.01] Lean type/theorem reference
    ↓ formalizes
theorem create_order_preserves_inventory : ∀ order, valid_order order → ...
    ↓ proves
[TASK-1] Implementation task (Topos)
    ↓ evidence
PR #123, commit abc123, 95% coverage

2. Lean REPL Architecture

2.1 Integration Model

The Lean REPL follows the same subprocess + JSON-RPC pattern as the Python REPL:

┌─────────────┐  JSON-RPC   ┌─────────────────────────┐
│  rlm-core   │◄───────────►│  Lean 4 REPL            │
│  (Rust)     │  stdin/out  │  (leanprover-community) │
└─────────────┘             └─────────────────────────┘
                                      │
                                      ▼
                            ┌─────────────────────────┐
                            │  Mathlib / Custom Libs  │
                            │  (lake dependencies)    │
                            └─────────────────────────┘

Backend: leanprover-community/repl

JSON protocol over stdin/stdout
Environment pickling (save/restore proof states)
Both command mode and tactic mode

2.2 Rust Implementation

// Extension of existing ReplEnvironment trait
pub struct LeanRepl {
    process: Child,
    reader: BufReader<ChildStdout>,
    writer: BufWriter<ChildStdin>,
    env_counter: u64,  // Track Lean environments
    project_root: PathBuf,
}

impl ReplEnvironment for LeanRepl {
    fn execute(&mut self, code: &str) -> Result<ExecuteResult> {
        // Send JSON command to Lean REPL
        let request = LeanCommand::Command {
            cmd: code.to_string(),
            env: self.current_env,
        };
        self.send_json(&request)?;

        // Parse response
        let response: LeanResponse = self.read_json()?;

        Ok(ExecuteResult {
            success: response.sorries.is_empty() && response.messages.iter()
                .all(|m| m.severity != "error"),
            stdout: response.format_output(),
            stderr: response.format_errors(),
            metadata: Some(json!({
                "env": response.env,
                "sorries": response.sorries,
                "goals": response.goals,
            })),
        })
    }

    fn get_pending_operations(&self) -> Vec<String> {
        // Sorries and holes become pending operations
        self.sorries.iter()
            .map(|s| format!("sorry:{}", s.goal))
            .collect()
    }

    fn resolve_operation(&mut self, id: &str, result: Value) -> Result<()> {
        // Apply tactic or proof provided by AI
        if id.starts_with("sorry:") {
            let tactic = result.as_str().ok_or(Error::InvalidTactic)?;
            self.apply_tactic(tactic)?;
        }
        Ok(())
    }
}

2.3 Lean REPL Protocol

Commands (from leanprover-community/repl):

Command Type	Fields	Purpose
`command`	`cmd`, `env?`	Execute Lean command
`tactic`	`tactic`, `proofState`	Apply tactic in proof mode
`pickle`	`path`, `env`	Save environment to file
`unpickle`	`path`	Restore environment

Response Fields:

Field	Type	Description
`env`	`number`	Environment ID (for backtracking)
`messages`	`array`	Compiler messages (info, warning, error)
`sorries`	`array`	Unfinished goals with positions
`goals`	`array?`	Proof goals in tactic mode

2.4 Project Management

Lean requires project context (mathlib, dependencies). The REPL manager handles this:

pub struct LeanProjectManager {
    /// Active project configurations
    projects: HashMap<PathBuf, LeanProject>,
}

pub struct LeanProject {
    root: PathBuf,
    lakefile: LakefileConfig,
    lean_version: String,
    /// Built packages available for import
    packages: Vec<String>,
}

impl LeanProjectManager {
    /// Initialize or get existing project
    pub async fn get_or_create(&mut self, path: &Path) -> Result<&LeanProject> {
        if !self.projects.contains_key(path) {
            // Check for existing lakefile.lean/lakefile.toml
            let project = if path.join("lakefile.lean").exists() {
                LeanProject::from_existing(path)?
            } else {
                // Create new project with defaults
                LeanProject::create_new(path, LeanProjectTemplate::Mathlib)?
            };

            // Build dependencies
            project.lake_build().await?;

            self.projects.insert(path.to_path_buf(), project);
        }
        Ok(self.projects.get(path).unwrap())
    }
}

2.5 Proof State Management

/// Proof state that can be saved/restored
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ProofState {
    /// Environment ID from Lean REPL
    env: u64,
    /// Current proof goals
    goals: Vec<Goal>,
    /// Proof steps taken
    history: Vec<ProofStep>,
    /// Checkpoint file path (for persistence)
    checkpoint: Option<PathBuf>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Goal {
    /// Goal type signature
    target: String,
    /// Local hypotheses
    hyps: Vec<Hypothesis>,
    /// Suggested tactics (AI-generated)
    suggestions: Vec<TacticSuggestion>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ProofStep {
    tactic: String,
    /// Goals before
    pre_goals: Vec<String>,
    /// Goals after
    post_goals: Vec<String>,
    /// Time taken
    elapsed_ms: u64,
}

2.6 Trajectory Events for Lean

pub enum TrajectoryEventType {
    // ... existing events ...

    // Lean-specific events
    LeanExec,           // Lean command execution
    LeanResult,         // Lean execution result
    ProofStart,         // Begin proof attempt
    TacticApply,        // Tactic application
    GoalChange,         // Proof goals changed
    SorryFound,         // Unfinished proof detected
    ProofComplete,      // All goals discharged
    TypeCheck,          // Type checking result
    LemmaExtract,       // Lemma extracted from proof
}

3. Spec Agent Architecture

3.1 Agent Overview

The Spec Agent is a specialized agent for specification creation and refinement:

┌─────────────────────────────────────────────────────────────────────────────┐
│                            SPEC AGENT                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐                   │
│  │   Intake     │───►│   Refine     │───►│  Formalize   │                   │
│  │              │    │              │    │              │                   │
│  │ • NL parsing │    │ • Clarify    │    │ • Topos gen  │                   │
│  │ • Intent     │    │ • Decompose  │    │ • Lean gen   │                   │
│  │   extraction │    │ • Validate   │    │ • Sync       │                   │
│  └──────────────┘    └──────────────┘    └──────────────┘                   │
│          │                  │                   │                            │
│          └──────────────────┼───────────────────┘                            │
│                             ▼                                                │
│                    ┌──────────────┐                                          │
│                    │   Verify     │                                          │
│                    │              │                                          │
│                    │ • Type check │                                          │
│                    │ • Prove      │                                          │
│                    │ • Validate   │                                          │
│                    └──────────────┘                                          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

3.2 Agent Capabilities

Capability	Description	Tools
Intake	Parse natural language requirements into structured intent	NLP, LLM
Refine	Iterative clarification with human feedback	AskUserQuestion, Memory
Formalize (Topos)	Generate Topos specs (.tps files)	Topos MCP tools
Formalize (Lean)	Generate Lean types, theorems, proofs	Lean REPL
Verify	Type-check Lean specs, attempt proofs	Lean REPL
Sync	Keep Topos ↔ Lean in sync	Custom sync engine

3.3 Agent Configuration

pub struct SpecAgentConfig {
    /// Default formalization level
    pub default_level: FormalizationLevel,
    /// Domains to prioritize (affects library imports)
    pub domains: Vec<Domain>,
    /// Proof strategy
    pub proof_strategy: ProofStrategy,
    /// Topos integration
    pub topos_enabled: bool,
    pub topos_project_root: Option<PathBuf>,
    /// Lean project settings
    pub lean_project_root: Option<PathBuf>,
    pub lean_libraries: Vec<String>,
}

#[derive(Debug, Clone, Copy)]
pub enum FormalizationLevel {
    /// Types only - domain models, enums, structs
    Types,
    /// Types + invariants - type-level constraints
    TypesWithInvariants,
    /// Pre/post conditions - function contracts
    Contracts,
    /// Full proofs - complete correctness proofs
    FullProofs,
}

#[derive(Debug, Clone, Copy)]
pub enum ProofStrategy {
    /// decide, simp, omega, then AI, then human
    Progressive,
    /// Always try AI proof search first
    AIFirst,
    /// Draft outline, human completes
    HumanInLoop,
}

#[derive(Debug, Clone)]
pub enum Domain {
    AlgorithmsDataStructures,
    DistributedSystems,
    APIsProtocols,
    SecurityProperties,
    ApplicationFlow,
}

3.4 Workflow: Intake → Refine → Formalize

┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 1: INTAKE                                                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  User: "I need an order management system that ensures inventory             │
│         is never oversold, supports partial fulfillment, and                 │
│         maintains audit trails for compliance."                              │
│                                                                              │
│                              ▼                                               │
│                                                                              │
│  Spec Agent extracts:                                                        │
│  • Domain: E-commerce / Inventory                                            │
│  • Key entities: Order, Inventory, AuditLog                                  │
│  • Invariants: inventory >= 0, no overselling                                │
│  • Behaviors: create_order, fulfill_order, partial_fulfill                   │
│  • Properties: audit_complete (all changes logged)                           │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 2: REFINE                                                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Spec Agent asks clarifying questions:                                       │
│                                                                              │
│  Q1: "How should partial fulfillment work?"                                  │
│      • Split into multiple shipments                                         │
│      • Backorder remaining items                                             │
│      • Cancel unfulfillable items                                            │
│                                                                              │
│  Q2: "What audit events need to be captured?"                                │
│      • All state changes                                                     │
│      • Only fulfillment events                                               │
│      • Configurable per order type                                           │
│                                                                              │
│  Q3: "Are there concurrency requirements?"                                   │
│      • Single-writer (queue-based)                                           │
│      • Optimistic locking                                                    │
│      • Serializable transactions                                             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 3: FORMALIZE                                                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Generates Topos spec (human-readable):                                      │
│                                                                              │
│  ```topos                                                                    │
│  spec OrderManagement                                                        │
│                                                                              │
│  ## Concepts                                                                 │
│                                                                              │
│  Order:                                                                      │
│    id: `OrderId`                                                             │
│    items: list of `OrderItem`                                                │
│    status: `pending` | `partial` | `fulfilled` | `cancelled`                 │
│    invariant: items.all(i => i.quantity > 0)                                 │
│    @lean: Order.lean#Order                                                   │
│                                                                              │
│  ## Behaviors                                                                │
│                                                                              │
│  create_order:                                                               │
│    implements: REQ-1                                                         │
│    given: valid order request                                                │
│    returns: `Order` with status `pending`                                    │
│    ensures: inventory reserved for all items                                 │
│    @lean: Order.lean#create_order_spec                                       │
│  ```                                                                         │
│                                                                              │
│  AND generates Lean formalization (machine-verifiable):                      │
│                                                                              │
│  ```lean                                                                     │
│  -- Order.lean                                                               │
│  import Mathlib.Data.List.Basic                                              │
│                                                                              │
│  structure OrderItem where                                                   │
│    productId : Nat                                                           │
│    quantity : Nat                                                            │
│    quantity_pos : quantity > 0                                               │
│                                                                              │
│  structure Order where                                                       │
│    id : Nat                                                                  │
│    items : List OrderItem                                                    │
│    status : OrderStatus                                                      │
│                                                                              │
│  -- Theorem: creating an order reserves inventory                            │
│  theorem create_order_reserves_inventory                                     │
│    (inv : Inventory) (req : OrderRequest)                                    │
│    (h_valid : valid_request inv req) :                                       │
│    let (order, inv') := create_order inv req                                 │
│    ∀ item ∈ order.items,                                                     │
│      inv'.reserved item.productId ≥ item.quantity := by                      │
│    sorry  -- [?] AI will attempt proof                                       │
│  ```                                                                         │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

4. Topos ↔ Lean Dual-Track Sync

4.1 Linking Mechanism

Topos specs reference Lean artifacts via @lean annotations:

## Concepts

Order:
  id: `OrderId`
  items: list of `OrderItem`
  status: `pending` | `partial` | `fulfilled` | `cancelled`
  invariant: items.all(i => i.quantity > 0)
  @lean: specs/Order.lean#Order           # Links to Lean structure
  @lean.invariant: specs/Order.lean#Order.items_nonempty  # Links to theorem

Lean files include reverse links via comments:

/--
@topos: OrderManagement.tps#Order
Order represents a customer order with line items.
-/
structure Order where
  id : Nat
  items : List OrderItem
  status : OrderStatus
  items_pos : ∀ item ∈ items, item.quantity > 0  -- @topos: Order.invariant

4.2 Sync Engine

pub struct DualTrackSync {
    topos_project: PathBuf,
    lean_project: PathBuf,
    link_index: LinkIndex,
}

#[derive(Debug)]
pub struct LinkIndex {
    /// Topos artifact → Lean artifact
    topos_to_lean: HashMap<ToposRef, LeanRef>,
    /// Lean artifact → Topos artifact
    lean_to_topos: HashMap<LeanRef, ToposRef>,
}

impl DualTrackSync {
    /// Check for drift between Topos and Lean
    pub async fn check_drift(&self) -> Result<DriftReport> {
        let mut report = DriftReport::new();

        for (topos_ref, lean_ref) in &self.link_index.topos_to_lean {
            // Parse both artifacts
            let topos_artifact = self.parse_topos(topos_ref)?;
            let lean_artifact = self.parse_lean(lean_ref)?;

            // Compare semantically
            if let Some(drift) = self.compare_artifacts(&topos_artifact, &lean_artifact)? {
                report.add_drift(topos_ref.clone(), lean_ref.clone(), drift);
            }
        }

        // Check for unlinked artifacts
        report.unlinked_topos = self.find_unlinked_topos()?;
        report.unlinked_lean = self.find_unlinked_lean()?;

        Ok(report)
    }

    /// Propagate changes from Topos to Lean
    pub async fn sync_topos_to_lean(&self, changes: &[ToposChange]) -> Result<Vec<LeanChange>> {
        let mut lean_changes = Vec::new();

        for change in changes {
            match change {
                ToposChange::ConceptAdded(concept) => {
                    // Generate Lean structure
                    let lean_struct = self.generate_lean_structure(concept)?;
                    lean_changes.push(LeanChange::StructureAdded(lean_struct));
                }
                ToposChange::InvariantAdded(inv) => {
                    // Generate Lean theorem
                    let lean_theorem = self.generate_lean_theorem(inv)?;
                    lean_changes.push(LeanChange::TheoremAdded(lean_theorem));
                }
                ToposChange::BehaviorAdded(behavior) => {
                    // Generate Lean function spec
                    let lean_spec = self.generate_lean_function_spec(behavior)?;
                    lean_changes.push(LeanChange::FunctionSpecAdded(lean_spec));
                }
                // ... other change types
            }
        }

        Ok(lean_changes)
    }
}

4.3 Formalization Levels

The sync engine supports progressive formalization:

Level	Topos	Lean
Types	Concepts with fields	`structure` definitions
Types + Invariants	Concepts with invariants	Structures with proofs in fields
Contracts	Behaviors with pre/post	Function specs with `requires`/`ensures`
Full Proofs	All of above	Complete theorems with proofs

impl DualTrackSync {
    pub fn set_formalization_level(&mut self, level: FormalizationLevel) {
        self.level = level;
    }

    fn generate_lean_for_concept(&self, concept: &Concept) -> Result<String> {
        match self.level {
            FormalizationLevel::Types => {
                // Just structure definition
                self.generate_basic_structure(concept)
            }
            FormalizationLevel::TypesWithInvariants => {
                // Structure with proof fields
                self.generate_structure_with_proofs(concept)
            }
            FormalizationLevel::Contracts | FormalizationLevel::FullProofs => {
                // Full specification
                self.generate_full_spec(concept)
            }
        }
    }
}

5. Proof Automation Strategy

5.1 Progressive Automation

┌─────────────────────────────────────────────────────────────────────────────┐
│                    PROOF AUTOMATION PIPELINE                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  TIER 1: Decidable (instant)                                                 │
│  ─────────────────────────                                                   │
│  • decide - boolean decidability                                             │
│  • native_decide - native computation                                        │
│  • omega - linear arithmetic                                                 │
│  • simp - simplification with lemmas                                         │
│                                                                              │
│            │ if fails                                                        │
│            ▼                                                                 │
│                                                                              │
│  TIER 2: Automation (seconds)                                                │
│  ────────────────────────────                                                │
│  • aesop - extensible automation                                             │
│  • linarith - linear arithmetic reasoning                                    │
│  • ring - ring algebra                                                       │
│  • polyrith - polynomial arithmetic (with LLM hints)                         │
│                                                                              │
│            │ if fails                                                        │
│            ▼                                                                 │
│                                                                              │
│  TIER 3: AI-Assisted (seconds-minutes)                                       │
│  ─────────────────────────────────────                                       │
│  • LLM generates tactic sequence                                             │
│  • Validates against type checker                                            │
│  • Retries with different strategies                                         │
│  • Uses LeanDojo / DeepSeek-Prover patterns                                  │
│                                                                              │
│            │ if fails                                                        │
│            ▼                                                                 │
│                                                                              │
│  TIER 4: Human-in-Loop                                                       │
│  ─────────────────────                                                       │
│  • Mark with sorry + TODO                                                    │
│  • Create task in beads                                                      │
│  • Provide proof outline for human                                           │
│  • Store in memory for future reference                                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

5.2 AI Proof Generation

pub struct AIProofAssistant {
    llm_client: Arc<dyn LLMClient>,
    lean_repl: Arc<Mutex<LeanRepl>>,
    strategy_memory: Arc<MemoryStore>,
}

impl AIProofAssistant {
    /// Attempt to prove a theorem
    pub async fn prove(&self, theorem: &str, context: &ProofContext) -> Result<ProofResult> {
        // Step 1: Try decidable tactics
        if let Some(proof) = self.try_decidable(theorem).await? {
            return Ok(ProofResult::Proved(proof));
        }

        // Step 2: Try automation tactics
        if let Some(proof) = self.try_automation(theorem, context).await? {
            return Ok(ProofResult::Proved(proof));
        }

        // Step 3: AI-assisted proof search
        for attempt in 0..self.config.max_ai_attempts {
            let tactics = self.generate_tactics(theorem, context, attempt).await?;

            match self.validate_tactics(&tactics).await? {
                ValidationResult::Valid(proof) => {
                    // Store successful strategy
                    self.strategy_memory.store_strategy(theorem, &proof).await?;
                    return Ok(ProofResult::Proved(proof));
                }
                ValidationResult::Partial { progress, remaining_goals } => {
                    // Update context with progress
                    context.add_progress(progress);
                    context.set_goals(remaining_goals);
                }
                ValidationResult::Invalid(errors) => {
                    // Learn from failure
                    context.add_failed_attempt(&tactics, &errors);
                }
            }
        }

        // Step 4: Give up, create TODO
        Ok(ProofResult::Sorry {
            outline: self.generate_proof_outline(theorem, context).await?,
            task_id: self.create_proof_task(theorem, context).await?,
        })
    }

    async fn generate_tactics(&self, theorem: &str, context: &ProofContext, attempt: u32) -> Result<Vec<String>> {
        let prompt = format!(
            r#"You are a Lean 4 theorem prover. Generate tactics to prove:

```lean
{theorem}

Context:

Available lemmas: {lemmas}
Similar proved theorems: {similar}
Previous failed attempts: {failed}

Attempt {attempt}. Generate a sequence of tactics, one per line. If you need a helper lemma, use have to introduce it. "#, theorem = theorem, lemmas = context.available_lemmas.join(", "), similar = context.similar_proofs.join("\n"), failed = context.failed_attempts.join("\n"), attempt = attempt, );

    let response = self.llm_client.complete(&prompt).await?;
    Ok(self.parse_tactics(&response))
}

}


### 5.3 Domain-Specific Strategies

```rust
/// Domain-specific proof strategies
pub fn get_domain_strategies(domain: Domain) -> Vec<ProofStrategy> {
    match domain {
        Domain::AlgorithmsDataStructures => vec![
            ProofStrategy::Induction,          // Structural induction
            ProofStrategy::WellFoundedRecursion,
            ProofStrategy::CaseAnalysis,
        ],
        Domain::DistributedSystems => vec![
            ProofStrategy::Bisimulation,       // State machine equivalence
            ProofStrategy::Invariant,          // Inductive invariants
            ProofStrategy::Refinement,         // Refinement mappings
        ],
        Domain::APIsProtocols => vec![
            ProofStrategy::StateTransition,    // Pre/post conditions
            ProofStrategy::SessionTypes,       // Protocol compliance
            ProofStrategy::ResourceAccounting, // Linear types
        ],
        Domain::SecurityProperties => vec![
            ProofStrategy::InformationFlow,    // Non-interference
            ProofStrategy::AccessControl,      // Authorization
            ProofStrategy::Cryptographic,      // Game-based proofs
        ],
        Domain::ApplicationFlow => vec![
            ProofStrategy::Termination,        // Liveness
            ProofStrategy::Safety,             // Invariant preservation
            ProofStrategy::Fairness,           // Progress properties
        ],
    }
}

6. Integration with Disciplined Process

6.1 Spec-Linked Tasks

Every Lean theorem/lemma traces to a [SPEC-XX.YY] identifier:

## Requirements

REQ-1: Order Creation
  when: valid order request received
  the system shall: create order with reserved inventory
  @spec: SPEC-01.01
  @lean: specs/Order.lean#create_order_reserves_inventory

/--
@spec: SPEC-01.01
@topos: OrderManagement.tps#REQ-1
-/
theorem create_order_reserves_inventory
  (inv : Inventory) (req : OrderRequest)
  (h_valid : valid_request inv req) :
  let (order, inv') := create_order inv req
  ∀ item ∈ order.items,
    inv'.reserved item.productId ≥ item.quantity := by
  ...

6.2 DP Integration Points

DP Phase	Spec Agent Role
Orient	Query spec coverage, find unformalized requirements
Specify	Generate/refine Topos specs, create Lean formalizations
Decide	ADRs reference formal properties proved
Test	Tests annotated with `@trace SPEC-XX.YY`, Lean proofs validate
Implement	Implementation guided by proved properties
Review	Review checks: are all specs formalized? proofs complete?
Close	Evidence includes proof status

6.3 CLI Integration

# Check spec coverage
/dp:spec coverage --with-lean

# List unproved theorems
/dp:spec list --unproved

# Generate Lean formalization for a requirement
/dp:spec formalize REQ-1 --level full-proofs

# Verify Lean specs type-check
/dp:spec verify --lean

# Sync Topos ↔ Lean
/dp:spec sync --check-drift

7. Memory Integration

7.1 Storing Proofs and Strategies

// New node types for formal verification
pub enum NodeType {
    // ... existing types ...

    /// Lean theorem statement
    Theorem,
    /// Completed proof
    Proof,
    /// Proof strategy that worked
    ProofStrategy,
    /// Lemma extracted during proving
    Lemma,
    /// Failed proof attempt (for learning)
    FailedProof,
}

// Store successful proof strategies
pub struct ProofStrategyNode {
    /// Theorem pattern (generalized)
    pattern: String,
    /// Tactic sequence that worked
    tactics: Vec<String>,
    /// Domain
    domain: Domain,
    /// Success rate
    success_rate: f64,
    /// Contexts where it worked
    successful_contexts: Vec<String>,
}

7.2 Learning from Proofs

The memory system enables learning from proof attempts:

Store successful strategies → Reuse for similar theorems
Store failed attempts → Avoid repeating mistakes
Extract lemmas → Build domain-specific lemma library
Track patterns → Identify common proof structures

8. Implementation Phases

Phase 1: Lean REPL Foundation (2-3 weeks)

Task	Description
LeanRepl implementation	Subprocess management, JSON-RPC
Project management	Lake integration, dependency handling
Basic commands	Execute, type-check, tactic mode
Trajectory events	Lean-specific event types
Tests	REPL lifecycle, error handling

Deliverables:

LeanRepl implementing ReplEnvironment
LeanProjectManager for project/dependency handling
Integration with rlm-core orchestrator

Phase 2: Topos Integration (1-2 weeks)

Task	Description
Topos MCP client	Connect to topos MCP server
Link annotations	Parse `@lean` annotations in Topos
Reverse annotations	Parse `@topos` comments in Lean
Link index	Build and maintain bidirectional index
Tests	Parsing, linking, index updates

Deliverables:

Topos MCP integration in rlm-core
Link index with persistence

Phase 3: Dual-Track Sync (2 weeks)

Task	Description
Drift detection	Compare Topos ↔ Lean semantically
Topos → Lean generation	Generate Lean from Topos changes
Lean → Topos generation	Update Topos from Lean changes
Sync CLI commands	`sync`, `drift`, `formalize`
Tests	Sync scenarios, conflict resolution

Deliverables:

DualTrackSync engine
CLI commands for sync operations

Phase 4: Spec Agent (2-3 weeks)

Task	Description
Agent scaffolding	Spec agent configuration, lifecycle
Intake phase	NL parsing, intent extraction
Refine phase	Question generation, clarification
Formalize phase	Topos + Lean generation
Tests	End-to-end spec creation

Deliverables:

Spec agent implementation
Integration with RLM orchestrator

Phase 5: Proof Automation (2-3 weeks)

Task	Description
Decidable tier	Basic tactic automation
Automation tier	aesop, linarith, ring integration
AI-assisted tier	LLM proof generation
Strategy memory	Store/retrieve successful strategies
Tests	Proof benchmarks per domain

Deliverables:

AIProofAssistant with progressive automation
Strategy memory integration

Phase 6: DP Integration (1-2 weeks)

Task	Description
Spec coverage	Track formalization coverage
Proof status	Track proof completion
CLI commands	DP spec commands with Lean
Evidence gathering	Proof evidence in tasks
Tests	DP workflow integration

Deliverables:

Full DP integration
Documentation

9. Success Criteria

9.1 Functional Requirements

[historical target] Lean REPL executes commands and tactics correctly
[historical target] Lean projects with mathlib dependencies work
[historical target] Topos ↔ Lean bidirectional linking works
[historical target] Drift detection catches spec divergence
[historical target] Spec agent generates valid Topos and Lean specs
[historical target] Progressive proof automation achieves >70% auto-proof rate on simple theorems
[historical target] DP integration tracks spec coverage and proof status

9.2 Performance Requirements

Metric	Target
Lean REPL startup	< 5s (with cached build)
Simple tactic execution	< 500ms
Type checking	< 2s for typical module
AI proof attempt	< 30s per attempt
Drift detection	< 10s for 100 linked artifacts

9.3 Quality Requirements

[historical target] >80% test coverage on core components
[historical target] All public APIs documented
[historical target] Proof strategies stored for reuse
[historical target] Graceful degradation when Lean unavailable

10. Open Questions

10.1 Architecture

Lean version management: How to handle projects requiring different Lean versions?
- Proposal: Per-project Lean version via elan, similar to rustup
Mathlib dependency: Full mathlib is large (~2GB). Cache strategy?
- Proposal: Lazy download, shared cache across projects
Proof persistence: Store proofs in memory or external files?
- Proposal: Lean files are source of truth, memory stores metadata

10.2 Workflow

Human review of AI proofs: How to surface AI-generated proofs for review?
- Proposal: Mark AI proofs with attribute, review during DP review phase
Proof maintenance: How to handle proofs that break when specs change?
- Proposal: Drift detection flags broken proofs, prioritize in tasks
Partial formalization: How to handle specs that can't be fully formalized?
- Proposal: sorry with confidence level, track as technical debt

11. References

Lean Ecosystem

leanprover-community/repl - Official Lean REPL
LeanInteract - Python interface
lean-spec - Program specification
LeanDojo - AI-driven theorem proving
Mathlib - Mathematics library

Formal Verification

AWS Cedar verification - Industrial Lean usage
Martin Kleppmann on AI + FV
DeepSeek-Prover-V2 - AI theorem proving

Topos

Topos GitHub - Semantic contract language
Architecture, language spec, and MCP integration docs

rlm-core

Unified RLM Library Design - Core architecture
ADR-001 - Architectural decisions

FilesExpand file tree

lean-formal-verification-design.md

Latest commit

History

lean-formal-verification-design.md

File metadata and controls

Lean Formal Verification System Design

Executive Summary

1. Vision: Dual-Track Formal Specifications

1.1 The Problem

1.2 The Solution

1.3 Traceability Chain

2. Lean REPL Architecture

2.1 Integration Model

2.2 Rust Implementation

2.3 Lean REPL Protocol

2.4 Project Management

2.5 Proof State Management

2.6 Trajectory Events for Lean

3. Spec Agent Architecture

3.1 Agent Overview

3.2 Agent Capabilities

3.3 Agent Configuration

3.4 Workflow: Intake → Refine → Formalize

4. Topos ↔ Lean Dual-Track Sync

4.1 Linking Mechanism

4.2 Sync Engine

4.3 Formalization Levels

5. Proof Automation Strategy

5.1 Progressive Automation

5.2 AI Proof Generation

6. Integration with Disciplined Process

6.1 Spec-Linked Tasks

6.2 DP Integration Points

6.3 CLI Integration

7. Memory Integration

7.1 Storing Proofs and Strategies

7.2 Learning from Proofs

8. Implementation Phases

Phase 1: Lean REPL Foundation (2-3 weeks)

Phase 2: Topos Integration (1-2 weeks)

Phase 3: Dual-Track Sync (2 weeks)

Phase 4: Spec Agent (2-3 weeks)

Phase 5: Proof Automation (2-3 weeks)

Phase 6: DP Integration (1-2 weeks)

9. Success Criteria

9.1 Functional Requirements

9.2 Performance Requirements

9.3 Quality Requirements

10. Open Questions

10.1 Architecture

10.2 Workflow

11. References

Lean Ecosystem

Formal Verification

Topos

rlm-core