---
title: ProtEngine Labs
emoji: 🧬
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
---
# Precision Medicine & ProtEngine Labs Pipeline

A 22-agent AI orchestrator for novel lead discovery, synthesis planning, and clinical validation.
ProtEngine Labs is an enterprise-grade drug discovery pipeline designed to identify, optimize, and validate therapeutic candidates for specific protein mutations (e.g., EGFR T790M).
The system uses a hierarchical 22-agent architecture to go from mutation parsing to synthesis-ready lead compounds in about 90 seconds for a standard run, or up to 6 hours with full Molecular Dynamics validation.
```mermaid
graph TD
    %% Global Styling
    classDef acquisition fill:#eff6ff,stroke:#1d4ed8,stroke-width:2px,color:#1e3a8a
    classDef analysis fill:#fefce8,stroke:#a16207,stroke-width:2px,color:#713f12
    classDef design fill:#fff7ed,stroke:#c2410c,stroke-width:2px,color:#7c2d12
    classDef validation fill:#fef2f2,stroke:#b91c1c,stroke-width:2px,color:#7f1d1d
    classDef context fill:#f5f3ff,stroke:#6d28d9,stroke-width:2px,color:#4c1d95
    classDef output fill:#f0fdf4,stroke:#15803d,stroke-width:2px,color:#14532d
    classDef start_end fill:#f8fafc,stroke:#334155,stroke-width:2px,color:#0f172a

    Start([Mutation Query]):::start_end --> Data[Stage 1: Multi-Source Data Acquisition]:::acquisition
    Data --> Struct[Stage 2: Structural & Variant Analysis]:::analysis
    Struct --> Design[Stage 3: Generative Molecule Design]:::design
    Design --> Docking[Stage 4: Docking & Selectivity Filtering]:::design
    Docking --> Validation[Stage 5: GNN & Molecular Dynamics Validation]:::validation
    Validation --> Synthesis[Stage 6: ASKCOS Retrosynthesis Planning]:::output
    Synthesis --> Report([Final Enterprise Drug Report]):::start_end
```
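The six stages in the diagram run strictly in sequence, each one consuming the previous stage's output. The dependency-free sketch below illustrates that flow only; the actual backend uses a LangGraph state machine, and the stage functions, state keys, and placeholder values here are hypothetical.

```python
from typing import Any, Callable

# Shared state handed from stage to stage (keys are hypothetical).
State = dict[str, Any]


def acquire_data(state: State) -> State:
    # Stage 1: split a query such as "EGFR T790M" into gene and mutation.
    gene, mutation = state["query"].split()
    return {**state, "gene": gene, "mutation": mutation}


def analyze_structure(state: State) -> State:
    # Stage 2: placeholder for structure prep, variant effect, pocket detection.
    return {**state, "pocket": f"{state['gene']}_{state['mutation']}_pocket"}


def design_molecules(state: State) -> State:
    # Stage 3: placeholder generator producing ~150 candidate IDs.
    return {**state, "candidates": [f"mol_{i}" for i in range(150)]}


def dock_and_filter(state: State) -> State:
    # Stage 4: keep a top-30 shortlist (here simply the first 30 placeholders).
    return {**state, "candidates": state["candidates"][:30]}


def validate(state: State) -> State:
    # Stage 5: narrow the shortlist to two finalists for MD validation.
    return {**state, "finalists": state["candidates"][:2]}


def plan_synthesis(state: State) -> State:
    # Stage 6: attach a placeholder route to each finalist.
    routes = {mol: "route pending" for mol in state["finalists"]}
    return {**state, "routes": routes}


STAGES: list[Callable[[State], State]] = [
    acquire_data, analyze_structure, design_molecules,
    dock_and_filter, validate, plan_synthesis,
]


def run_pipeline(query: str) -> State:
    state: State = {"query": query}
    for stage in STAGES:
        state = stage(state)  # each stage returns a new, extended state
    return state


if __name__ == "__main__":
    result = run_pipeline("EGFR T790M")
    print(result["finalists"])  # ['mol_0', 'mol_1']
```

Returning a new dict from every stage keeps each step side-effect free, which makes individual stages easy to test or re-run in isolation.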
- Affinity Scoring: Multi-method validation using Vina, Gnina CNN, DimeNet++ GNN, and MM-GBSA.
- Confidence Tiers: Grounded in pLDDT and ESM-1v scores (WELL_KNOWN, PARTIAL, NOVEL).
- Stability Labels: Ranked by RMSD trajectories from 50 ns MD simulations.
- Selectivity: Dual-docking against 10+ off-target proteins to estimate the therapeutic window.
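One simple way to merge several affinity estimates into a single value with an uncertainty range is to report their mean ± sample standard deviation. The sketch below assumes every method's output has already been converted to kcal/mol; the method names mirror the list above, but the numbers and this aggregation rule are illustrative assumptions, not the pipeline's documented method.

```python
from statistics import mean, stdev


def consensus_affinity(scores: dict[str, float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of per-method affinities in kcal/mol."""
    if len(scores) < 2:
        raise ValueError("need at least two methods for an uncertainty estimate")
    values = list(scores.values())
    return mean(values), stdev(values)


# Hypothetical per-method estimates for one lead, all in kcal/mol.
lead_scores = {"vina": -9.8, "gnina_cnn": -8.6, "dimenet_pp": -9.4, "mm_gbsa": -8.6}
mu, sigma = consensus_affinity(lead_scores)
print(f"{mu:.1f} ± {sigma:.1f} kcal/mol")  # -9.1 ± 0.6 kcal/mol
```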
The pipeline uses a multi-stage filtration funnel so that the most expensive steps, such as Molecular Dynamics, run only on the strongest candidates:
- Scaffold Hopping: Generates ~150 candidates using RDKit-driven bioisostere replacements and 3D diffusion.
- GNN Pre-Selection: Uses a DimeNet++ GNN to narrow the top 30 leads down to exactly two high-confidence finalists.
- Molecular Dynamics: Runs 50 ns OpenMM simulations on the finalists only, estimating MM-GBSA binding free energies (ΔG) and RMSD stability.
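The funnel's ordering (cheap scores on many molecules, expensive simulations on few) can be sketched as successive top-k cuts. The stage sizes (~150, 30, 2) come from the list above; the `Candidate` record, its score fields, and the use of docking scores for the first cut are assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Candidate:
    name: str
    docking_score: float  # kcal/mol, more negative = tighter binding (assumed field)
    gnn_affinity: float   # predicted affinity, more negative = better (assumed field)


def top_k(cands: list[Candidate], key: str, k: int) -> list[Candidate]:
    # Sort ascending because lower (more negative) energies indicate tighter binding.
    return sorted(cands, key=attrgetter(key))[:k]


def run_funnel(cands: list[Candidate]) -> list[Candidate]:
    shortlist = top_k(cands, "docking_score", 30)    # ~150 -> 30
    finalists = top_k(shortlist, "gnn_affinity", 2)  # 30 -> 2, sent on to MD
    return finalists


# Synthetic pool of 150 candidates with random scores (illustration only).
rng = random.Random(0)
pool = [Candidate(f"mol_{i}", rng.uniform(-11, -5), rng.uniform(-11, -5)) for i in range(150)]
finalists = run_funnel(pool)
print([c.name for c in finalists])
```

Only the two finalists reach the costly MD step, which is the point of ordering the filters from cheapest to most expensive.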
ProtEngine Labs dual-docks top leads against a panel of 10+ off-target proteins and computes a Selectivity Ratio (target affinity / off-target affinity) for each. Low ratios flag potential side effects early in the discovery phase, helping to preserve a wide therapeutic window.
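The ratio above can be illustrated with docking energies in kcal/mol: since both values are negative for favorable binding, a ratio above 1 means the lead is predicted to bind the target more strongly than that off-target. Reporting the smallest ratio as the worst case is an assumption of this sketch, and the off-target panel and scores below are hypothetical.

```python
def selectivity_ratios(target: float, off_targets: dict[str, float]) -> dict[str, float]:
    """Target affinity / off-target affinity for each off-target.

    Scores are docking energies in kcal/mol (negative = binding), so a ratio
    above 1 means stronger predicted binding to the intended target.
    """
    if target >= 0 or any(v >= 0 for v in off_targets.values()):
        raise ValueError("expected negative (favorable) binding energies")
    return {name: target / score for name, score in off_targets.items()}


def worst_case(ratios: dict[str, float]) -> tuple[str, float]:
    # The smallest ratio marks the off-target most likely to cause side effects.
    name = min(ratios, key=ratios.get)
    return name, ratios[name]


# Hypothetical scores for one lead against a small illustrative panel.
ratios = selectivity_ratios(-10.0, {"ERBB2": -8.0, "INSR": -5.0, "ABL1": -6.25})
print(worst_case(ratios))  # ('ERBB2', 1.25)
```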
Every discovered lead is passed through ASKCOS retrosynthesis planning. Candidates are scored by Synthetic Accessibility (SA) and mapped to specific reagents and reaction steps, so that the top leads are practical to synthesize experimentally.
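A triage step that combines SA scores with planned routes might look like the following sketch. The 1 (easy) to 10 (hard) range is the usual scale of the Ertl and Schuffenhauer SA score; the `Lead` record, the 6.0 cutoff, and the tie-break on route length are assumptions, and no ASKCOS API is called here.

```python
from dataclasses import dataclass, field


@dataclass
class Lead:
    name: str
    sa_score: float  # synthetic accessibility: 1 (easy) to 10 (hard)
    route_steps: list[str] = field(default_factory=list)  # steps from a retrosynthesis planner


def synthesis_triage(leads: list[Lead], max_sa: float = 6.0) -> list[Lead]:
    """Keep leads that have a route and SA <= max_sa, easiest first."""
    feasible = [lead for lead in leads if lead.route_steps and lead.sa_score <= max_sa]
    # Prefer lower SA score, then shorter routes.
    return sorted(feasible, key=lambda lead: (lead.sa_score, len(lead.route_steps)))


# Hypothetical leads for illustration.
leads = [
    Lead("A", 3.2, ["amide coupling", "Suzuki coupling"]),
    Lead("B", 7.5, ["step"] * 6),  # too hard: SA above the cutoff
    Lead("C", 2.9, []),            # no route found
    Lead("D", 3.2, ["SNAr"]),
]
print([lead.name for lead in synthesis_triage(leads)])  # ['D', 'A']
```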
The pipeline is organized into 10 specialized stages, each managed by autonomous agents.
| Stage | Mission | Agents |
|---|---|---|
| 1 | Data Acquisition | MutationParser, Planner, FetchAgents |
| 2-3 | Structure & Variant | StructurePrep (ESMFold), VariantEffect (ESM-1v), PocketDetection |
| 4-5 | Design & Docking | MoleculeGen (Pocket2Mol), Docking (Gnina/Vina), Selectivity, ADMET |
| 6-7 | Ranking & Validation | GNNAffinity (DimeNet++), MDValidation (OpenMM), Resistance Forecasting |
| 8-9 | Context Analysis | SimilaritySearch, SynergyAgent, ClinicalTrialAgent |
| 10 | Output Generation | SynthesisAgent (ASKCOS), ReportAgent |
The project includes a comprehensive automation script that handles system-level dependencies including Miniconda, Python 3.11, AutoDock Vina, and Node.js.
```bash
./start.sh
```

This script detects missing dependencies, installs Miniconda if needed, creates the Conda environment, and launches both the frontend and the backend.
- OS: Linux (preferred) or macOS.
- Conda: Miniconda3 recommended.
- Node.js: v20 or later.
- Bio-Tools: AutoDock Vina, fpocket, Open Babel (automated via setup script).
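`start.sh` performs this detection itself; the Python sketch below only shows the same idea for checking a machine by hand. It relies on the standard-library `shutil.which`; the executable names (`conda`, `node`, `vina`, `fpocket`, `obabel`) are each tool's usual command name and are assumptions about how they are installed. Tools that the script places under the local `tools/` directory will only be found if that directory is on `PATH`.

```python
import shutil

# Executable name -> human-readable label (command names are assumptions).
REQUIRED_TOOLS = {
    "conda": "Miniconda3",
    "node": "Node.js (v20+)",
    "vina": "AutoDock Vina",
    "fpocket": "fpocket",
    "obabel": "Open Babel",
}


def missing_tools(tools: dict[str, str] = REQUIRED_TOOLS) -> list[str]:
    """Return the labels of tools whose executables are not found on PATH."""
    return [label for exe, label in tools.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```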
```
protengine/
├── backend/            # FastAPI + LangGraph Orchestrator
│   ├── agents/         # 22 Pipeline Agents
│   ├── pipeline/       # LangGraph state machine
│   ├── tests/          # Stress & Validation suites
│   └── data/           # Cache & Structure storage
├── frontend/           # Next.js 16 + Tailwind v4 + GSAP
│   ├── app/            # Feature-first routing
│   ├── components/     # Analysis, 3D Mol, & GNN Visuals
│   └── lib/            # SSE streaming & API hooks
├── tools/              # (Auto-generated) Local Miniconda & Binaries
└── docs/               # Technical specifications & Architecture
```
| Variable | Required | Purpose |
|---|---|---|
| `OPENAI_API_KEY` | Yes | Primary reasoning (GPT-4o) |
| `GROQ_API_KEY` | Recommended | Fast Llama 3.3 orchestration |
| `DATABASE_URL` | Optional | Persistent discovery library (Neon/Postgres) |
- Observability: Fully integrated with LangSmith for real-time agent tracing and audit logs.
- Predictive Reliability: All scores include uncertainty ranges (e.g., -9.1 ± 1.2 kcal/mol).
- Safety Protocols: No clinical claims. Predictive outputs only. Mandatory disclaimers on all reports.
- Scalability: Async SSE streaming for real-time pipeline progress updates.
Apache License 2.0. Computational Predictions Only: experimental synthesis and binding validation are required before biological testing.
Notice: Not for clinical use.