Simplex defines what code must do while Plexus orchestrates who builds it and when, forming a spec-driven development stack for autonomous coding agents.
Introduction
Most AI coding workflows follow a simple loop: write a natural language prompt, hand it to an LLM, get code back. That works for small tasks but breaks down when multiple agents need to build a system together.
LLMs have seen massive amounts of code during training, so they handle syntax well enough. Where they consistently fall short is specification: expressing intent, constraints, and edge cases clearly enough for autonomous agents to act without clarification.
Simplex and Plexus solve this by splitting the problem into two layers: Simplex handles behavioral specification while Plexus handles system orchestration. Together they turn development into a structured spec pipeline rather than an unstructured prompt workflow.
The Two Layers
```
System goal
    ↓
Plexus task graph        # orchestration
    ↓
Component specs
    ↓
Simplex function specs   # behavior
    ↓
Generated code
```
Simplex: Behavioral Contracts
The Simplex specification is a lightweight format for describing work that autonomous coding agents should perform. It focuses on what needs to be done and how to know when it's done, without prescribing implementation details.
Core Principles
- Enforced simplicity: break complex problems into smaller specs rather than writing monolithic descriptions
- Semantic precision: wording must be unambiguous even if formatting varies
- Testability: examples define correct behavior; agents evaluate their own success
- Completeness: the spec must stand alone without clarification from the author
- Implementation autonomy: agents choose algorithms, data structures, and tools
Simplex describes functions, modules, and small services, giving agents unambiguous behavioral targets. By itself, however, it doesn't say which components exist, how they interact, what order work should happen in, or which agent implements what.
Landmarks & Structure
Simplex intentionally has no strict grammar. Instead it relies on recognizable structural markers called landmarks, designed so LLMs can interpret them reliably:
```
FUNCTION reconcile_transactions(a, b)

RULES
  Match transactions with identical IDs
  Flag mismatches exceeding threshold

DONE_WHEN
  "All matches returned"
  "Unmatched items flagged with reason"

EXAMPLES
  Input:  [{id:1, amt:100}], [{id:1, amt:100}]
  Output: [{id:1, status: matched}]

ERRORS
  Empty input → return empty result, no error
```
The key landmarks (FUNCTION, RULES, DONE_WHEN, EXAMPLES, ERRORS) give agents enough structure to parse intent without requiring a formal parser. Additional markers like CONSTRAINTS, READS, WRITES, and DEPENDS_ON express file boundaries and ordering when used within Plexus orchestration.
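Because landmarks are positional keywords rather than a formal grammar, a tool (or an agent's preprocessing step) can split a spec into sections with very little code. A minimal sketch in Python, assuming the landmark names listed above; the `parse_spec` helper is hypothetical and not part of Simplex itself:

```python
# Minimal landmark splitter for Simplex-style specs (hypothetical helper;
# Simplex deliberately has no strict grammar, so this is best-effort).
LANDMARKS = {"FUNCTION", "RULES", "DONE_WHEN", "EXAMPLES", "ERRORS",
             "CONSTRAINTS", "READS", "WRITES", "DEPENDS_ON"}

def parse_spec(text: str) -> dict[str, list[str]]:
    """Group each non-empty line under the most recent landmark."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        head = stripped.split(maxsplit=1)[0]
        if head in LANDMARKS:
            current = head
            sections.setdefault(current, [])
            rest = stripped[len(head):].strip()
            if rest:                      # e.g. "FUNCTION reconcile_transactions(a, b)"
                sections[current].append(rest)
        elif current is not None:
            sections[current].append(stripped)
    return sections
```

Feeding the `reconcile_transactions` spec above through this splitter yields one entry per landmark, which is all an agent needs to locate its rules, success criteria, and examples.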
The Specification Stack
Simplex is intentionally minimalist, focused on function-level behavior rather than system architecture. It answers *what should this function do?* and *how do we know it's correct?* Plexus handles the orchestration layer above it. Together they cover two of the three layers that autonomous agent workflows require:
Simplex covers behavior, Plexus covers execution, and interface specification is on the roadmap. Closing that middle layer will complete the stack from behavioral contracts to working software.
Plexus: System Orchestration
Plexus is a native macOS application that fills the orchestration gap. It takes a Simplex spec, decomposes it into parallel tasks, assigns each task to an isolated agent, tracks dependencies, and merges the results. It acts as the build system, project manager, and agent orchestrator for AI-driven development.
Conceptually, Plexus does four things:
- Reads each task's READS and WRITES declarations
- Topologically sorts tasks into execution waves
- Runs independent tasks in parallel while dependent tasks wait for their inputs to merge
- Merges each agent's results back into the orchestration branch

Task Decomposition
Plexus reads the dependency declarations in a Simplex spec and builds a directed acyclic graph. Functions with no dependencies form the first wave and launch simultaneously. After each wave merges, the next wave branches from the updated codebase.
```
# Wave 1 (parallel, no dependencies)
generate-tokens    → Agent 1
setup-database     → Agent 2
add-config-schema  → Agent 3

# Wave 2 (waits for wave 1 to merge)
middleware         → Agent 1   # READS token output
seed-data          → Agent 2   # READS database schema

# Wave 3
protect-routes     → Agent 1   # READS middleware
```
This is called deferred branching. Wave 1 agents branch from HEAD. After wave 1 completes and merges, wave 2 agents branch from the orchestration branch tip. A wave 2 agent that depends on database setup sees the actual tables, migrations, and schema that the wave 1 agent created, rather than relying on descriptions in the spec.
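The wave grouping above is a level-order traversal of the dependency DAG. A sketch of how an orchestrator might compute it, using Kahn's algorithm grouped by level; the task names match the example, but this is an illustration, not Plexus's actual implementation:

```python
# Group tasks into execution waves: a task runs in the first wave after
# all of its dependencies have completed (Kahn's algorithm by levels).
def compute_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    remaining = {task: set(d) for task, d in deps.items()}
    waves: list[list[str]] = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        for task in ready:
            del remaining[task]
        for d in remaining.values():
            d.difference_update(ready)   # these dependencies are now satisfied
    return waves

waves = compute_waves({
    "generate-tokens": set(), "setup-database": set(), "add-config-schema": set(),
    "middleware": {"generate-tokens"}, "seed-data": {"setup-database"},
    "protect-routes": {"middleware"},
})
# → [['add-config-schema', 'generate-tokens', 'setup-database'],
#    ['middleware', 'seed-data'], ['protect-routes']]
```

The cycle check matters in practice: a spec whose READS and WRITES declarations form a loop can never be scheduled, and it is better to reject it before launching any agents.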
Agent Coordination
Every agent runs in its own git worktree, a real isolated checkout of the repository. Your working directory is never modified during orchestration. Each agent receives the full spec for context but is instructed to implement only its assigned function.
After agents complete their tasks, Plexus merges each worktree back through a multi-stage merge pipeline with escalating strategies: standard merge, patience merge, and rebase + merge. If all stages fail, the agent is marked as merge-failed and can be retried from the current branch tip.
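The escalation can be pictured as a sequence of git invocations tried in order, aborting a failed attempt before falling through to the next stage. A hypothetical sketch: the git commands are real, but the exact flags, branch names, and policy are assumptions, not Plexus's actual code:

```python
import subprocess

# Hypothetical sketch of an escalating merge pipeline. "orchestration" is
# an assumed name for the integration branch; run() is injectable for testing.
def merge_with_escalation(branch: str, base: str = "orchestration",
                          run=subprocess.run) -> str:
    staged = [
        ("standard", ["git", "merge", "--no-ff", branch]),
        ("patience", ["git", "merge", "--no-ff",
                      "-X", "diff-algorithm=patience", branch]),
    ]
    for name, cmd in staged:
        if run(cmd).returncode == 0:
            return name
        run(["git", "merge", "--abort"])      # discard the conflicted state
    # Last resort: replay the agent's commits onto the base tip, then merge.
    steps = [
        ["git", "rebase", base, branch],
        ["git", "checkout", base],
        ["git", "merge", "--no-ff", branch],
    ]
    if all(run(step).returncode == 0 for step in steps):
        return "rebase+merge"
    return "merge-failed"                     # retry later from the new tip
```

Returning a stage name rather than raising keeps the orchestrator in control: a `merge-failed` result maps directly to the retry-from-current-tip behavior described above.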
Optional nono sandboxing provides kernel-level isolation for Claude Code and Codex agents, restricting filesystem access to only the agent's worktree and the repository's git metadata.
The Spec Pipeline
When used together, Simplex and Plexus transform development from prompt-and-pray into a structured pipeline:
1. Specify: write Simplex specs with RULES and DONE_WHEN success criteria.
2. Plan: declare dependencies with READS and WRITES; Plexus sorts tasks into execution waves.
3. Implement: agents build each task in isolated worktrees and evaluate their own success criteria.
4. Integrate: Plexus merges results back, wave by wave.

The key shift is that the primary artifact of development becomes the specification graph rather than the code itself, because code is a derived output of well-structured specs.
Problems Solved
The pairing of Simplex and Plexus addresses three fundamental challenges with autonomous coding agents: ambiguous intent (behavioral specs with explicit success criteria replace freeform prompts), uncoordinated parallel work (the task graph sequences agents and isolates their changes in worktrees), and unverified completion (DONE_WHEN criteria give agents a concrete standard to evaluate against).
Traditional Parallels
The Simplex + Plexus approach resembles established ideas from software engineering, adapted for AI-native workflows:
| Traditional | AI-Native |
|---|---|
| Requirements document | Simplex specs |
| Architecture document | Plexus task graph |
| Project manager | Orchestration engine |
| Developers | Coding agents |
| Code review | Success criteria + merge pipeline |
| CI/CD pipeline | Wave-based execution + reports |
The system becomes specifications → execution rather than specifications → humans → execution. The feedback loop tightens because agents evaluate their own success criteria, and failed work can be retried immediately with context from prior attempts.
How This Differs from Agent Frameworks
Most agent frameworks (CrewAI, LangGraph, AutoGen) focus on runtime workflows: query → research agent → summarizer agent → response. Plexus instead focuses on software construction workflows:
```
# Typical agent pipeline
user query → research agent → summarizer → response

# Plexus construction pipeline
spec → plan → implement → test → integrate
```
The result is closer to automated software engineering than to the query-response loops that most agent frameworks provide.
Where We're Heading
Simplex and Plexus already cover behavior and execution. The roadmap is focused on completing the full specification stack:
- Interface contracts between components via OpenAPI schemas, data models, and protocol definitions
- Persistence layer specs covering schema evolution and migration validation
- CI integration for automated validation of the entire spec graph on every push
- Hierarchical Simplex specs that allow systems to decompose recursively from top-level goals down to individual functions
- Judge for post-agent quality evaluation that catches the optimism problem in generated code
Autonomous agents are optimistic by default. They report success when tests pass, even when the code they wrote is brittle, over-engineered, or quietly wrong. Judge is a planned post-agent evaluation pass that reviews each agent's output against the original spec, checking for hallucinated functionality, silent failures, incomplete error handling, and specification drift. The goal is to close the gap between "the agent says it's done" and "the code is actually correct."
Hierarchical specs are another major capability on the roadmap. Instead of specifying individual functions, you describe an entire system and let Plexus decompose it through nested layers:
```
SYSTEM → COMPONENT → FUNCTION → Simplex spec
```
At that point the primary artifact of development becomes the specification graph: a hierarchical intent graph from which code is derived, and the specifications themselves become the source of truth.