How Simplex & Plexus Work Together

Simplex defines what code must do while Plexus orchestrates who builds it and when, forming a spec-driven development stack for autonomous coding agents.

Introduction

Most AI coding workflows follow a simple loop: write a natural language prompt, hand it to an LLM, get code back. That works for small tasks but breaks down when multiple agents need to build a system together.

LLMs have seen massive amounts of code during training, so they handle syntax well enough. Where they consistently fall short is specification: expressing intent, constraints, and edge cases clearly enough for autonomous agents to act without clarification.

Simplex and Plexus solve this by splitting the problem into two layers: Simplex handles behavioral specification while Plexus handles system orchestration. Together they turn development into a structured spec pipeline rather than an unstructured prompt workflow.

The Two Layers

Simplex: Behavior
Defines what individual units of functionality must do. Functions, rules, success criteria, examples. Unambiguous behavioral contracts that agents can implement autonomously.
Plexus: Orchestration
Defines how the system gets built. Task decomposition, agent assignment, dependency tracking, parallel execution, merge strategy. The project manager for agent-driven development.
Architecture
System goal
    ↓
Plexus task graph          # orchestration
    ↓
Component specs
    ↓
Simplex function specs     # behavior
    ↓
Generated code

Simplex: Behavioral Contracts

The Simplex specification is a lightweight format for describing work that autonomous coding agents should perform. It focuses on what needs to be done and how to know when it's done, without prescribing implementation details.

Core Principles

Simplex describes functions, modules, and small services, giving agents unambiguous behavioral targets. By itself, however, it doesn't say which components exist, how they interact, what order work should happen in, or which agent implements what.

Landmarks & Structure

Simplex intentionally has no strict grammar. Instead it relies on recognizable structural markers called landmarks, designed so LLMs can interpret them reliably:

Simplex
FUNCTION reconcile_transactions(a, b)

RULES
  Match transactions with identical IDs
  Flag mismatches exceeding threshold

DONE_WHEN
  "All matches returned"
  "Unmatched items flagged with reason"

EXAMPLES
  Input: [{id:1, amt:100}], [{id:1, amt:100}]
  Output: [{id:1, status: matched}]

ERRORS
  Empty input → return empty result, no error

The key landmarks (FUNCTION, RULES, DONE_WHEN, EXAMPLES, ERRORS) give agents enough structure to parse intent without requiring a formal parser. Additional markers like CONSTRAINTS, READS, WRITES, and DEPENDS_ON express file boundaries and ordering when used within Plexus orchestration.
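Because landmarks are plain structural markers rather than a formal grammar, splitting a spec into sections takes only a few lines of code. A minimal sketch, not an official Simplex parser; the landmark names mirror the example above, everything else is illustrative:

```python
# Landmark names from the example above, plus the orchestration markers.
LANDMARKS = {"RULES", "DONE_WHEN", "EXAMPLES", "ERRORS",
             "CONSTRAINTS", "READS", "WRITES", "DEPENDS_ON"}

def parse_simplex(text: str) -> dict:
    """Split a Simplex spec into {landmark: [lines]} sections."""
    spec = {"FUNCTION": None}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("FUNCTION "):
            spec["FUNCTION"] = stripped[len("FUNCTION "):]
            current = None
        elif stripped in LANDMARKS:
            current = stripped
            spec[current] = []
        elif current:
            spec[current].append(stripped)
    return spec
```

Since the sections are recovered by recognition rather than parsing, an agent (or tooling around one) can tolerate extra prose between landmarks without breaking.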

The Specification Stack

Simplex is intentionally minimalist, focused on function-level behavior rather than system architecture. It answers "What should this function do?" and "How do we know it's correct?" Plexus handles the orchestration layer above it. Together they cover two of the three layers that autonomous agent workflows require:

Behavior spec
What individual functions must do. Simplex addresses this layer through enforced simplicity, testable criteria, and implementation autonomy.
Interface spec
How components talk to each other through OpenAPI schemas, data models, and protocol contracts. This is the layer we are building toward next.
Execution spec
How the work gets done: task graphs, agent pipelines, dependency ordering, and merge strategy. Plexus addresses this layer.

Simplex covers behavior, Plexus covers execution, and interface specification is on the roadmap. Closing that middle layer will complete the stack from behavioral contracts to working software.


Plexus: System Orchestration

Plexus is a native macOS application that fills the orchestration gap. It takes a Simplex spec, decomposes it into parallel tasks, assigns each task to an isolated agent, tracks dependencies, and merges the results. It acts as the build system, project manager, and agent orchestrator for AI-driven development.

Conceptually, Plexus does four things:

Breaks work into tasks
A system goal becomes component tasks, which become function tasks, which map to Simplex specs. Each function becomes exactly one agent task.
Assigns agents
Claude Code, Codex, Cline, or Gemini CLI. Use one for planning and another for execution, swap models per task, and escalate on failure.
Tracks dependencies
Smart inference from READS and WRITES declarations. Topological sorting into execution waves. Dependent tasks wait; independent tasks run in parallel.
Orchestrates execution
Agents generate code in isolated git worktrees, evaluate their own success criteria, and report back. Plexus handles merging, conflict resolution, and retry.

Task Decomposition

Plexus reads the dependency declarations in a Simplex spec and builds a directed acyclic graph. Functions with no dependencies form the first wave and launch simultaneously. After each wave merges, the next wave branches from the updated codebase.
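The wave computation itself is standard topological layering over that graph. A sketch of the idea, assuming dependencies have already been inferred from READS and WRITES into a task-to-prerequisites mapping (the function name and data shape are illustrative, not Plexus internals):

```python
def schedule_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; each wave depends only on earlier waves."""
    remaining = {task: set(prereqs) for task, prereqs in deps.items()}
    waves = []
    while remaining:
        # A task is ready once all of its prerequisites have been scheduled.
        ready = sorted(t for t, prereqs in remaining.items() if not prereqs)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        for task in ready:
            del remaining[task]
        for prereqs in remaining.values():
            prereqs.difference_update(ready)
    return waves
```

Feeding it the dependencies from the example spec reproduces the three-wave schedule: the independent tasks launch first, and each later wave contains only tasks whose prerequisites have already merged.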

Wave Scheduling
# Wave 1 (parallel) — no dependencies
  generate-tokens    → Agent 1
  setup-database     → Agent 2
  add-config-schema  → Agent 3

# Wave 2 (waits for wave 1 to merge)
  middleware         → Agent 1  # READS token output
  seed-data          → Agent 2  # READS database schema

# Wave 3
  protect-routes     → Agent 1  # READS middleware

This is called deferred branching. Wave 1 agents branch from HEAD. After wave 1 completes and merges, wave 2 agents branch from the orchestration branch tip. A wave 2 agent that depends on database setup sees the actual tables, migrations, and schema that the wave 1 agent created, rather than relying on descriptions in the spec.

Agent Coordination

Every agent runs in its own git worktree, a real isolated checkout of the repository. Your working directory is never modified during orchestration. Each agent receives the full spec for context but is instructed to implement only its assigned function.
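The per-agent checkout maps onto ordinary `git worktree` commands. A sketch that only constructs the command rather than running it; the branch naming scheme and worktree location are hypothetical, not Plexus's actual layout:

```python
def worktree_add_cmd(repo: str, task: str, base: str = "HEAD") -> list[str]:
    """Build a `git worktree add` command giving one agent an isolated
    checkout on its own branch, leaving the user's working directory
    untouched."""
    branch = f"agent/{task}"               # hypothetical naming scheme
    path = f"{repo}/.worktrees/{task}"     # hypothetical location
    return ["git", "-C", repo, "worktree", "add", "-b", branch, path, base]
```

Running the returned command (e.g. via `subprocess.run(..., check=True)`) creates a real checkout that shares the repository's object store, which is what makes per-agent isolation cheap.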

After agents complete their tasks, Plexus merges each worktree back through a multi-stage merge pipeline with escalating strategies: standard merge, patience merge, and rebase + merge. If all stages fail, the agent is marked as merge-failed and can be retried from the current branch tip.
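The escalation can be modeled as a try-in-order loop. A control-flow sketch with the actual git operations stubbed out as callables; the three stage names come from the text above, everything else is illustrative:

```python
def merge_with_escalation(branch: str, strategies: dict) -> str:
    """Try each merge strategy in order; return the name of the first
    one that succeeds, or mark the branch merge-failed for retry."""
    for name, attempt in strategies.items():
        # `attempt` would run the real git merge/rebase and report success.
        if attempt(branch):
            return name
    return "merge-failed"   # eligible for retry from the current branch tip
```

Keeping the stages as ordered, swappable callables means a failed standard merge falls through to the more tolerant strategies without any special-case logic.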

Optional nono sandboxing provides kernel-level isolation for Claude Code and Codex agents, restricting filesystem access to only the agent's worktree and the repository's git metadata.

Your checkout is never modified. All agent work happens in isolated worktrees. Changes only appear in your working directory after you explicitly merge the orchestration branch.

The Spec Pipeline

When used together, Simplex and Plexus transform development from prompt-and-pray into a structured pipeline:

1. Write a Simplex spec
Define the system as landmarks and functions. Each function gets rules, constraints, file boundaries, and DONE_WHEN success criteria.
2. Plexus decomposes the spec
Functions become agent tasks whose dependencies are inferred from READS and WRITES, then sorted into execution waves.
3. Agents implement in isolation
Each agent gets its own worktree, the full spec for context, and a single function to implement. Agents run tests, evaluate success criteria, and commit.
4. Plexus merges and reports
Passing agents merge automatically while failed agents can be retried with escalated models, and a report captures what happened across the entire orchestration.

The key shift is that the primary artifact of development becomes the specification graph rather than the code itself, because code is a derived output of well-structured specs.

Problems Solved

The pairing of Simplex and Plexus addresses three fundamental challenges with autonomous coding agents:

Ambiguity
Simplex reduces ambiguity at the behavior level. Landmarks, rules, examples, and success criteria constrain what agents can misinterpret, and the built-in linter catches vague or untestable criteria before orchestration begins.
Planning
Plexus provides task decomposition. It turns a system-level goal into a dependency graph of agent tasks, determines execution order, and handles wave-based scheduling so later tasks build on earlier results.
Coordination
Plexus manages multiple agents working on a shared codebase through git worktree isolation (preventing race conditions), topological merge ordering (preventing conflicts), and independent failure handling so that a stuck task does not block unrelated work.
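The linter mentioned above is described only at this level of detail; a heuristic sketch of what flagging a vague DONE_WHEN criterion might look like (the word list and thresholds are illustrative, not Plexus's actual checks):

```python
# Words that usually signal a success criterion no test can verify.
VAGUE_WORDS = {"fast", "robust", "clean", "good", "nice", "properly"}

def lint_done_when(criteria: list[str]) -> list[str]:
    """Return one warning per criterion that looks vague or untestable."""
    warnings = []
    for criterion in criteria:
        words = {w.strip('."').lower() for w in criterion.split()}
        if words & VAGUE_WORDS:
            warnings.append(f"vague criterion: {criterion!r}")
        elif len(criterion.split()) < 2:
            warnings.append(f"criterion too thin to test: {criterion!r}")
    return warnings
```

The point is that these checks run before orchestration begins, so an untestable spec fails fast instead of producing an agent that declares success against criteria nothing can measure.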

Traditional Parallels

The Simplex + Plexus approach resembles established ideas from software engineering, adapted for AI-native workflows:

Traditional               AI-Native
Requirements document     Simplex specs
Architecture document     Plexus task graph
Project manager           Orchestration engine
Developers                Coding agents
Code review               Success criteria + merge pipeline
CI/CD pipeline            Wave-based execution + reports

The system becomes specifications → execution rather than specifications → humans → execution. The feedback loop tightens because agents evaluate their own success criteria, and failed work can be retried immediately with context from prior attempts.

How This Differs from Agent Frameworks

Most agent frameworks (CrewAI, LangGraph, AutoGen) focus on runtime workflows: query → research agent → summarizer agent → response. Plexus instead focuses on software construction workflows:

Comparison
# Typical agent pipeline
user query → research agent → summarizer → response

# Plexus construction pipeline
spec → plan → implement → test → integrate

The result is closer to automated software engineering than to the query-response loops that most agent frameworks provide.

Where We're Heading

Simplex and Plexus already cover behavior and execution. The roadmap is focused on completing the full specification stack:

Autonomous agents are optimistic by default. They report success when tests pass, even when the code they wrote is brittle, over-engineered, or quietly wrong. Judge is a planned post-agent evaluation pass that reviews each agent's output against the original spec, checking for hallucinated functionality, silent failures, incomplete error handling, and specification drift. The goal is to close the gap between "the agent says it's done" and "the code is actually correct."

Hierarchical specs are another major capability on the roadmap. Instead of specifying individual functions, you describe an entire system and let Plexus decompose it through nested layers:

Hierarchical Specs
SYSTEM
└── COMPONENT
    └── FUNCTION
        └── Simplex spec

At that point the primary artifact of development becomes the specification graph: a hierarchical intent graph from which code is derived, and the specifications themselves become the source of truth.

The goal: describe what you want to build, and the stack handles the rest. Specifications in, software out.