Simplex defines what code must do while Plexus orchestrates who builds it and when, forming a spec-driven development stack for autonomous coding agents.
Introduction
Most AI coding workflows follow a simple loop: write a natural language prompt, hand it to an LLM, get code back. That works for small tasks but breaks down when multiple agents need to build a system together.
LLMs have seen massive amounts of code during training, so they handle syntax well enough. Where they consistently fall short is specification: expressing intent, constraints, and edge cases clearly enough for autonomous agents to act without clarification.
Simplex and Plexus solve this by splitting the problem into two layers: Simplex handles behavioral specification while Plexus handles system orchestration. Together they turn development into a structured spec pipeline rather than an unstructured prompt workflow.
The Two Layers
```
System goal
    ↓
Plexus task graph        # orchestration
    ↓
Component specs
    ↓
Simplex function specs   # behavior
    ↓
Generated code
```
Simplex: Behavioral Contracts
The Simplex specification is a lightweight format for describing work that autonomous coding agents should perform. It focuses on what needs to be done and how to know when it's done, without prescribing implementation details.
Core Principles
- Enforced simplicity: break complex problems into smaller specs rather than writing monolithic descriptions
- Semantic precision: wording must be unambiguous even if formatting varies
- Testability: examples define correct behavior; agents evaluate their own success
- Completeness: the spec must stand alone without clarification from the author
- Implementation autonomy: agents choose algorithms, data structures, and tools
Simplex describes functions, modules, and small services, giving agents unambiguous behavioral targets. By itself, however, it doesn't say which components exist, how they interact, what order work should happen in, or which agent implements what.
Landmarks & Structure
Simplex intentionally has no strict grammar. Instead it relies on recognizable structural markers called landmarks, designed so LLMs can interpret them reliably:
```
FUNCTION reconcile_transactions(a, b)

RULES
  Match transactions with identical IDs
  Flag mismatches exceeding threshold

DONE_WHEN
  "All matches returned"
  "Unmatched items flagged with reason"

EXAMPLES
  Input:  [{id:1, amt:100}], [{id:1, amt:100}]
  Output: [{id:1, status: matched}]

ERRORS
  Empty input → return empty result, no error
```
The key landmarks (FUNCTION, RULES, DONE_WHEN, EXAMPLES, ERRORS) give agents enough structure to parse intent without requiring a formal parser. Additional markers like CONSTRAINTS, READS, WRITES, and DEPENDS_ON express file boundaries and ordering when used within Plexus orchestration.
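Because landmarks are positional keywords rather than a formal grammar, a tool (or an agent's preprocessing step) can split a spec into sections with very little code. A minimal sketch in Python, assuming the landmark names listed above; the `parse_spec` helper is hypothetical and not part of Simplex itself:

```python
# Minimal landmark splitter for Simplex-style specs (hypothetical helper;
# Simplex deliberately has no strict grammar, so this is best-effort).
LANDMARKS = {"FUNCTION", "RULES", "DONE_WHEN", "EXAMPLES", "ERRORS",
             "CONSTRAINTS", "READS", "WRITES", "DEPENDS_ON"}

def parse_spec(text: str) -> dict[str, list[str]]:
    """Group each non-empty line under the most recent landmark."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        head = stripped.split(maxsplit=1)[0]
        if head in LANDMARKS:
            current = head
            sections.setdefault(current, [])
            rest = stripped[len(head):].strip()
            if rest:                      # e.g. "FUNCTION reconcile_transactions(a, b)"
                sections[current].append(rest)
        elif current is not None:
            sections[current].append(stripped)
    return sections
```

Feeding the `reconcile_transactions` spec above through this splitter yields one entry per landmark, which is all an agent needs to locate its rules, success criteria, and examples.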
The Specification Stack
Simplex is intentionally minimalist, focused on function-level behavior rather than system architecture. It answers *what should this function do?* and *how do we know it's correct?* Plexus handles the orchestration layer above it. Together they cover two of the three layers that autonomous agent workflows require:
Simplex covers behavior, Plexus covers execution, and interface specification is on the roadmap. Closing that middle layer will complete the stack from behavioral contracts to working software.
Plexus: System Orchestration
Plexus is a native macOS application that fills the orchestration gap. It takes a Simplex spec, decomposes it into parallel tasks, assigns each task to an isolated agent, tracks dependencies, and merges the results. It acts as the build system, project manager, and agent orchestrator for AI-driven development.
Conceptually, Plexus does four things:
- Reads each task's READS and WRITES declarations
- Topologically sorts tasks into execution waves
- Runs independent tasks in parallel while dependent tasks wait for their inputs to merge
- Merges each agent's results back into the orchestration branch

Task Decomposition
Plexus reads the dependency declarations in a Simplex spec and builds a directed acyclic graph. Functions with no dependencies form the first wave and launch simultaneously. After each wave merges, the next wave branches from the updated codebase.
```
# Wave 1 (parallel, no dependencies)
generate-tokens    → Agent 1
setup-database     → Agent 2
add-config-schema  → Agent 3

# Wave 2 (waits for wave 1 to merge)
middleware         → Agent 1   # READS token output
seed-data          → Agent 2   # READS database schema

# Wave 3
protect-routes     → Agent 1   # READS middleware
```
This is called deferred branching. Wave 1 agents branch from HEAD. After wave 1 completes and merges, wave 2 agents branch from the orchestration branch tip. A wave 2 agent that depends on database setup sees the actual tables, migrations, and schema that the wave 1 agent created, rather than relying on descriptions in the spec.
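The wave grouping above is a level-order traversal of the dependency DAG. A sketch of how an orchestrator might compute it, using Kahn's algorithm grouped by level; the task names match the example, but this is an illustration, not Plexus's actual implementation:

```python
# Group tasks into execution waves: a task runs in the first wave after
# all of its dependencies have completed (Kahn's algorithm by levels).
def compute_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    remaining = {task: set(d) for task, d in deps.items()}
    waves: list[list[str]] = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        for task in ready:
            del remaining[task]
        for d in remaining.values():
            d.difference_update(ready)   # these dependencies are now satisfied
    return waves

waves = compute_waves({
    "generate-tokens": set(), "setup-database": set(), "add-config-schema": set(),
    "middleware": {"generate-tokens"}, "seed-data": {"setup-database"},
    "protect-routes": {"middleware"},
})
# → [['add-config-schema', 'generate-tokens', 'setup-database'],
#    ['middleware', 'seed-data'], ['protect-routes']]
```

The cycle check matters in practice: a spec whose READS and WRITES declarations form a loop can never be scheduled, and it is better to reject it before launching any agents.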
Agent Coordination
Every agent runs in its own git worktree, a real isolated checkout of the repository. Your working directory is never modified during orchestration. Each agent receives the full spec for context but is instructed to implement only its assigned function.
After agents complete their tasks, Plexus merges each worktree back through a multi-stage merge pipeline with escalating strategies: standard merge, patience merge, and rebase + merge. If all stages fail, the agent is marked as merge-failed and can be retried from the current branch tip.
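The escalation can be pictured as a sequence of git invocations tried in order, aborting a failed attempt before falling through to the next stage. A hypothetical sketch: the git commands are real, but the exact flags, branch names, and policy are assumptions, not Plexus's actual code:

```python
import subprocess

# Hypothetical sketch of an escalating merge pipeline. "orchestration" is
# an assumed name for the integration branch; run() is injectable for testing.
def merge_with_escalation(branch: str, base: str = "orchestration",
                          run=subprocess.run) -> str:
    staged = [
        ("standard", ["git", "merge", "--no-ff", branch]),
        ("patience", ["git", "merge", "--no-ff",
                      "-X", "diff-algorithm=patience", branch]),
    ]
    for name, cmd in staged:
        if run(cmd).returncode == 0:
            return name
        run(["git", "merge", "--abort"])      # discard the conflicted state
    # Last resort: replay the agent's commits onto the base tip, then merge.
    steps = [
        ["git", "rebase", base, branch],
        ["git", "checkout", base],
        ["git", "merge", "--no-ff", branch],
    ]
    if all(run(step).returncode == 0 for step in steps):
        return "rebase+merge"
    return "merge-failed"                     # retry later from the new tip
```

Returning a stage name rather than raising keeps the orchestrator in control: a `merge-failed` result maps directly to the retry-from-current-tip behavior described above.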
Optional nono sandboxing provides kernel-level isolation for Claude Code and Codex agents, restricting filesystem access to only the agent's worktree and the repository's git metadata.
The Spec Pipeline
When used together, Simplex and Plexus transform development from prompt-and-pray into a structured pipeline:
1. Specify: write Simplex specs with RULES and DONE_WHEN success criteria.
2. Plan: declare dependencies with READS and WRITES; Plexus sorts tasks into execution waves.
3. Implement: agents build each task in isolated worktrees and evaluate their own success criteria.
4. Integrate: Plexus merges results back, wave by wave.

The key shift is that the primary artifact of development becomes the specification graph rather than the code itself, because code is a derived output of well-structured specs.
Problems Solved
The pairing of Simplex and Plexus addresses three fundamental challenges with autonomous coding agents: ambiguous intent (behavioral specs with explicit success criteria replace freeform prompts), uncoordinated parallel work (the task graph sequences agents and isolates their changes in worktrees), and unverified completion (DONE_WHEN criteria give agents a concrete standard to evaluate against).
Traditional Parallels
The Simplex + Plexus approach resembles established ideas from software engineering, adapted for AI-native workflows:
| Traditional | AI-Native |
|---|---|
| Requirements document | Simplex specs |
| Architecture document | Plexus task graph |
| Project manager | Orchestration engine |
| Developers | Coding agents |
| Code review | Success criteria + merge pipeline |
| CI/CD pipeline | Wave-based execution + reports |
The system becomes specifications → execution rather than specifications → humans → execution. The feedback loop tightens because agents evaluate their own success criteria, and failed work can be retried immediately with context from prior attempts.
How This Differs from Agent Frameworks
Most agent frameworks (CrewAI, LangGraph, AutoGen) focus on runtime workflows: query → research agent → summarizer agent → response. Plexus instead focuses on software construction workflows:
```
# Typical agent pipeline
user query → research agent → summarizer → response

# Plexus construction pipeline
spec → plan → implement → test → integrate
```
The result is closer to automated software engineering than to the query-response loops that most agent frameworks provide.
Where We're Heading
Simplex and Plexus already cover behavior and execution. The roadmap is focused on completing the full specification stack:
- Interface contracts between components via OpenAPI schemas, data models, and protocol definitions
- Persistence layer specs covering schema evolution and migration validation
- CI integration for automated validation of the entire spec graph on every push
- Hierarchical Simplex specs that allow systems to decompose recursively from top-level goals down to individual functions
- Judge for post-agent quality evaluation that catches the optimism problem in generated code
Autonomous agents are optimistic by default. They report success when tests pass, even when the code they wrote is brittle, over-engineered, or quietly wrong. Judge is a planned post-agent evaluation pass that reviews each agent's output against the original spec, checking for hallucinated functionality, silent failures, incomplete error handling, and specification drift. The goal is to close the gap between "the agent says it's done" and "the code is actually correct."
Hierarchical specs are another major capability on the roadmap. Instead of specifying individual functions, you describe an entire system and let Plexus decompose it through nested layers:
```
SYSTEM → COMPONENT → FUNCTION → Simplex spec
```
At that point the primary artifact of development becomes the specification graph: a hierarchical intent graph from which code is derived, and the specifications themselves become the source of truth.