Claude Code Multi-Agent Architecture Explained: Swarm, Coordinator, Tmux, Worktree, and Remote Agents

Claude Code elevates sub-agents into a durable team runtime through Swarm, uses a Coordinator for centralized scheduling, and relies on four mechanisms—In-Process, Tmux, Worktree, and Remote—to solve parallel collaboration, isolation, and recovery. Keywords: Swarm, Coordinator, Worktree.

Language: TypeScript / Node.js
Collaboration protocols: mailbox file messaging, tmux pane control, Bridge protocol
GitHub stars: not provided in the original article
Core dependencies: AsyncLocalStorage, tmux, Git worktree, AbortController

This article explains how Claude Code turns a single agent into a team collaboration system

The core conclusion of the original article is straightforward: Swarm is not just about spawning several sub-agents. It maintains a long-lived, recoverable, and schedulable collaborative runtime. The real problem it solves is not whether concurrency is possible, but how to make concurrency stable.

Unlike a typical AgentTool workflow that dispatches one-off tasks, Swarm introduces a Team File, member activity state, mailbox-based messaging, and backend routing. This allows the Leader to continuously manage multiple teammates and decide whether to reuse, stop, or add members based on task state.

Swarm team architecture diagram: the primary agent expands into a team in which the Leader assigns tasks to several teammates running in parallel within isolated contexts and then aggregates their results. It marks the architectural shift from single-point execution to a team runtime.

Swarm is fundamentally about team lifecycle management

The Swarm lifecycle includes creating a team, registering the Leader, creating teammates, executing tasks, reporting idle status, and eventually deleting the team. The most important concept here is not spawning itself, but the closed loop of state: who is busy, who is idle, and who can be reused must all be explicitly recorded by the system.

interface TeamMember {
  agentId: string
  name: string
  isActive: boolean // Indicates whether the teammate is still executing a task
  backendType: 'in-process' | 'tmux'
}

interface TeamFile {
  teamName: string
  createdAt: number
  members: TeamMember[] // Persistent registry of team members
}

This structure defines the minimal persistence model for Swarm: a team is not a temporary session, but a recoverable collaboration entity.

The Team File enables team state recovery and observability

The Team File is not a standard configuration file but a persistent registry. Even after the Leader restarts, it can use this file to recover the member list, backend types, and activity states. For engineering systems, this means multi-agent collaboration can be resumed and managed across restarts instead of disappearing with a single session.
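Persistence can be sketched as a JSON round trip. This is a minimal sketch that assumes the Team File is stored as JSON at a caller-chosen path; `saveTeamFile`, `loadTeamFile`, and the write-then-rename step are illustrative assumptions, not Claude Code's actual code.

```typescript
import * as fs from 'node:fs'
import * as path from 'node:path'

interface TeamMember {
  agentId: string
  name: string
  isActive: boolean
  backendType: 'in-process' | 'tmux'
}

interface TeamFile {
  teamName: string
  createdAt: number
  members: TeamMember[]
}

// Hypothetical helper: persist the registry by writing a temp file and renaming it,
// so a crash mid-write never leaves a half-written Team File behind.
function saveTeamFile(filePath: string, team: TeamFile): void {
  fs.mkdirSync(path.dirname(filePath), { recursive: true })
  const tmp = `${filePath}.tmp`
  fs.writeFileSync(tmp, JSON.stringify(team, null, 2), 'utf-8')
  fs.renameSync(tmp, filePath)
}

// Hypothetical helper: recover the registry after a Leader restart.
// Returns undefined when no team has been persisted yet.
function loadTeamFile(filePath: string): TeamFile | undefined {
  if (!fs.existsSync(filePath)) return undefined
  return JSON.parse(fs.readFileSync(filePath, 'utf-8')) as TeamFile
}
```

The temp-file-plus-rename step is a common way to avoid a truncated registry if the process dies mid-write; the original does not say whether Claude Code does this.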

The isActive field is especially important. It allows the scheduler to make decisions directly from recorded state instead of inferring who is available. This design reduces a complex collaboration problem to a simple state machine in which each teammate is either active or idle.
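The idle/active state machine can be sketched as a reuse-before-spawn policy. The `assignTask` and `markIdle` functions and the "reuse the first idle member, otherwise spawn" rule are hypothetical, chosen only to illustrate the transitions described above.

```typescript
interface TeamMember {
  agentId: string
  name: string
  isActive: boolean
  backendType: 'in-process' | 'tmux'
}

type Assignment =
  | { kind: 'reuse'; agentId: string }
  | { kind: 'spawn'; name: string }

// Hypothetical policy: reuse the first idle teammate; otherwise request a new one.
// The chosen member is marked active so a second call cannot hand it another task.
function assignTask(members: TeamMember[], newName: string): Assignment {
  const idle = members.find((m) => !m.isActive)
  if (idle) {
    idle.isActive = true // idle to active transition
    return { kind: 'reuse', agentId: idle.agentId }
  }
  return { kind: 'spawn', name: newName }
}

// Called when a teammate reports that it finished its task (active to idle transition).
function markIdle(members: TeamMember[], agentId: string): void {
  const member = members.find((m) => m.agentId === agentId)
  if (member) member.isActive = false
}
```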

The mailbox mechanism solves wake-up for cross-process teammates

A Tmux teammate runs in a separate process, so the Leader cannot push tasks into it like a normal function call. To solve this, the system introduces a mailbox file and an inbox poller. The Leader writes a message, the teammate polls and reads it, and then injects the message into its own user message queue.

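import { promises as fs } from 'node:fs'

async function dispatchToTmux(mailbox: string, message: string): Promise<void> {
  // Write the task into the mailbox (this simplified version overwrites any unread message)
  await fs.writeFile(mailbox, message, 'utf-8')
}

async function pollInbox(mailbox: string): Promise<{ role: 'user'; content: string }[]> {
  let msg = ''
  try {
    msg = await fs.readFile(mailbox, 'utf-8') // Poll and read the message
  } catch (err) {
    // A missing mailbox just means nothing has been sent yet
    if ((err as NodeJS.ErrnoException).code !== 'ENOENT') throw err
  }
  if (!msg) return []
  await fs.writeFile(mailbox, '', 'utf-8') // Clear the mailbox so the message is consumed once
  return [{ role: 'user', content: msg }] // Inject into the initial message list
}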

This logic shows that cross-process collaboration depends on explicit message passing rather than shared memory.
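A single mailbox file has two weak spots: a new write can overwrite an unread message, and a reader can observe a half-written file. A common alternative, sketched here as an assumption rather than Claude Code's actual format, is one file per message: the sender writes to a temporary name and renames it into the inbox directory (a rename within one directory is atomic on POSIX file systems), and the poller consumes files in name order. The directory layout and the `deliverMessage`/`pollInbox` names are hypothetical.

```typescript
import * as fs from 'node:fs'
import * as path from 'node:path'

let seq = 0

// Sender (Leader): write to a hidden temp file, then rename so readers never see partial content.
function deliverMessage(inboxDir: string, text: string): void {
  fs.mkdirSync(inboxDir, { recursive: true })
  // Zero-padded timestamp + counter keeps name order equal to send order for one sender
  const id = `${String(Date.now()).padStart(15, '0')}-${String(seq++).padStart(6, '0')}`
  const tmp = path.join(inboxDir, `.${id}.tmp`)
  fs.writeFileSync(tmp, text, 'utf-8')
  fs.renameSync(tmp, path.join(inboxDir, `${id}.msg`))
}

// Receiver (teammate): one poll pass that consumes every delivered message exactly once.
function pollInbox(inboxDir: string): { role: 'user'; content: string }[] {
  if (!fs.existsSync(inboxDir)) return []
  const names = fs.readdirSync(inboxDir).filter((n) => n.endsWith('.msg')).sort()
  return names.map((name) => {
    const file = path.join(inboxDir, name)
    const content = fs.readFileSync(file, 'utf-8')
    fs.unlinkSync(file) // Delete after reading so the message is not delivered again
    return { role: 'user' as const, content }
  })
}
```

In a running teammate, this pass would sit on a timer (for example `setInterval`) and push each returned message into the user message queue.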

In-Process and Tmux represent two very different isolation cost models

An In-Process teammate runs in the same Node.js process as the primary agent and relies on AsyncLocalStorage for context isolation. Its advantages are fast startup, no serialization overhead, and shared UI. Its drawbacks are shared resources, coupled lifecycle behavior, and the inability to keep spawning nested teammates indefinitely.
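The context-isolation half of that tradeoff can be illustrated with Node's `AsyncLocalStorage`. This is a sketch of the general technique, not Claude Code's code: each teammate runs inside its own `run()` scope, so code deep in the call stack can ask "which teammate am I?" without threading an ID through every function, even while two teammates interleave on the same event loop. `TeammateContext`, `runAsTeammate`, `currentTeammate`, and `demoInterleaving` are hypothetical names.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

interface TeammateContext {
  agentId: string
  teamName: string
}

const teammateStorage = new AsyncLocalStorage<TeammateContext>()

// Run fn (and everything it awaits) with ctx as the current teammate.
function runAsTeammate<T>(ctx: TeammateContext, fn: () => T): T {
  return teammateStorage.run(ctx, fn)
}

// Any code, however deep in the call stack, can look up the current teammate.
function currentTeammate(): TeammateContext | undefined {
  return teammateStorage.getStore()
}

// A shared helper that never receives the agentId as a parameter.
async function logStep(step: string, log: string[]): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 1)) // Yield so teammates interleave
  log.push(`${currentTeammate()?.agentId ?? 'leader'}: ${step}`)
}

async function demoInterleaving(): Promise<string[]> {
  const log: string[] = []
  await Promise.all([
    runAsTeammate({ agentId: 'researcher@my-team', teamName: 'my-team' }, () => logStep('read docs', log)),
    runAsTeammate({ agentId: 'implementer@my-team', teamName: 'my-team' }, () => logStep('edit code', log)),
  ])
  await logStep('merge results', log) // Outside any run() scope, so no teammate context
  return log
}
```

The same mechanism explains the shared-resource drawback: both teammates still share one heap and one event loop, so the isolation is logical rather than physical.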

A Tmux teammate sits at the other end of the spectrum: fully isolated process, separate V8 instance, and independent command-line environment. The tradeoff is slower startup, disk-backed messaging, and no shared cache, but the fault domain is much clearer, which makes it a better fit for heavy or long-running tasks.

In-Process vs. Tmux comparison diagram: same-process teammates share memory, while tmux teammates run as independent processes and exchange messages through files. The diagram makes the tradeoff between low-overhead concurrency and high-isolation concurrency easy to see.

The in-process backend optimizes for minimal scheduling overhead

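// Assumed shapes and helpers from the surrounding codebase (not shown in the original)
interface TeammateSpawnConfig { agentId: string; name: string; teamName: string }
declare function registerTask(agentId: string, controller: AbortController): string
declare function startInProcessTeammate(config: TeammateSpawnConfig, controller: AbortController): void

export class InProcessBackend {
  readonly type = 'in-process' as const

  async spawn(config: TeammateSpawnConfig) {
    const controller = new AbortController() // Create an independent cancellation signal for the teammate
    const taskId = registerTask(config.agentId, controller) // Register the task so it can be stopped later
    startInProcessTeammate(config, controller) // Start the same-process agent loop without awaiting it
    return { agentId: config.agentId, taskId, abortController: controller }
  }
}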

The key point in this code is independent cancellation plus same-process startup. It turns a teammate into a lightweight task unit.
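Why a per-teammate AbortController matters can be shown with a small same-process experiment. `teammateLoop` and `demoStopOne` are hypothetical; the loop uses the `signal` option of `setTimeout` from `node:timers/promises`, which rejects with an `AbortError` once the signal is aborted. Aborting one controller stops only that teammate, while the other keeps running on the same event loop.

```typescript
import { setTimeout as sleep } from 'node:timers/promises'

// Hypothetical teammate loop: performs up to maxSteps units of work, yielding between steps,
// and stops as soon as its own AbortSignal fires.
async function teammateLoop(name: string, maxSteps: number, signal: AbortSignal): Promise<string[]> {
  const log: string[] = []
  try {
    for (let step = 0; step < maxSteps; step++) {
      await sleep(1, undefined, { signal }) // Rejects with an AbortError once the signal is aborted
      log.push(`${name}:step${step}`)
    }
    log.push(`${name}:done`)
  } catch (err) {
    if ((err as Error).name !== 'AbortError') throw err // Only swallow cancellation
    log.push(`${name}:stopped`)
  }
  return log
}

async function demoStopOne(): Promise<[string[], string[]]> {
  const a = new AbortController()
  const b = new AbortController()
  a.abort() // Stop only teammate "a"; "b" has its own controller and is unaffected
  return Promise.all([teammateLoop('a', 3, a.signal), teammateLoop('b', 3, b.signal)])
}
```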

The independent-process backend optimizes for strong isolation and fault boundaries

cd /repo && claude-code \
  --agent-id researcher@team \
  --agent-name researcher \
  --team-name my-team \
  --parent-session-id SESSION_ID

This command shows the essence of the tmux path: each teammate is a full CLI process with its own execution environment.
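For illustration, here is one way a Leader could launch that command in a new tmux pane from Node.js. The sketch only builds the argument list for `tmux split-window`: `-d` keeps focus on the current pane, `-c` sets the new pane's working directory, and `-P -F '#{pane_id}'` prints the new pane's ID. The `claude-code` flags are copied from the command above; `TmuxTeammateConfig`, `shellQuote`, and `buildSpawnArgs` are hypothetical.

```typescript
interface TmuxTeammateConfig {
  cwd: string
  agentId: string
  agentName: string
  teamName: string
  parentSessionId: string
}

// POSIX shell single-quoting: wrap in '...' and turn each ' into '\''
function shellQuote(arg: string): string {
  return `'${arg.replace(/'/g, `'\\''`)}'`
}

// Hypothetical helper: argv for `tmux split-window` that starts one teammate CLI process.
function buildSpawnArgs(cfg: TmuxTeammateConfig): string[] {
  const command = [
    'claude-code',
    '--agent-id', cfg.agentId,
    '--agent-name', cfg.agentName,
    '--team-name', cfg.teamName,
    '--parent-session-id', cfg.parentSessionId,
  ].map(shellQuote).join(' ')
  return [
    'split-window',
    '-d', // Create the pane without switching focus to it
    '-c', cfg.cwd, // Start the pane in the repository directory
    '-P', '-F', '#{pane_id}', // Print the new pane's ID so the Leader can target it later
    command, // tmux hands this single string to the shell
  ]
}
```

Launching would then be `execFile('tmux', buildSpawnArgs(cfg))` from `node:child_process`, with the pane ID read from stdout for later pane control.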

The Coordinator pattern improves multi-agent control through a single convergence point

Claude Code does not use a fully peer-to-peer, democratic agent network. Instead, it chooses centralized scheduling through a Coordinator. The reason is not theoretical elegance, but engineering reliability. Once workers are allowed to talk freely among themselves, the system can quickly fall into looping discussions, conflicting outputs, and difficult-to-debug concurrent states.

The Coordinator model treats all worker results as internal signals rather than direct participants in the conversation. As a result, the user sees only one output channel, decision-making stays with a single scheduler, and system behavior becomes more predictable.
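The "internal signals, single output channel" idea can be sketched as a fan-out/fan-in step. Worker outcomes, including failures, become structured reports that only the coordinator reads, and only the coordinator composes the reply the user sees. `runWorker`, `coordinate`, and the report shape are hypothetical names for illustration.

```typescript
interface WorkerReport {
  worker: string
  status: 'completed' | 'failed'
  summary: string
}

// Run one worker and turn its outcome into an internal signal instead of user-visible output.
async function runWorker(worker: string, task: () => Promise<string>): Promise<WorkerReport> {
  try {
    return { worker, status: 'completed', summary: await task() }
  } catch (err) {
    return { worker, status: 'failed', summary: err instanceof Error ? err.message : String(err) }
  }
}

// Hypothetical coordinator: fan out in parallel, then produce the single user-facing reply.
async function coordinate(tasks: Record<string, () => Promise<string>>): Promise<string> {
  const reports = await Promise.all(Object.entries(tasks).map(([name, task]) => runWorker(name, task)))
  const failed = reports.filter((r) => r.status === 'failed')
  const header = failed.length === 0 ? 'All workers finished.' : `${failed.length} worker(s) failed.`
  const lines = reports.map((r) => `- ${r.worker} (${r.status}): ${r.summary}`)
  return [header, ...lines].join('\n')
}
```

Because a failing worker is converted into a report rather than thrown to the user, one broken worker cannot derail the whole conversation; the coordinator decides what to do next.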

Coordinator commander pattern diagram: a star topology with the Coordinator at the center and multiple workers reporting inward. User-facing output converges through a single channel, which prevents worker-to-worker argument loops and concurrent write conflicts. This is a classic centralized multi-agent orchestration pattern.

The Coordinator’s value comes from global decision-making, not stronger tools

The Coordinator only needs three key tools: AgentTool to create workers, SendMessage to issue follow-up instructions, and TaskStop to terminate tasks. The tool surface is small, but when combined with global context, it is enough to handle parallel decomposition, conflict avoidance, and result aggregation.

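// Assumed bindings: in Claude Code these are tools the Coordinator invokes, modeled here as functions
declare function AgentTool(role: string): Promise<string>
declare function SendMessage(to: string, message: string): Promise<void>

async function coordinate() {
  const a = AgentTool('researcher') // Create a research worker (it starts running immediately)
  const b = AgentTool('implementer') // Create an implementation worker in parallel
  const [research] = await Promise.all([a, b]) // Wait for both workers to report back
  await SendMessage('implementer', `Modify the code based on the research results:\n${research}`) // Secondary scheduling step
}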

This pseudocode shows that the Coordinator’s superpower is not a new interface, but the right to orchestrate concurrency.
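To show how little state the three-tool surface needs, it can be modeled as a small in-memory registry. This is a sketch under stated assumptions: `WorkerRegistry`, `WorkerHandle`, and their methods are hypothetical stand-ins for AgentTool, SendMessage, and TaskStop, not their real signatures.

```typescript
interface WorkerHandle {
  name: string
  inbox: string[]
  controller: AbortController
}

// Hypothetical in-memory stand-ins for AgentTool, SendMessage, and TaskStop.
class WorkerRegistry {
  private workers = new Map<string, WorkerHandle>()

  // AgentTool: create a worker with its initial prompt
  spawn(name: string, prompt: string): WorkerHandle {
    if (this.workers.has(name)) throw new Error(`worker ${name} already exists`)
    const worker: WorkerHandle = { name, inbox: [prompt], controller: new AbortController() }
    this.workers.set(name, worker)
    return worker
  }

  // SendMessage: queue a follow-up instruction for a running worker
  send(name: string, message: string): boolean {
    const worker = this.workers.get(name)
    if (!worker || worker.controller.signal.aborted) return false
    worker.inbox.push(message)
    return true
  }

  // TaskStop: cancel a worker and drop it from the registry
  stop(name: string): boolean {
    const worker = this.workers.get(name)
    if (!worker) return false
    worker.controller.abort()
    this.workers.delete(name)
    return true
  }

  running(): string[] {
    return Array.from(this.workers.keys())
  }
}
```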

Worktree and Remote extend isolation from processes to file systems and remote environments

When a task carries a risk of file conflicts, isolation: "worktree" provides Git worktree-level isolation. Each agent modifies code in its own directory view, which avoids directly contaminating the main workspace. Even more important is the cleanup policy: automatically remove the worktree when there are no changes, and preserve it when changes exist.

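// Assumed helpers (not shown in the original)
declare function removeWorktree(): Promise<void>
declare function keepWorktree(): Promise<void>

async function cleanupWorktree(changed: boolean): Promise<void> {
  if (!changed) {
    await removeWorktree() // Automatically clean up the temporary worktree when nothing changed
    return
  }
  await keepWorktree() // Preserve results when changes exist to avoid losing artifacts
}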

This strategy reflects strong engineering pragmatism: save resources by default without sacrificing traceability.
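The original leaves open how `changed` is determined. One plausible approach, sketched here as an assumption, asks git two questions: does the worktree have uncommitted or untracked edits (`git status --porcelain`), and does it have commits the base branch lacks (`git rev-list --count base..HEAD`)? `decideCleanup`, this `cleanupWorktree` variant, and the `baseRef` parameter are hypothetical.

```typescript
import { execFileSync } from 'node:child_process'

type CleanupDecision = 'remove' | 'keep'

// Pure policy: keep the worktree if it has local edits or commits not on the base branch.
function decideCleanup(porcelainStatus: string, commitsAhead: number): CleanupDecision {
  return porcelainStatus.trim() === '' && commitsAhead === 0 ? 'remove' : 'keep'
}

// Hypothetical wrapper around the git CLI.
// repoDir is the main checkout; baseRef is the branch or commit the worktree started from.
function cleanupWorktree(repoDir: string, worktreeDir: string, baseRef: string): CleanupDecision {
  const git = (cwd: string, args: string[]) =>
    execFileSync('git', ['-C', cwd, ...args], { encoding: 'utf-8' })
  const status = git(worktreeDir, ['status', '--porcelain']) // Uncommitted or untracked edits
  const ahead = Number(git(worktreeDir, ['rev-list', '--count', `${baseRef}..HEAD`]).trim()) // New commits
  const decision = decideCleanup(status, ahead)
  if (decision === 'remove') git(repoDir, ['worktree', 'remove', worktreeDir])
  return decision
}
```

Checking commits as well as the working tree matters because an agent that commits its work leaves a clean `git status`, so a status-only check would treat that worktree as unchanged.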

Remote Agents extend execution into bridged environments

isolation: "remote" shows that Claude Code has already reserved a path for remote agent execution. It relies on the Bridge protocol for session creation, message delivery, and file synchronization. In essence, it extends local multi-agent orchestration into remote execution environments and provides infrastructure for high-privilege, heterogeneous compute, or controlled sandbox scenarios.

FAQ

What is the fundamental difference between Swarm and ordinary sub-agents?

Swarm is a persistent team runtime that includes member registration, state tracking, message passing, and team deletion. Ordinary sub-agents behave more like one-off task invocations and lack long-term scheduling and recovery capabilities.

Why is a Coordinator more reliable than letting multiple agents discuss freely?

Because the Coordinator provides a single decision point and a single output channel. It can centrally control concurrency, prevent conflicting changes, and avoid endless worker discussions or unclear responsibility boundaries.

When should you choose In-Process, and when should you choose Tmux?

Choose In-Process for short tasks, low latency, and interaction-heavy workflows. Choose Tmux for long-running tasks, strong isolation, and independent crash recovery. If you also need file-level isolation, add worktree on top.
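These rules of thumb can be encoded as a small decision function. The `TaskProfile` fields and `choosePlacement` are hypothetical; they only restate the guidance in this answer, with worktree isolation layered on top of whichever backend is chosen.

```typescript
interface TaskProfile {
  longRunning: boolean
  needsCrashIsolation: boolean
  touchesSharedFiles: boolean
}

interface Placement {
  backend: 'in-process' | 'tmux'
  isolation?: 'worktree'
}

// Hypothetical encoding of the FAQ guidance; field names are illustrative only.
function choosePlacement(task: TaskProfile): Placement {
  // Long-running or crash-sensitive work gets its own process; short interactive work stays in-process
  const backend = task.longRunning || task.needsCrashIsolation ? 'tmux' : 'in-process'
  // File-level isolation is an independent layer on top of either backend
  return task.touchesSharedFiles ? { backend, isolation: 'worktree' } : { backend }
}
```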

Core Summary: This article reconstructs Claude Code’s multi-agent collaboration design, focusing on the Swarm team runtime, the Coordinator control pattern, the In-Process and Tmux teammate backends, and the worktree/remote isolation strategy to help developers quickly understand its source-level orchestration model.