Claude Code elevates sub-agents into a durable team runtime through Swarm, uses a Coordinator for centralized scheduling, and relies on four mechanisms—In-Process, Tmux, Worktree, and Remote—to solve parallel collaboration, isolation, and recovery. Keywords: Swarm, Coordinator, Worktree.
| Parameter | Description |
|---|---|
| Language | TypeScript / Node.js |
| Collaboration Protocols | mailbox file messaging, tmux pane control, Bridge protocol |
| GitHub Stars | Not provided in the original article |
| Core Dependencies | AsyncLocalStorage, tmux, Git worktree, AbortController |
This article explains how Claude Code turns a single agent into a team collaboration system.
The core conclusion of the original article is straightforward: Swarm is not just about spawning several sub-agents. It maintains a long-lived, recoverable, and schedulable collaborative runtime. The real problem it solves is not whether concurrency is possible, but how to make concurrency stable.
Unlike a typical AgentTool workflow that dispatches one-off tasks, Swarm introduces a Team File, member activity state, mailbox-based messaging, and backend routing. This allows the Leader to continuously manage multiple teammates and decide whether to reuse, stop, or add members based on task state.
AI Visual Insight: The image shows a team-oriented orchestration model that expands from the primary agent to multiple teammates. It highlights how the Leader assigns tasks and aggregates results while several agents run in parallel within isolated contexts. This marks a key architectural shift from single-point execution to a team runtime.
Swarm is fundamentally about team lifecycle management
The Swarm lifecycle includes creating a team, registering the Leader, creating teammates, executing tasks, reporting idle status, and eventually deleting the team. The most important concept here is not spawning itself, but the closed loop of state: who is busy, who is idle, and who can be reused must all be explicitly recorded by the system.
```typescript
interface TeamMember {
  agentId: string
  name: string
  isActive: boolean // Indicates whether the teammate is still executing a task
  backendType: 'in-process' | 'tmux'
}

interface TeamFile {
  teamName: string
  createdAt: number
  members: TeamMember[] // Persistent registry of team members
}
```
This structure defines the minimal persistence model for Swarm: a team is not a temporary session, but a recoverable collaboration entity.
The Team File provides state recovery and observability
The Team File is not a standard configuration file. It is a persistent registry. Even after the Leader restarts, it can still use this file to recover the member list, backend type, and activity state. For engineering systems, this means multi-agent collaboration gains true resume-and-manage capability for the first time.
The isActive field is especially important. It allows the scheduler to make decisions directly from recorded state instead of inferring who is available. This design reduces a complex collaboration problem into a simple state machine.
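As a minimal sketch of that state machine, a scheduler could pick a reusable teammate directly from the Team File. The `findIdleTeammate` helper is hypothetical, not from the Claude Code source:

```typescript
interface TeamMember {
  agentId: string
  name: string
  isActive: boolean
  backendType: 'in-process' | 'tmux'
}

interface TeamFile {
  teamName: string
  createdAt: number
  members: TeamMember[]
}

// Pick a reusable teammate straight from recorded state —
// no probing of live processes is required.
function findIdleTeammate(team: TeamFile): TeamMember | undefined {
  return team.members.find(m => !m.isActive)
}
```

Because the decision reads only persisted state, it works identically before and after a Leader restart.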
The mailbox mechanism solves wake-up for cross-process teammates
A Tmux teammate runs in a separate process, so the Leader cannot push tasks into it like a normal function call. To solve this, the system introduces a mailbox file and an inbox poller. The Leader writes a message, the teammate polls and reads it, and then injects the message into its own user message queue.
```typescript
import fs from 'node:fs'

async function dispatchToTmux(mailbox: string, message: string) {
  await fs.promises.writeFile(mailbox, message, 'utf-8') // Write the task into the mailbox
}

async function pollInbox(mailbox: string) {
  let msg: string
  try {
    msg = await fs.promises.readFile(mailbox, 'utf-8') // Poll and read the message
  } catch {
    return [] // No mailbox file yet — nothing to deliver
  }
  if (msg) {
    await fs.promises.writeFile(mailbox, '', 'utf-8') // Clear the mailbox so each message is consumed once
    return [{ role: 'user', content: msg }] // Inject into the user message queue
  }
  return []
}
```
This logic shows that cross-process collaboration depends on explicit message passing rather than shared memory.
In-Process and Tmux represent two very different isolation cost models
An In-Process teammate runs in the same Node.js process as the primary agent and relies on AsyncLocalStorage for context isolation. Its advantages are fast startup, no serialization overhead, and shared UI. Its drawbacks are shared resources, coupled lifecycle behavior, and the inability to keep spawning nested teammates indefinitely.
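The AsyncLocalStorage mechanism mentioned above can be illustrated with a short sketch. The context shape and helper names here are assumptions for illustration, not Claude Code's actual internals:

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

interface TeammateContext { agentId: string }

// Each in-process teammate runs inside its own store, so code deep in the
// agent loop can answer "which teammate am I?" without threading the id
// through every function argument.
const teammateContext = new AsyncLocalStorage<TeammateContext>()

function currentAgentId(): string | undefined {
  return teammateContext.getStore()?.agentId
}

async function runTeammate<T>(agentId: string, work: () => Promise<T>): Promise<T> {
  // The store is scoped to this async call tree and survives awaits inside it.
  return teammateContext.run({ agentId }, work)
}
```

Two teammates started this way can interleave on the same event loop without seeing each other's context, which is exactly the "shared process, isolated context" tradeoff the article describes.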
A Tmux teammate sits at the other end of the spectrum: fully isolated process, separate V8 instance, and independent command-line environment. The tradeoff is slower startup, disk-backed messaging, and no shared cache, but the fault domain is much clearer, which makes it a better fit for heavy or long-running tasks.
AI Visual Insight: The image contrasts same-process teammates with teammates running in independent tmux processes. It highlights the difference between shared-memory execution and file-based message passing, making the engineering tradeoff between low-overhead concurrency and high-isolation concurrency easy to see.
The in-process backend optimizes for minimal scheduling overhead
```typescript
export class InProcessBackend {
  readonly type = 'in-process' as const

  async spawn(config: TeammateSpawnConfig) {
    const controller = new AbortController() // Create an independent cancellation signal for the teammate
    const taskId = registerTask(config.agentId, controller)
    startInProcessTeammate(config, controller) // Start the same-process agent loop
    return { agentId: config.agentId, taskId, abortController: controller }
  }
}
```
The key point in this code is independent cancellation plus same-process startup. It turns a teammate into a lightweight task unit.
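The `registerTask` call in the snippet above is not shown in the article; one plausible sketch is a registry mapping task ids to their AbortControllers, so a later `TaskStop` can cancel exactly one teammate. The names and id scheme here are assumptions:

```typescript
// Hypothetical task registry backing the backend above.
const tasks = new Map<string, AbortController>()

function registerTask(agentId: string, controller: AbortController): string {
  const taskId = `${agentId}:${tasks.size}` // Illustrative id scheme, not Claude Code's
  tasks.set(taskId, controller)
  return taskId
}

// A TaskStop-style operation: look up the teammate's controller and abort it.
function stopTask(taskId: string): boolean {
  const controller = tasks.get(taskId)
  if (!controller) return false
  controller.abort() // Cancels only this teammate's work
  tasks.delete(taskId)
  return true
}
```

The per-teammate AbortController is what makes an in-process teammate a "lightweight task unit": cancellation is targeted, not process-wide.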
The independent-process backend optimizes for strong isolation and fault boundaries
```shell
cd /repo && claude-code \
  --agent-id researcher@team \
  --agent-name researcher \
  --team-name my-team \
  --parent-session-id SESSION_ID
```
This command shows the essence of the tmux path: each teammate is a full CLI process with its own execution environment.
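A Leader spawning such a teammate would need to assemble this command programmatically. The sketch below builds the argv for launching the CLI in a new tmux window; the `claude-code` flags mirror the command shown above, while the tmux wrapping and helper name are assumptions about how spawning could work:

```typescript
// Hypothetical helper: build the argv for a tmux-backed teammate.
function buildTmuxSpawnArgs(opts: {
  repoPath: string
  agentId: string
  agentName: string
  teamName: string
  parentSessionId: string
}): string[] {
  const cli = [
    'claude-code',
    '--agent-id', opts.agentId,
    '--agent-name', opts.agentName,
    '--team-name', opts.teamName,
    '--parent-session-id', opts.parentSessionId,
  ].join(' ')
  // `tmux new-window -c <dir> <command>` starts the CLI in its own pane,
  // with the repo as the working directory.
  return ['tmux', 'new-window', '-c', opts.repoPath, cli]
}
```

Passing this array to `child_process.execFile` (or equivalent) would give each teammate a fully separate V8 instance and shell environment, matching the isolation model described above.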
The Coordinator pattern improves multi-agent control through a single convergence point
Claude Code does not use a fully peer-to-peer, democratic agent network. Instead, it chooses centralized scheduling through a Coordinator. The reason is not theoretical elegance, but engineering reliability. Once workers are allowed to talk freely among themselves, the system can quickly fall into looping discussions, conflicting outputs, and difficult-to-debug concurrent states.
The Coordinator model treats all worker results as internal signals rather than direct participants in the conversation. As a result, the user sees only one output channel, decision-making stays with a single scheduler, and system behavior becomes more predictable.
AI Visual Insight: The image depicts a star topology with the Coordinator at the center and multiple workers reporting inward. It emphasizes that user-facing output converges through a single channel, preventing worker-to-worker argument loops or concurrent write conflicts. This is a classic centralized multi-agent orchestration pattern.
The Coordinator’s value comes from global decision-making, not stronger tools
The Coordinator only needs three key tools: AgentTool to create workers, SendMessage to issue follow-up instructions, and TaskStop to terminate tasks. The tool surface is small, but when combined with global context, it is enough to handle parallel decomposition, conflict avoidance, and result aggregation.
```typescript
async function coordinate() {
  const a = AgentTool('researcher') // Create a research worker
  const b = AgentTool('implementer') // Create an implementation worker
  await Promise.all([a, b]) // Run both workers in parallel
  await SendMessage('implementer', 'Modify the code based on the research results') // Secondary scheduling step
}
```
This pseudocode shows that the Coordinator’s superpower is not a new interface, but the right to orchestrate concurrency.
Worktree and Remote extend isolation from processes to file systems and remote environments
When a task carries a risk of file conflicts, `isolation: "worktree"` provides Git-worktree-level isolation. Each agent modifies code in its own directory view, which avoids contaminating the main workspace. Just as important is the cleanup policy: automatically remove the worktree when there are no changes, and preserve it when changes exist.
```typescript
async function cleanupWorktree(changed: boolean) {
  if (!changed) {
    await removeWorktree() // Automatically clean up the temporary worktree when nothing changed
    return
  }
  return keepWorktree() // Preserve results when changes exist to avoid losing artifacts
}
```
This strategy reflects strong engineering pragmatism: save resources by default without sacrificing traceability.
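Two pieces of that strategy can be sketched concretely: creating a per-agent worktree, and deriving the `changed` flag from `git status --porcelain`. The path layout and helper names are assumptions, not Claude Code's actual scheme:

```typescript
// Hypothetical: build the `git worktree add` command for one agent's
// isolated directory view of the repository.
function worktreeAddArgs(repo: string, agentId: string): { dir: string; args: string[] } {
  const dir = `${repo}/.worktrees/${agentId}` // One directory view per agent (assumed layout)
  return { dir, args: ['git', '-C', repo, 'worktree', 'add', '--detach', dir] }
}

// Empty `git status --porcelain` output means the worktree produced no
// changes and can be removed; any output means artifacts must be preserved.
function shouldKeepWorktree(porcelainOutput: string): boolean {
  return porcelainOutput.trim().length > 0
}
```

Wiring `shouldKeepWorktree` into the `cleanupWorktree` policy above gives the remove-by-default, preserve-on-change behavior without any manual bookkeeping.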
Remote Agents extend execution into bridged environments
`isolation: "remote"` shows that Claude Code has already reserved a path for remote agent execution. It relies on the Bridge protocol for session creation, message delivery, and file synchronization. In essence, it extends local multi-agent orchestration into remote execution environments and provides infrastructure for high-privilege, heterogeneous-compute, or controlled sandbox scenarios.
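The article does not document the Bridge protocol's wire format, but its three responsibilities suggest a small message envelope. The field names, message kinds, and JSON encoding below are all assumptions for illustration:

```typescript
// Hypothetical Bridge envelope covering the three responsibilities the
// article names: session creation, message delivery, and file sync.
interface BridgeMessage {
  kind: 'session.create' | 'message.send' | 'file.sync'
  sessionId: string
  payload: unknown
}

function encodeBridgeMessage(msg: BridgeMessage): string {
  return JSON.stringify(msg) // Wire format assumed to be JSON
}

function decodeBridgeMessage(raw: string): BridgeMessage {
  return JSON.parse(raw) as BridgeMessage
}
```

Whatever the real encoding is, the structural point stands: remote teammates communicate through explicit, serialized messages, just as tmux teammates do through mailbox files.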
FAQ
What is the fundamental difference between Swarm and ordinary sub-agents?
Swarm is a persistent team runtime that includes member registration, state tracking, message passing, and team deletion. Ordinary sub-agents behave more like one-off task invocations and lack long-term scheduling and recovery capabilities.
Why is a Coordinator more reliable than letting multiple agents discuss freely?
Because the Coordinator provides a single decision point and a single output channel. It can centrally control concurrency, eliminate conflicts, and avoid endless worker discussions or unclear responsibility boundaries.
When should you choose In-Process, and when should you choose Tmux?
Choose In-Process for short tasks, low latency, and interaction-heavy workflows. Choose Tmux for long-running tasks, strong isolation, and independent crash recovery. If you also need file-level isolation, add worktree on top.
Core Summary: This article reconstructs Claude Code’s multi-agent collaboration design, focusing on the Swarm team runtime, the Coordinator control pattern, the In-Process and Tmux teammate backends, and the worktree/remote isolation strategy to help developers quickly understand its source-level orchestration model.