Claude Code Architecture Deep Dive: Multi-Agent Orchestration, Tool Governance, and Permission Isolation

[AI Readability Summary] Claude Code is an AI agent runtime built for engineering workflows, not just a chat wrapper. It addresses three hard problems in production agent systems: reliable task decomposition, safe tool execution, and recoverable long-running orchestration. Core themes: multi-agent design, tool governance, and permission isolation.

The technical specification snapshot highlights its engineering focus

Parameter	Description
Primary language	TypeScript / TSX
Interaction modes	CLI, REPL, SDK
Protocols / interfaces	Anthropic API, MCP
Core capabilities	QueryEngine, Tool Orchestration, Sub-Agents
Engineering dependencies	Commander.js, React, Ink, Zod
Architectural traits	Layered design, streaming execution, permission control, async recovery
Reference popularity	The source article is a CSDN architecture analysis and does not provide repository star counts

Claude Code is fundamentally a controllable agent runtime

From a source-code perspective, Claude Code is not a thin wrapper around “model + function calling.” It is closer to an operating system for agents: a single runtime that integrates the main loop, tool system, permission system, state management, subtask orchestration, and terminal experience.

This design addresses three core pain points: complex tasks are hard to split reliably, tool invocation risk is difficult to control, and long-running tasks often lack a recoverable execution chain. That is why Claude Code looks more like a developer-oriented task execution platform than a conventional chat product.

The system uses a five-layer architecture to isolate responsibilities

Figure: Claude Code layered architecture. The diagram shows a top-down split across five layers of responsibility. The CLI/UI layer handles user input and rendering, the Query Engine layer manages conversation state and the main loop, the Tool System layer handles tool orchestration and concurrency control, the Agent System layer manages sub-agent spawning and task lifecycle, and the Protocol & Services layer integrates external APIs, MCP, permissions, and context services.

At the top sits the CLI / UI layer, which handles command parsing, terminal rendering, and REPL interaction. Directly below it, QueryEngine acts as the brain. It maintains the session lifecycle and drives the loop of model calls and tool execution.

Below that are the Tool System and Agent System. The former defines tool interfaces, concurrency rules, and execution feedback. The latter spawns sub-agents, passes inherited context to them, and resumes background tasks. At the bottom, the system connects to APIs, MCP, the permission engine, and context compression services.

export async function* runTools(blocks: ToolUseBlock[]) {
  for (const group of partitionToolCalls(blocks)) {
    if (group.isConcurrencySafe) {
      // Execute concurrency-safe tools in parallel to improve throughput
      yield* runToolsConcurrently(group.blocks)
    } else {
      // Execute high-risk or side-effecting tools sequentially to avoid environment conflicts
      yield* runToolsSequentially(group.blocks)
    }
  }
}

This snippet captures the core philosophy of the tool layer: treat “whether a tool can run concurrently” as an engineering rule, not a decision the model improvises at runtime.
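The grouping step that runTools relies on can be sketched as follows. This is a minimal illustration, not the actual implementation: the ToolUseBlock and ToolGroup shapes, the CONCURRENCY_SAFE_TOOLS table, and the choice to group consecutive calls by a shared safety flag are all assumptions made for the example.

```typescript
// Hypothetical shapes: the article does not show these type definitions.
interface ToolUseBlock {
  id: string;
  name: string;
}

interface ToolGroup {
  isConcurrencySafe: boolean;
  blocks: ToolUseBlock[];
}

// Assumed static rule table: read-only tools are treated as safe to parallelize.
const CONCURRENCY_SAFE_TOOLS = new Set(["Read", "Grep", "Glob"]);

// Group consecutive calls that share the same safety flag, preserving call order.
function partitionToolCalls(blocks: ToolUseBlock[]): ToolGroup[] {
  const groups: ToolGroup[] = [];
  for (const block of blocks) {
    const safe = CONCURRENCY_SAFE_TOOLS.has(block.name);
    const last = groups[groups.length - 1];
    if (last !== undefined && last.isConcurrencySafe === safe) {
      last.blocks.push(block);
    } else {
      groups.push({ isConcurrencySafe: safe, blocks: [block] });
    }
  }
  return groups;
}

// Two reads, one edit, one read: the edit splits the reads into separate groups.
const exampleGroups = partitionToolCalls([
  { id: "1", name: "Read" },
  { id: "2", name: "Grep" },
  { id: "3", name: "Edit" },
  { id: "4", name: "Read" },
]);
```

Because the safety flag comes from a fixed table, a side-effecting call such as the edit above always acts as a barrier between parallel batches, regardless of what the model requested.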

QueryEngine connects model reasoning to the real execution environment

The key value of QueryEngine is not a single model invocation. Its real value is maintaining a closed loop of “model output → tool execution → result injection → continued reasoning.” That loop determines whether an agent is truly executable.

At the same time, the system uses AsyncGenerator to stream tokens, tool requests, progress updates, and final answers. That avoids long periods of silence during execution. For CLI products, this kind of “work while reporting progress” interaction model is essential.
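The closed loop and its streaming behavior can be sketched with an async generator. Everything here is hypothetical (the ModelTurn, StreamEvent, and Message types, the queryLoop name, and the turn limit); the sketch only shows the shape of "call model, run tool, inject result, continue" while yielding progress events along the way.

```typescript
// All types and names are illustrative, not Claude Code's real API.
type ModelTurn =
  | { kind: "tool_use"; tool: string; input: string }
  | { kind: "final"; text: string };

type StreamEvent =
  | { type: "tool_request"; tool: string }
  | { type: "tool_result"; tool: string; output: string }
  | { type: "final"; text: string };

type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (history: Message[]) => Promise<ModelTurn>;
type ToolRunner = (tool: string, input: string) => Promise<string>;

async function* queryLoop(
  prompt: string,
  callModel: Model,
  runTool: ToolRunner,
  maxTurns = 8,
): AsyncGenerator<StreamEvent> {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const output = await callModel(history);
    if (output.kind === "final") {
      yield { type: "final", text: output.text };
      return;
    }
    // Report the request before running it so the caller is never left waiting in silence.
    yield { type: "tool_request", tool: output.tool };
    const result = await runTool(output.tool, output.input);
    // Inject the result so the next model call can reason over it.
    history.push({ role: "assistant", content: `call ${output.tool}` });
    history.push({ role: "tool", content: result });
    yield { type: "tool_result", tool: output.tool, output: result };
  }
  yield { type: "final", text: "Stopped: turn limit reached." };
}

// A fake model for demonstration: request one tool, then finish once a result exists.
const fakeModel: Model = async (history): Promise<ModelTurn> =>
  history.some((m) => m.role === "tool")
    ? { kind: "final", text: "done" }
    : { kind: "tool_use", tool: "Grep", input: "TODO" };
```

A caller consumes the loop with `for await (const event of queryLoop(...))` and renders each event as it arrives, which is what makes the "work while reporting progress" model possible in a terminal.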

The sub-agent system improves stability through role specialization

In Claude Code, an agent can be defined through Markdown configuration. The frontmatter contains metadata such as name, tools, model, permissionMode, maxTurns, and mcpServers, while the body carries the system prompt.

The value of this design is clear: agent identity, capability boundaries, and execution modes become configurable rather than hard-coded. That makes the system easier to extend and easier to audit.
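To make the "configuration, not code" idea concrete, here is a rough sketch of splitting such a Markdown file into metadata and a system prompt. The parseAgentMarkdown function and AgentDefinition type are hypothetical, and the parser assumes a `---` delimited frontmatter block with flat `key: value` lines only; real frontmatter is YAML and would need a proper YAML parser for nested fields such as mcpServers.

```typescript
// Hypothetical result shape for a parsed agent definition.
interface AgentDefinition {
  meta: Record<string, string>;
  systemPrompt: string;
}

// Split "---\n<frontmatter>\n---\n<body>" into flat metadata and the prompt body.
function parseAgentMarkdown(source: string): AgentDefinition {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (match === null) {
    throw new Error("Agent file must start with a '---' frontmatter block");
  }
  const meta: Record<string, string> = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip blank or malformed lines
    meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { meta, systemPrompt: (match[2] ?? "").trim() };
}

// Example agent file; the field values are made up for illustration.
const exploreAgent = parseAgentMarkdown(
  [
    "---",
    "name: Explore",
    "tools: Read, Grep",
    "maxTurns: 5",
    "---",
    "You are a read-only scout.",
  ].join("\n"),
);

// Comma-separated fields can be split by the caller when a list is needed.
const exploreTools = (exploreAgent.meta["tools"] ?? "").split(",").map((t) => t.trim());
```

Since the whole definition is data, it can be reviewed, diffed, and audited like any other configuration file.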

Five built-in sub-agent types create a clear division of labor

The system commonly includes five sub-agent types: GeneralPurpose, Explore, Plan, Guide, and Verification. These are not just prompt variants. They are execution roles bound to differentiated tool permissions and task goals.

Explore focuses on retrieval and read-only analysis. Plan produces implementation strategies only. Verification looks for flaws with a skeptical posture. GeneralPurpose acts as the fallback executor. This division of labor reduces the risk that a single agent accumulates excessive privileges and behaves unpredictably.

Figure: Relationship between the five sub-agent types and tool capabilities. The diagram highlights the mapping between different sub-agents and their tool permissions. It emphasizes that not all agents share the same execution surface. Instead, capabilities are trimmed according to task role, enabling differentiated modes such as read-only exploration, planning output, and background verification.

name: Explore
permissions: read-only
# Disallow spawning and write operations so the scouting agent cannot exceed its authority
disallowedTools:
  - Agent
  - Write
  - Edit

This configuration demonstrates the principle of least privilege for a read-only sub-agent: it can inspect, search, and analyze, but it cannot modify anything.

The tool filtering mechanism is the first gate of security isolation

Claude Code does not expose the full tool surface directly to every agent. It first applies system-level filtering, then applies agent-level declarative filtering. As a result, each agent sees a different tool set.

System-level filtering removes dangerous tools and internally sensitive tools, and it trims the allowlist based on synchronous or asynchronous mode. Agents can further narrow their permission surface through fields like disallowedTools or tools.

export function filterToolsForAgent(tools: Tool[], isAsync: boolean) {
  return tools.filter(tool => {
    if (tool.name.startsWith('mcp__')) return true // MCP tools are integrated through the protocol
    if (ALL_AGENT_DISALLOWED_TOOLS.has(tool.name)) return false // Globally disallowed
    if (isAsync && !ASYNC_AGENT_ALLOWED_TOOLS.has(tool.name)) {
      return false // Async agents can use only allowlisted tools
    }
    return true
  })
}

This logic shows that the permission system is not an after-the-fact audit layer. Isolation happens earlier, at the stage where the tool set is generated.
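The second, agent-level stage can be sketched in the same style. The AgentToolPolicy type and applyAgentPolicy function are hypothetical, and the rule that a deny entry overrides an allow entry is an assumption of this sketch, not something the source confirms.

```typescript
interface Tool {
  name: string;
}

// Hypothetical per-agent policy mirroring the `tools` / `disallowedTools` fields.
interface AgentToolPolicy {
  tools?: string[]; // optional allowlist
  disallowedTools?: string[]; // optional denylist
}

// Narrow an already system-filtered tool list to what this agent may see.
function applyAgentPolicy(available: Tool[], policy: AgentToolPolicy): Tool[] {
  const allow = policy.tools !== undefined ? new Set(policy.tools) : null;
  const deny = new Set(policy.disallowedTools ?? []);
  return available.filter(
    (tool) => (allow === null || allow.has(tool.name)) && !deny.has(tool.name),
  );
}

// Example: an Explore-style agent loses spawning and write tools.
const systemFiltered: Tool[] = [
  { name: "Read" },
  { name: "Grep" },
  { name: "Edit" },
  { name: "Write" },
  { name: "Agent" },
];
const exploreVisible = applyAgentPolicy(systemFiltered, {
  disallowedTools: ["Agent", "Write", "Edit"],
}).map((tool) => tool.name);
```

Because the agent only ever receives the narrowed list, a disallowed tool is not merely rejected at call time; it is never offered to the model in the first place.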

Multi-agent orchestration supports both synchronous and asynchronous tasks

The primary agent acts like a scheduling center. It decides whether a task should run synchronously or move to the background. Short tasks go to synchronous sub-agents such as Explore or Plan, and results return directly to the main thread. Long-running tasks go to asynchronous sub-agents, which execute in the background and return updates through notifications.

Fork creates a child execution flow with inherited context. Resume restores an existing background agent. SendMessage adds instructions to an active agent. task-notification reports completion state. Together, these mechanisms form a recoverable concurrent runtime.
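One way to picture how these four mechanisms fit together is a small in-memory registry. This is only a sketch under stated assumptions: the AgentRegistry class, the three status values, and the suspend method are invented for illustration, and a real runtime would persist state and actually execute the agents.

```typescript
// Hypothetical in-memory model; names and states are illustrative only.
type AgentStatus = "running" | "suspended" | "completed";

interface BackgroundAgent {
  id: string;
  status: AgentStatus;
  context: string[];
}

interface TaskNotification {
  agentId: string;
  summary: string;
}

class AgentRegistry {
  private readonly agents = new Map<string, BackgroundAgent>();
  private nextId = 1;
  readonly notifications: TaskNotification[] = [];

  // Fork: the child starts from a copy of the parent's context plus its task.
  fork(parentContext: readonly string[], task: string): string {
    const id = `agent-${this.nextId++}`;
    this.agents.set(id, { id, status: "running", context: [...parentContext, task] });
    return id;
  }

  // SendMessage: only an active agent accepts additional instructions.
  sendMessage(id: string, message: string): void {
    const agent = this.lookup(id);
    if (agent.status !== "running") throw new Error(`${id} is not running`);
    agent.context.push(message);
  }

  // Assumed helper so that resume has something to restore.
  suspend(id: string): void {
    this.lookup(id).status = "suspended";
  }

  // Resume: reactivate a known agent with its context intact.
  resume(id: string): BackgroundAgent {
    const agent = this.lookup(id);
    if (agent.status === "completed") throw new Error(`${id} already completed`);
    agent.status = "running";
    return agent;
  }

  // task-notification: completion is reported through a queue, not a return value.
  complete(id: string, summary: string): void {
    this.lookup(id).status = "completed";
    this.notifications.push({ agentId: id, summary });
  }

  private lookup(id: string): BackgroundAgent {
    const agent = this.agents.get(id);
    if (agent === undefined) throw new Error(`Unknown agent: ${id}`);
    return agent;
  }
}
```

The key property is that an agent's identity and context outlive any single call, which is what allows a background task to be resumed or redirected instead of restarted.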

Figure: Multi-agent orchestration and notification feedback. The diagram shows the relationship between the primary agent, synchronous sub-agents, asynchronous background agents, and the notification channel. The synchronous path blocks and returns results directly, while the asynchronous path feeds results back through task notifications. Combined with Resume and message delivery, it forms a persistent orchestration loop for long-running tasks.

This orchestration model fits real software engineering workflows

For example, when refactoring a codebase, the primary agent can first dispatch Explore to inspect the code structure, then ask Plan to produce implementation steps, and finally fork a background Verification agent to run tests and perform adversarial checks. The main thread does not need to wait for every task to finish.
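The refactoring flow above might look like this in code. The RunAgent callback and refactorWorkflow function are hypothetical stand-ins for however the runtime dispatches a sub-agent; the point of the sketch is only the mix of awaited (synchronous) and un-awaited (background) steps.

```typescript
type AgentRole = "Explore" | "Plan" | "Verification";

// Hypothetical dispatcher: runs a sub-agent in the given role and resolves with its report.
type RunAgent = (role: AgentRole, task: string) => Promise<string>;

async function refactorWorkflow(
  run: RunAgent,
  goal: string,
): Promise<{ plan: string; verification: Promise<string> }> {
  // Synchronous sub-agents: the main thread waits because the next step needs the result.
  const findings = await run("Explore", `Map the code relevant to: ${goal}`);
  const plan = await run("Plan", `Findings:\n${findings}\nProduce steps for: ${goal}`);

  // Background sub-agent: started but deliberately not awaited here.
  const verification = run("Verification", `Run tests and challenge this plan:\n${plan}`);

  // The caller can show the plan immediately and pick up the verification result later.
  return { plan, verification };
}
```

A caller that never awaits the returned verification promise should still attach a rejection handler to it, otherwise a failed background check goes unnoticed.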

As a result, the agent supports a product-grade interaction model: foreground interaction, background execution, and results that flow back to the user when they are ready, rather than just single-turn question answering.

The most reusable design patterns concentrate on the engineering feedback loop

First, streaming execution matters more than one-shot result delivery. Second, tools should be governable components, not unrestricted free functions. Third, the permission system should sit between the model and the environment as a hard boundary. Fourth, complex tasks should be decomposed into sub-agents or skills instead of being forced into a single oversized prompt.
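The third pattern, a hard boundary between model and environment, can be sketched as a gate that every tool call must pass. The Decision, Rule, and guardedExecute names are hypothetical, and the first-match, deny-by-default policy is an assumption of this sketch rather than a description of Claude Code's permission engine.

```typescript
// Hypothetical types: a sketch of a gate, not Claude Code's permission engine.
type Decision = "allow" | "deny";

interface ToolCall {
  tool: string;
  input: string;
}

// A rule returns a decision, or undefined when it has no opinion on the call.
type Rule = (call: ToolCall) => Decision | undefined;

// The first rule with an opinion wins; a call that matches no rule is denied.
function decide(call: ToolCall, rules: Rule[]): Decision {
  for (const rule of rules) {
    const decision = rule(call);
    if (decision !== undefined) return decision;
  }
  return "deny";
}

// Every tool call passes through the gate before it can touch the environment.
function guardedExecute(
  call: ToolCall,
  rules: Rule[],
  execute: (call: ToolCall) => string,
): string {
  if (decide(call, rules) === "deny") {
    return `Permission denied for ${call.tool}`;
  }
  return execute(call);
}

// Example rule set: block destructive shell commands, allow read-only tools.
const exampleRules: Rule[] = [
  (call) => (call.tool === "Bash" && call.input.includes("rm -rf") ? "deny" : undefined),
  (call) => (call.tool === "Read" || call.tool === "Grep" ? "allow" : undefined),
];
```

Returning the denial as a tool result, instead of throwing, lets the model see the refusal and adjust its plan while the environment stays untouched.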

Looking deeper, engineering choices such as Bun, React + Ink, Zod, MCP, and feature flags also show that Claude Code is not trying to prove a concept. It is trying to ship a recoverable, extensible, release-ready developer product.

The practical lessons for building your own agent system are explicit

If you are designing a coding agent, an R&D assistant, or an internal operations agent, the most important lessons are not about prompt wording. They are about runtime structure: how tasks are decomposed, how tools are controlled, how permissions are segmented, how state is restored, and how long-running tasks report completion.

The upper bound of a reliable agent is often determined less by the model and more by the quality of runtime governance. That is the real value Claude Code systematizes.

FAQ

1. What is the biggest difference between Claude Code and a typical function-calling agent?

A typical agent usually solves only one problem: letting the model call tools. Claude Code solves a broader problem: letting the model execute controllably inside a real engineering environment, including concurrency, permissions, recovery, notifications, and state management.

2. Why distinguish between synchronous sub-agents and asynchronous sub-agents?

Because tasks differ in duration and risk. Synchronous execution works well for quick exploration and planning. Asynchronous execution works better for long-running verification, background processing, and tasks that should not block the primary interaction loop. Keeping slow work off the main thread lets the user continue interacting while it runs.

3. What capability should a self-built agent system implement first?

A practical priority order is: tool permission governance > task state management > sub-agent orchestration > streaming feedback. Without security boundaries and recovery mechanisms, even a strong model will struggle to operate reliably in production.

Reference

deep-dive-claude-code
Reference material compiled from public architecture and source-code analysis articles about Claude Code

Core Summary: Based on source-code analysis materials related to Claude Code, this article distills its five-layer architecture, multi-agent scheduling, concurrent tool execution, permission filtering, and asynchronous task recovery mechanisms to help developers quickly understand the core design of an industrial-grade AI agent runtime.