How to Design High-Quality AI Skills: Scalable Practices from a Single File to a Folder-Based Architecture

[AI Readability Summary]

This article distills a practical design framework for AI Skills: structure must serve content, activation matters more than accumulation, and complexity should be introduced only when needed. It addresses three common failure modes: Skills that do not trigger reliably, rules that fade during execution, and long-session drift. Keywords: AI Skill, SKILL.md, Harness.

The technical specification snapshot

| Parameter | Details |
| --- | --- |
| Topic | AI Skill design and engineering practices |
| Primary medium | Markdown / directory-based file structure |
| Target systems | Agent, Claude, Cursor, and other skill systems |
| Collaboration model | Prompt + Context + Harness |
| Reference project | forrestchang/andrej-karpathy-skills |
| GitHub stars | Not provided in the source |
| Core dependencies | SKILL.md, rules/, workflows/, references/ |

A single file is the best starting point for a Skill, not a complex architecture

Many Skills start by piling on rules, workflows, and references. On the surface, that looks complete. In practice, it creates maintenance overhead. For a static Skill with a narrow theme and no need for ongoing evolution, a single SKILL.md is often the optimal structure.

The value of a single-file Skill is not that it is “simple.” The real value is that the structure matches the information density. If you only have a few principles and one external reference link, splitting into directories just forces the Agent to read more empty scaffolding.

Good rules are not declarations. They are self-checkable test statements.

Typical rules are written as “keep it simple” or “avoid complex design.” These phrases express attitude, but they do not provide an execution standard. A higher-quality format is “principle + test statement,” so the Agent can review its own actions after execution.

## Simplicity First
Write only the minimum code required to solve the current problem. Do not add speculative features.
Check: Would a senior engineer consider this change unnecessarily complex? If yes, simplify it further.

## Precise Changes
Modify only the code directly related to the user's request.
Check: Can every changed line be traced back to the user's request?

This structure turns abstract guidance into verifiable action and significantly reduces the chance that the Agent improvises beyond the request.
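The "principle + test statement" format is also mechanically checkable. As a minimal sketch (a hypothetical helper, not part of any Skill runtime), a linter can flag rule sections that lack a `Check:` line:

```python
import re

def find_untestable_rules(rules_markdown: str) -> list[str]:
    """Return headings of rule sections that lack a 'Check:' test statement.

    Assumes the format shown above: each rule is a '## Heading' section
    whose body should contain a line starting with 'Check:'.
    """
    sections = re.split(r"^## ", rules_markdown, flags=re.MULTILINE)
    missing = []
    for section in sections[1:]:  # sections[0] is any preamble before the first heading
        heading, _, body = section.partition("\n")
        if "Check:" not in body:
            missing.append(heading.strip())
    return missing

rules = """## Simplicity First
Write only the minimum code required.
Check: Would a senior engineer consider this unnecessarily complex?

## Vague Rule
Keep it simple.
"""
print(find_untestable_rules(rules))  # ['Vague Rule']
```

Running this over a rules/ directory in CI keeps declaration-only rules from accumulating.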

Constraining an Agent with counterexamples works better than vague advocacy

The most common Agent failure is not an inability to write code. It is the tendency to “do a little extra.” For example, the user only asks to fix a crash caused by an empty email field, but the Agent also adds username validation, stricter email formatting, and docstring cleanup.

def validate_user(user_data):
    email = user_data.get("email", "")  # Extract only the email field
    if not email or not email.strip():  # Fix only the crash caused by an empty value
        raise ValueError("Email required")

This code handles only the target defect without expanding the scope, which aligns with the minimal-change principle of Precise Changes.

As topics and workflows grow, the Skill must move to a folder-based structure

Once a Skill needs to support multiple themes, task routing, accumulated knowledge, or team reuse, a single file turns into high-noise context. At that point, the right move is not to keep adding sections. It is to split the Skill by responsibility into folders.

skills/
└── <name>/
    ├── SKILL.md
    ├── rules/
    ├── workflows/
    ├── references/
    │   └── gotchas.md
    ├── docs/
    └── scripts/

The essence of this structure is layered routing. The entry file handles navigation only, while detailed instructions load by task. This prevents the Agent from reading the entire “operations manual” on every run.
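Layered routing can be sketched in a few lines. The route table below is hypothetical (a real Skill would derive its entries from the Common Tasks section of SKILL.md), but it shows the key property: unknown tasks fall back to Always Read, so the entry file stays a router rather than a dumping ground.

```python
from pathlib import Path

# Hypothetical route table; file names are illustrative.
ALWAYS_READ = ["SKILL.md", "rules/project-rules.md"]
ROUTES = {
    "fix-bug": ["rules/project-rules.md", "workflows/fix-bug.md"],
    "add-controller": ["rules/backend-rules.md", "workflows/add-controller.md"],
}

def files_for_task(skill_root: str, task: str) -> list[str]:
    """Resolve which files the Agent should load for a given task."""
    resolved: list[str] = []
    for rel in ALWAYS_READ + ROUTES.get(task, []):
        path = (Path(skill_root) / rel).as_posix()
        if path not in resolved:  # deduplicate while preserving order
            resolved.append(path)
    return resolved

print(files_for_task("skills/my-skill", "fix-bug"))
# ['skills/my-skill/SKILL.md', 'skills/my-skill/rules/project-rules.md',
#  'skills/my-skill/workflows/fix-bug.md']
```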

File contents must be separated by function, not grouped by intuition

Use rules/ for constraints that must be followed, workflows/ for ordered steps, and references/ for background knowledge, pitfalls, and supporting material. Do not hide checklists inside rule files, and do not bury hard constraints inside reference documents.

[Image] AI Visual Insight: The image shows the different asset types a Skill folder can contain, including scripts, templates, documents, and configuration files. It emphasizes that the Agent should reuse existing assets instead of regenerating boilerplate, which helps reduce error rates.

[Image] AI Visual Insight: The image highlights the boundaries between rules, workflows, and references. It reinforces the idea that constraints, process, and knowledge should be organized independently so the Agent can retrieve them precisely and read them only when needed.

File size is a warning signal, not a mechanical instruction to split

An oversized file can directly prevent the Agent from reading all critical rules. But once a file exceeds a suggested length, you should not split it mechanically. The correct standard is module cohesion, not whether the file passes an absolute line-count threshold.
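Treating size as a signal rather than a command is easy to operationalize. The threshold below is an assumption for illustration; a hit should trigger a cohesion review, never an automatic split:

```python
# Assumed review threshold; tune it to your Agent's context budget.
REVIEW_THRESHOLD_LINES = 150

def files_needing_review(line_counts: dict[str, int]) -> list[str]:
    """Flag files whose length suggests a context-risk review.

    The output is a review queue for a human or Agent to assess module
    cohesion, not a list of files to fragment mechanically.
    """
    return [
        path
        for path, count in line_counts.items()
        if count > REVIEW_THRESHOLD_LINES
    ]

sizes = {"rules/project-rules.md": 80, "references/gotchas.md": 420}
print(files_needing_review(sizes))  # ['references/gotchas.md']
```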

[Image] AI Visual Insight: The image presents file-size thresholds and split recommendations. Its core message is to treat file size as a context-risk indicator that triggers evaluation, not as a command for automatic fragmentation.

Stable Skill execution depends on three elements working together

A genuinely usable Skill is not just a well-written prompt. It is a closed loop between Prompt, Context, and Harness. They define behavior, control visible information, and enforce validation and correction.

Prompt defines what to do and when to trigger

The most important piece of Skill metadata is not the main body. It is the description. That field determines whether the model activates the Skill at all. If the description is too narrow, the trigger rate drops. If it is too vague, routing becomes noisy.

name: docx-writer
description: >
  Activate this Skill whenever the user mentions .docx, Word documents, formal reports, or template-based output.

This configuration broadens coverage for real user phrasing and helps the model avoid missing valid trigger opportunities because of keyword mismatch.
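Description coverage can be audited against real user phrasing. The phrase list below is hypothetical; in practice a team would collect it from actual requests that should have triggered the Skill:

```python
def missing_trigger_phrases(description: str, expected_phrases: list[str]) -> list[str]:
    """Return expected user phrasings absent from a Skill description.

    A non-empty result means the Skill is invisible at the retrieval
    layer for those phrasings, even if its body is perfect.
    """
    lowered = description.lower()
    return [p for p in expected_phrases if p.lower() not in lowered]

description = (
    "Activate this Skill whenever the user mentions .docx, "
    "Word documents, formal reports, or template-based output."
)
print(missing_trigger_phrases(description, [".docx", "Word", "meeting minutes"]))
# ['meeting minutes']
```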

Context controls the exposure level of information

A strong Skill does not dump all knowledge into context at once. It uses progressive disclosure. The top level contains information that is always required, referenced files contain scenario-specific knowledge, and task execution reads additional material on demand.

[Image] AI Visual Insight: The image shows a three-level progressive loading mechanism: persistent rules stay at the top level, while cloud-platform-specific or scenario-specific knowledge is pushed into reference files. This reduces context congestion and improves routing accuracy.
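Progressive disclosure amounts to lazy, cached loading. In this minimal sketch, `read_file` is any callable mapping a relative path to text; the in-memory `FILES` dict stands in for a real skills/ folder and its names are illustrative:

```python
class ProgressiveContext:
    """Minimal sketch of three-level progressive disclosure.

    Always-read files load up front; scenario-specific files load only
    when a task asks for them, keeping the context window small.
    """

    def __init__(self, read_file, always):
        self.read_file = read_file
        self.loaded = {}           # path -> content already in context
        for rel in always:         # level 1: always required
            self._load(rel)

    def _load(self, rel):
        if rel not in self.loaded:  # read each file at most once
            self.loaded[rel] = self.read_file(rel)
        return self.loaded[rel]

    def for_scenario(self, rel):
        # Levels 2-3: pulled into context only on demand
        return self._load(rel)


FILES = {"SKILL.md": "navigation hub", "references/gotchas.md": "known pitfalls"}
ctx = ProgressiveContext(FILES.__getitem__, always=["SKILL.md"])
print(sorted(ctx.loaded))   # ['SKILL.md']
ctx.for_scenario("references/gotchas.md")
print(sorted(ctx.loaded))   # ['SKILL.md', 'references/gotchas.md']
```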

Harness ensures rules do not decay in long conversations

Many Skills work in the first turn and then “forget” by the third. The root cause is usually not model capability. It is the absence of a mandatory reread mechanism. After long-session compression, SKILL.md is often no longer present in context.

## Session Discipline
Reread SKILL.md for every new task and rematch it against Common Tasks.
Check: Does the file read in this run fully match the corresponding route? If not, roll back immediately and restart the routing process.

This rule turns “acting from memory” into a violation and suppresses task drift at the process level.
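A harness can enforce this at the process level rather than trusting the model. The sketch below assumes a `load_skill` callable that returns the current SKILL.md text; acting without a fresh read raises an error instead of silently drifting:

```python
class RereadHarness:
    """Sketch of a harness enforcing Session Discipline.

    Every task must begin with start_task(), which re-injects SKILL.md;
    end_task() resets the flag so the next task cannot reuse a stale read.
    """

    def __init__(self, load_skill):
        self.load_skill = load_skill
        self.read_for_current_task = False

    def start_task(self) -> str:
        # Mandatory reread: each task begins by reloading the entry file
        self.read_for_current_task = True
        return self.load_skill()

    def act(self, action: str) -> str:
        if not self.read_for_current_task:
            raise RuntimeError("Violation: acting without rereading SKILL.md")
        return f"executing: {action}"

    def end_task(self) -> None:
        self.read_for_current_task = False


harness = RereadHarness(lambda: "## Always Read\n1. rules/project-rules.md")
harness.start_task()
print(harness.act("fix-bug"))  # executing: fix-bug
harness.end_task()
```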

[Image] AI Visual Insight: The image illustrates a three-layer mandatory reread mechanism. It shows that under session compression, task switching, or rule loss, critical constraints should be redundantly injected through the shell layer, routing layer, and workflow layer.

[Image] AI Visual Insight: The image shows an "excuse rejection table" distilled from real failure cases. It is designed to intercept common self-justifications from Agents, such as "I probably do not need to reread it this time."

[Image] AI Visual Insight: The image explains the responsibility boundaries and coordination model of Prompt, Context, and Harness. It emphasizes that all three are required; otherwise, the Skill remains a paper specification rather than a stable system.

SKILL.md should be a navigation hub, not a complete knowledge base

A high-quality SKILL.md must stay short and answer only two questions: what should be read now, and which tasks require which files. A practical structure is to keep four core sections: Always Read, Session Discipline, Common Tasks, and Known Gotchas.

## Always Read
1. rules/project-rules.md
2. rules/coding-standards.md

## Common Tasks
- Fix bug → read rules/*.md + workflows/fix-bug.md
- Add Controller → read rules/backend-rules.md + workflows/add-controller.md
- Other / unlisted task → fallback to Always Read

This navigation-first style turns the Skill from an encyclopedia into a task router, improving activation reliability and execution consistency.
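Because the Common Tasks section follows a fixed bullet convention, it can be parsed directly into a route table. This parser is a sketch assuming the `- task → read a.md + b.md` format shown above, not part of any Skill runtime:

```python
import re

def parse_common_tasks(skill_md: str) -> dict[str, list[str]]:
    """Parse a '## Common Tasks' section into a task -> files route table.

    Lines without the 'read' keyword (e.g. the fallback entry) are
    skipped, so they never produce a bogus route.
    """
    routes: dict[str, list[str]] = {}
    in_section = False
    for line in skill_md.splitlines():
        if line.startswith("## "):
            in_section = line.strip() == "## Common Tasks"
            continue
        m = re.match(r"- (.+?) → read (.+)", line)
        if in_section and m:
            task, files = m.group(1).strip(), m.group(2)
            routes[task] = [f.strip() for f in files.split("+")]
    return routes

skill_md = """## Common Tasks
- Fix bug → read rules/*.md + workflows/fix-bug.md
- Add Controller → read rules/backend-rules.md + workflows/add-controller.md
- Other / unlisted task → fallback to Always Read
"""
print(parse_common_tasks(skill_md)["Fix bug"])
# ['rules/*.md', 'workflows/fix-bug.md']
```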

[Image] AI Visual Insight: The image presents criteria for validating description quality. It emphasizes that trigger conditions must cover real user phrasing, task boundaries, and keyword combinations. Otherwise, the Skill effectively does not exist at the retrieval layer.

FAQ: Structured Q&A

Q1: When should you keep a single-file Skill, and when should you split it into directories?

A: Use a single file when the topic is narrow, the workflow is fixed, and the Skill does not need ongoing knowledge accumulation. Move to a directory-based structure when you need multiple themes, task routing, team collaboration, or continuously evolving knowledge.

Q2: Why are many Skills still hard to use even when they contain a lot of content?

A: The problem is usually not lack of content. It is incorrect structure. When rules, workflows, and reference material are mixed together, the Agent cannot read accurately, remember consistently, or route stably.

Q3: What is the first priority for improving Skill stability?

A: Fix the description and routing first, then add Session Discipline and Harness. Without reliable activation and mandatory rereads, even the best rules remain static text.

Core takeaway: This article reframes AI Skill design around three questions: when to use a single file, when to split into a folder-based structure, and how to improve activation, stability, and maintainability through Prompt, Context, and Harness. It is especially useful for building reusable, routable, and verifiable Agent skill systems.