This hands-on postmortem explores how Claude Code was used to build Herbal Spirit Valley, a pixel-art traditional Chinese medicine game. It focuses on four core stages: requirement clarification, AI-assisted development, map and UI generation, and testing and acceptance. The article distills a reusable method that runs from
/brainstorming to systematic debugging. Keywords: Claude Code, AI game development, testing and acceptance.
The technical specification snapshot establishes the project context
| Parameter | Details |
|---|---|
| Project Type | AI-assisted pixel-art traditional Chinese medicine game development |
| Core Scenes | Herb garden, clinic, outdoor map |
| Primary Languages | Markdown, prompts, likely frontend/game scripting |
| Collaboration Protocol | Human-AI collaboration, skill-driven workflow |
| Repository | tcm_odyssey |
| Core Dependencies | Claude Code, superpowers, everything-claude-code, multimodal/image generation models |
This postmortem shows that the hardest part of AI game development is not writing code
The original project scope was ambitious: build a small traditional Chinese medicine game called Herbal Spirit Valley with NPCs, consultation, herb compatibility, decoction, exploration, and farming mechanics. At the current stage, the team has completed the outdoor map, the herb garden, the clinic interior, and shell implementations for several mini-games.
The real bottleneck was not code generation speed. It was unclear requirements, UI drift, difficulty generating walkable map regions, and a disconnect between testing goals and acceptance criteria. The author’s conclusion is direct: AI amplifies vague requirements, and it produces false positives when tests are not rigorous.
The current runnable feature boundary is already well defined
At this stage, the project already supports player movement, two key spaces—the herb garden and the clinic—and basic entry shells for consultation, herb compatibility, and decoction gameplay. The unfinished work is concentrated in NPC dialogue, core tutorial logic, and more granular interactions.
This kind of phased decomposition matters because it determines whether AI should deliver a runnable prototype or a content-complete system. If the boundary is unclear, the model will invent undeclared requirements and create avoidable rework.
project_scope = {
    "done": ["outdoor map", "herb garden interior", "clinic interior", "player movement", "mini-game shells"],
    "doing": ["NPC dialogue", "core consultation logic", "knowledge-teaching agent"],
    "risk": ["map occlusion", "scene transitions", "incomplete testing and acceptance"],
}

# Core idea: define phase boundaries first, then let AI execute
for phase, items in project_scope.items():
    print(phase, items)
This snippet shows how to structure phase scope explicitly so AI does not over-generate inside an ambiguous boundary.
Requirements must be questioned thoroughly before they are executed quickly
The most effective change in this iteration was simple: instead of asking the model to write a plan immediately, the author first used superpowers:brainstorming to challenge the requirements line by line. Its value was not document generation. Its value was uncovering omissions.
Most failures do not come from poor execution. They come from ambiguous inputs. Map style, gameplay pacing, NPC teaching logic, and an extensible clinic structure all need confirmation up front. Otherwise, AI will fill in the blanks on its own.
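One lightweight way to make this confirmation step explicit is to track every requirement dimension and refuse to start coding while any of them is unconfirmed. This is a minimal sketch of that idea; the dimension names come from the paragraph above, and the structure itself is an illustrative assumption, not the author's actual artifact.

requirements = {
    "map style": "pixel art, top-down",   # confirmed with the author
    "gameplay pacing": None,              # None means unconfirmed; AI must not fill this in itself
    "NPC teaching logic": None,
    "extensible clinic structure": "multi-room layout",
}

unconfirmed = [k for k, v in requirements.items() if v is None]
if unconfirmed:
    print("Clarify before coding:", unconfirmed)

Anything left as None is exactly the blank the model would otherwise fill in on its own.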
Clarify first, phase second, and code last is the more reliable path
The author first used brainstorming to clarify requirements, then used superpowers:writing-plans to split the design document into phases. The result was striking: requirement writing took a long time, but the code only took two hours, and phase 1 acceptance went very smoothly.
This shows that in AI development, documentation is not a byproduct. It is an execution constraint. The more precise the document is, the less likely the model is to drift.
workflow = [
    "brainstorming",    # Ask continuous questions first to close requirement gaps
    "writing-plans",    # Then break requirements into phased tasks
    "implementation",   # Finally move into coding and asset generation
    "acceptance",       # Perform acceptance against explicit standards
]
assert workflow[0] == "brainstorming"
This workflow snippet expresses a key ordering constraint in AI-assisted development: ask first, act second.
A skill-based workflow can significantly reduce the chance of model drift
The author highlighted four high-value skills: search-first, systematic-debugging, interleave thinking, and refactor-cleaner. They address four different classes of problems: solution research, issue remediation, complex reasoning, and workspace contamination.
The most reusable one is search-first. Models tend to build from scratch, but in real development, searching GitHub, npm, MCP resources, or existing skills first often eliminates a large amount of duplicated effort.
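One way to operationalize search-first is to require a recorded search result before any build-from-scratch decision. The sketch below assumes a simple decision log; both entries are hypothetical examples, not tasks from the project.

# Hypothetical decision log: task -> where an existing solution was found (None = nothing found yet)
decision_log = {
    "tile pathfinding": "GitHub: existing A* grid library",
    "dialogue box UI": None,
}

allowed_to_build = [task for task, found in decision_log.items() if found is None]
print("Only these may be built from scratch:", allowed_to_build)

The rule is simple: a task earns a from-scratch implementation only after the search step has demonstrably come up empty.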
Systematic debugging is a better fit for AI than fast patching
The four-step pattern in systematic-debugging is representative: locate the root cause, analyze the pattern, propose and validate a hypothesis, and only then implement the fix. The problem is that if this skill is not invoked explicitly, the model often jumps directly into speculative patching.
That is why the author wrote skill references directly into CLAUDE.md and added an issue logging mechanism. This is especially important in game projects, where similar bugs tend to recur. Without a problem log, AI will repeat ineffective fixes over and over.
def debug_bug(symptom):
    root_cause = analyze(symptom)       # Identify the root cause first; do not edit code immediately
    pattern = inspect(root_cause)       # Check whether this is a recurring issue
    hypothesis = propose(pattern)       # Propose a testable hypothesis
    return verify_and_fix(hypothesis)   # Validate first, then apply the real fix
This pseudocode captures the full systematic debugging loop: evidence comes before repair.
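The issue log mentioned above can also be made concrete. Here is a minimal sketch of one way to do it in Python; the file name issues.jsonl, the record fields, and the helper names are illustrative assumptions, not the project's actual mechanism.

import json
from pathlib import Path

LOG = Path("issues.jsonl")  # hypothetical append-only log, one JSON record per line

def log_issue(symptom, root_cause, fix, verified):
    record = {"symptom": symptom, "root_cause": root_cause, "fix": fix, "verified": verified}
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

def seen_before(symptom):
    # Consult the log before attempting a new fix, so failed patches are not repeated
    if not LOG.exists():
        return False
    return any(json.loads(line)["symptom"] == symptom for line in LOG.open(encoding="utf-8"))

Checking seen_before at the start of every debugging session is what keeps the model from reapplying a patch that already failed.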
Passing tests does not mean the product is usable unless acceptance criteria are explicit
A typical failure in the previous iteration looked like this: AI produced a detailed testing plan and even claimed that testing had passed comprehensively, but the webpage rendered as a black screen on load. The problem was not the absence of tests. The problem was that testing goals were not explicitly bound to acceptance criteria.
For example, a UI test goal such as “test the interface” is only a goal. Standards such as “consistent pixel-art style,” “no broken road stitching,” and “no occlusion errors on building boundaries” are actual criteria. Without standards, the model mistakes local correctness for overall correctness.
A test definition that can constrain AI needs at least two layers
The first layer is the goal: smoke testing, functional testing, regression testing, UI testing, and end-to-end testing. The second layer is the standard: pass criteria for each test type, visual checkpoints, and rules that prevent failures from being skipped.
Map validation in particular cannot focus only on a single tile or local background region. It must evaluate the fully stitched scene. Otherwise, you get false positives where every local piece looks right but the complete map fails.
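A sketch of what this two-layer binding might look like follows. The test names are illustrative assumptions; the UI criteria are the ones quoted above, and the acceptance rule fails closed when criteria are missing.

test_plan = {
    "ui_test": {
        "goal": "test the interface",
        "criteria": [
            "consistent pixel-art style",
            "no broken road stitching",
            "no occlusion errors on building boundaries",
        ],
    },
    "map_test": {
        "goal": "test the map",
        "criteria": ["fully stitched scene renders correctly"],  # not just individual tiles
    },
}

def accept(test_name, results):
    # A test passes only if every explicit criterion passed; no criteria means no pass
    criteria = test_plan[test_name]["criteria"]
    return bool(criteria) and all(results.get(c, False) for c in criteria)

With a structure like this, "testing passed" can no longer mean "the goal was attempted"; it has to mean every standard was checked.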
The hardest part of AI-generated game UI is the map, not the art itself
For the town map, the author used a pure AI generation route: first write a long and highly precise prompt that defines the nine-grid layout, river and road system, herb garden, clinic, market, southern fields, and other details, then pass it to an image generation model.
This process reveals an important fact: the so-called lottery effect is often not total model randomness. In many cases, the prompt is still underspecified. The tighter and more complete the description becomes, the more outputs from different models converge.
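The following sketch shows one way to assemble such a prompt from explicit constraints. The constraint list paraphrases the map details mentioned above; the template itself is a hypothetical example, not the author's actual prompt.

map_constraints = [
    "nine-grid town layout",
    "river and road system connecting all districts",
    "herb garden district",
    "clinic with a clear entrance",
    "market square",
    "southern fields",
]

def build_map_prompt(constraints, style="top-down pixel art"):
    # Every added constraint removes a degree of freedom the image model would otherwise improvise
    return f"Generate a town map, {style}. Hard constraints: " + "; ".join(constraints) + "."

print(build_map_prompt(map_constraints))

The more complete this constraint list becomes, the more the lottery effect fades and outputs from different models converge.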
AI Visual Insight: This animated image shows a basic interaction prototype for the herb garden interior. The scene is clearly partitioned, and visible paths are separated from the farming area, which means the map already provides the spatial foundation for layering in planting, gathering, and NPC tutorial trigger points. The scene looks more like a functional validation interior than final production art.
Walkable map regions are the real hard problem in a pure AI workflow
To let the character move across the map, the author tried three approaches: grid segmentation plus multimodal judgment, image layering, and black-and-white binary masks. The first two were unstable: a coarse grid mixed too much information into each cell, while a fine grid produced cells the multimodal model could not interpret reliably. Image layering also failed to produce clean, connected paths.
The third attempt changed direction and asked the image generation model to produce a black-and-white mask layer directly. That approach finally worked. After overlaying the mask with the original image and applying connectivity and threshold rules, the team could derive a walkable region map.
def build_walkable_mask(scene_image, mask_image):
    binary_mask = threshold(mask_image, t=0.5)   # Convert the mask into a black-and-white binary region map
    walkable = extract_white(binary_mask)        # Treat white regions as walkable
    graph = ensure_connectivity(walkable)        # Repair path connectivity to avoid broken routes
    return overlay(scene_image, graph)
This snippet summarizes the core processing pipeline from mask generation to a usable walkability map.
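For readers who want something executable, here is one concrete way to realize the same pipeline with NumPy and SciPy. It is a minimal sketch assuming the mask is a grayscale array in [0, 1]; keeping only the largest connected component stands in for the connectivity repair step.

import numpy as np
from scipy import ndimage

def walkable_from_mask(mask_image: np.ndarray) -> np.ndarray:
    binary = mask_image > 0.5              # binarize: white pixels are candidate walkable area
    labels, n = ndimage.label(binary)      # label connected regions
    if n == 0:
        return np.zeros_like(binary)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                           # ignore the background label
    return labels == sizes.argmax()        # keep the largest region as the walkable map

The gameplay layer can then query this boolean array per pixel or per tile to decide whether a move is legal.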
AI Visual Insight: This animated image shows that the clinic interior already has room structure, counter layout, and basic character movement, making it suitable for layering in consultation, herb compatibility, and decoction sub-gameplay. The map clearly reserves functional zones, which suggests the author considered tutorial NPCs and multi-room expansion during the map design phase.
Solving the mask problem exposes a higher layer of map engineering issues
The author quickly realized that a walkable mask alone is not enough. A real game also needs occlusion regions, scene transition points, skill trigger points, and more granular collision control. Otherwise, you get immersion-breaking behavior such as characters walking on top of roof eaves.
This shows that AI-generated maps can solve the jump from zero to one, but stable production assets still require a TileMap editor or an equivalent structured map description layer. A pure image-based route reaches its limits quickly once interactions become complex.
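A minimal sketch of such a structured description layer might look like this. The field names and coordinates are hypothetical; a Tiled or engine-native TileMap would carry the same information in its own format.

map_layers = {
    "walkable": "mask.png",     # the black-and-white walkability mask from the previous step
    "occlusion": [              # regions drawn above the character, e.g. roof eaves
        {"rect": [120, 40, 64, 32], "z": "above_player"},
    ],
    "transitions": [            # scene transition points
        {"at": [200, 96], "to": "clinic_interior", "spawn": [16, 48]},
    ],
    "triggers": [               # skill and tutorial trigger points
        {"at": [64, 160], "event": "npc_tutorial_start"},
    ],
}

Once interactions reach this level of detail, a flat image plus a mask is no longer enough; this structure has to live somewhere, whether in an editor or in data like the above.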
Consistent character walk cycles are harder to control than background images
NPC walking animation also caused problems. In theory, it requires four directions—front, back, left, and right—with at least three frames per direction, for roughly twelve frames in total. When these frames are generated directly, common failures include incorrect facing direction, inconsistent leg movement, and distorted head-body proportions.
The workable solution was not to generate the complete frame sequence in one shot. Instead, the team used a reference pose image to constrain motion and generated only the character appearance. In essence, they solved motion consistency and visual identity consistency as two separate problems.
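The split can be sketched as a simple job list: motion comes from fixed pose references, identity comes from a single shared appearance prompt. The pose file paths and the prompt below are hypothetical placeholders, not assets from the project.

DIRECTIONS = ["front", "back", "left", "right"]
FRAMES_PER_DIRECTION = 3   # at least 3 frames per direction, ~12 frames in total

appearance_prompt = "young herbalist NPC, green robe, pixel art"

# One generation job per frame: the pose reference constrains motion,
# the shared prompt keeps the character's appearance consistent across all frames
jobs = [
    {"pose_ref": f"poses/{d}_{i}.png", "prompt": appearance_prompt}
    for d in DIRECTIONS
    for i in range(FRAMES_PER_DIRECTION)
]
print(len(jobs))  # 12

Because every job shares one appearance prompt and differs only in its pose reference, consistency failures collapse into two separable problems instead of twelve entangled ones.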
The most reusable outcome of this project is an AI-native development methodology
This article is not just a log of how Claude Code wrote a game. It answers a more important question: once a model has strong coding and multimodal capabilities, how should developers design constraints so that the model becomes reliable productivity?
The answer can be compressed into four points: clarify requirements first, then search for existing solutions; define testing goals first, then bind them to acceptance criteria; use systematic debugging to locate root causes before making changes; and keep the workspace clean to reduce misleading context.
FAQ: Structured Q&A
Q1: Why is requirement clarification more important than coding in AI game development?
Because the model actively fills in undeclared information. The more ambiguous the requirements are, the more likely AI is to invent details about maps, gameplay, UI, and test scope, which leads to outputs that run but do not match expectations.
Q2: Why can the page still render as a black screen after tests pass?
A common reason is that the tests define goals but not standards, or they validate local pieces without validating the integrated whole. For example, the map may be tested tile by tile but not as a stitched scene, so each part looks correct while the full render fails.
Q3: Can pure AI-generated maps fully replace a TileMap editor?
That is difficult in the short term. AI can quickly produce background art and black-and-white masks, but complex scenes still need a structured description layer to manage collision, occlusion, teleport points, and interaction trigger areas.
Core Summary: This article reconstructs a practical postmortem on building a traditional Chinese medicine pixel-art game with Claude Code. It focuses on the key pitfalls in requirement clarification, skill orchestration, testing and acceptance, AI-generated UI, and map mask creation, and it distills a workflow and quality control method that is better suited to AI-assisted development.