OpenClaw Self-Improving with Reflection: Layered Memory and Reflection for Continual Agent Improvement

This zero-dependency self-improvement skill for Agents enables continual optimization across tasks through local Markdown-based persistent memory, captured user corrections, and post-task self-reflection. It addresses recurring mistakes, poor knowledge retention, and unauditable rule accumulation. Keywords: OpenClaw, layered memory, self-reflection.

The technical specification snapshot summarizes the skill at a glance

| Parameter | Description |
| --- | --- |
| Project Name | self-improving-with-reflection |
| Ecosystem | OpenClaw Skills |
| Version | 1.2.11 |
| Implementation Language | Markdown + file system conventions |
| Storage Protocol | Local file read/write |
| Dependencies | Zero dependencies |
| Memory Architecture | HOT / WARM / COLD |
| Key Files | memory.md, corrections.md, index.md |
| Stars | Original data not provided |
| Core Value | Enables an Agent to accumulate reusable execution experience from corrections and reflection |

The skill’s core goal is to give Agents auditable continual learning

self-improving-with-reflection is not a traditional vector memory store, and it is not a black-box reasoning plugin. Instead, it turns the loop of correction, abstraction, improvement, and archival into a local file system that is readable, editable, and exportable.

Compared with approaches that only record conversation history, this skill focuses more on accumulating execution quality. The Agent does not just remember facts. It also remembers which practices work better, which expressions users tend to correct, and which workflows deserve reuse.

The three-tier memory structure balances performance and maintainability

```
./self-improving/
├── memory.md         # HOT: always loaded at startup, ≤100 lines
├── index.md          # Memory index and maintenance status
├── corrections.md    # Recent correction log
├── projects/         # WARM: project-level knowledge
├── domains/          # WARM: domain-level knowledge
└── archive/          # COLD: historical archive
```

This directory layout separates high-frequency rules, contextual knowledge, and historical traces into different storage locations, preventing a single file from growing without bound.
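To make the layout concrete, here is a minimal sketch of a bootstrap helper that creates this directory structure on first run. The function name `init_memory_root` is illustrative and not part of the skill itself; only the directory and file names come from the tree above.

```python
from pathlib import Path

def init_memory_root(root: str = "./self-improving") -> Path:
    """Create the HOT/WARM/COLD layout if it does not already exist."""
    base = Path(root)
    # WARM and COLD tiers are directories.
    for sub in ("projects", "domains", "archive"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    # HOT tier and bookkeeping files live at the root; never overwrite.
    for name in ("memory.md", "index.md", "corrections.md"):
        f = base / name
        if not f.exists():
            f.write_text(f"# {name}\n", encoding="utf-8")
    return base
```

Because the helper is idempotent, an Agent can call it at every startup without clobbering accumulated memory.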

The layered memory architecture separates short-term high-frequency rules from long-term knowledge

The HOT layer lives in memory.md. It is best suited for confirmed user preferences, stable working patterns, and correction rules that appeared frequently within the last seven days. Because the system must always load it, its size must remain tightly constrained.

The WARM layer supports on-demand retrieval and is divided into projects/ and domains/. The former stores project-specific override rules, while the latter stores domain-specific conventions. This allows the same Agent to switch behavior automatically when working across different repositories.

Namespace inheritance rules reduce the cost of rule conflicts

```
global (memory.md)
    ↓
domain (domains/*.md)
    ↓
project (projects/*.md)
```

The core precedence is project > domain > global. If a conflict exists at the same level, the most recently modified rule wins. If the conflict still cannot be resolved, the Agent should ask the user for clarification. This strategy works especially well in multi-project and multi-team environments.

The correction detection mechanism turns natural language feedback into executable rules

The system listens for explicit correction phrases such as “No, that’s not right” or “I prefer X, not Y.” These signals are appended to corrections.md, where the system then evaluates whether they can be abstracted into stable patterns.

Once the same type of pattern appears three times, it graduates from tentative experience into a HOT rule. This prevents one-off overfitting while still promoting genuinely frequent preferences into primary memory.

```markdown
## 2026-04-26 - [23:10] Changed X → Y
Type: format
Context: API documentation output format correction
Confirmed: pending (2/3)
Source: user explicit correction
```

This log format converts vague feedback into structured records that support auditing, counting, and later promotion.
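The "three occurrences" graduation rule can be checked mechanically against this log format. The sketch below is an assumed implementation: it counts entries per `Type:` field and reports which types have hit the promotion threshold.

```python
import re
from collections import Counter

# Promotion threshold from the text: three occurrences of the same
# pattern type graduate it into a HOT rule.
PROMOTION_THRESHOLD = 3

def types_ready_to_graduate(corrections_md: str) -> list[str]:
    """Return correction types that appear often enough to become HOT rules."""
    counts = Counter(re.findall(r"^Type:\s*(\S+)", corrections_md, re.MULTILINE))
    return [t for t, n in counts.items() if n >= PROMOTION_THRESHOLD]
```

A real implementation would also honor the `Confirmed: pending (n/3)` counter rather than recounting raw entries, but the thresholding idea is the same.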

The self-reflection mechanism lets Agents improve without relying only on user corrections

This skill does not just accept external corrections. It also requires proactive review after multi-step tasks, bug fixes, and negative feedback. Reflection entries usually contain three parts: context, issue, and improvement plan.

That means the Agent learns more than a single answer. It learns why the result was poor and how to avoid the same failure next time. This is the key step that upgrades memory from a passive record system into an execution improvement system.

The reflection template stays minimal and supports later compression

```
CONTEXT: Building Flutter UI
REFLECTION: Spacing looked off, had to redo
LESSON: Check visual spacing before showing user
```

The value of this template lies in its low-friction capture model while preserving enough information density to remain useful.
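A capture helper for this template can be as small as an append. The function below is a hypothetical sketch (the date header it adds is an assumption; the three fields come from the template above):

```python
from datetime import date

def append_reflection(path: str, context: str, reflection: str, lesson: str) -> None:
    """Append one three-field reflection entry to the reflections file."""
    entry = (
        f"\n## {date.today().isoformat()}\n"
        f"CONTEXT: {context}\n"
        f"REFLECTION: {reflection}\n"
        f"LESSON: {lesson}\n"
    )
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry)
```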

Lifecycle management rules prevent the memory system from growing out of control

The skill includes built-in migration rules across the HOT, WARM, and COLD layers. Patterns reused frequently within the last seven days stay in HOT; patterns unused for 30 days move to WARM; patterns unused for 90 days are archived to COLD.

Maintenance operations include weekly compression of similar entries, updating index.md, and automatically archiving stale rules. This is not just storage. It is a self-cleaning knowledge cache.
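The demotion side of these rules reduces to a threshold function over idle time. The sketch below assumes "days since last use" as the input; the seven-day figure governs promotion into HOT rather than demotion, so it does not appear here.

```python
# Thresholds from the text: 30 idle days demotes to WARM, 90 to COLD.
def target_tier(days_since_last_use: int) -> str:
    """Decide which memory tier a pattern belongs in, by idle time."""
    if days_since_last_use >= 90:
        return "COLD"
    if days_since_last_use >= 30:
        return "WARM"
    return "HOT"
```

Running this over every pattern during the weekly maintenance pass yields the list of files to move between `memory.md`, the WARM directories, and `archive/`.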

```bash
# Inspect the memory directory and read the core files
ls -la ./self-improving/
cat ./self-improving/index.md   # View the index and maintenance status
cat ./self-improving/memory.md  # Load HOT memory
find ./self-improving/domains ./self-improving/projects -maxdepth 1 -type f -name "*.md" | sort
```

These commands initialize the memory loading workflow: read HOT first, then enumerate context files that can be loaded on demand.

The file format design emphasizes both human readability and machine compatibility

memory.md is divided into Confirmed Preferences, Active Patterns, and Recent so the system can distinguish never-decaying preferences from still-maturing rules. corrections.md focuses on timestamp, type, context, and confirmation state. index.md records the size of each layer and the most recent compression time.

The advantage of this format is simple: the Agent can parse it directly, and developers can edit it manually without needing database tooling.

A minimal memory.md template looks like this

```markdown
## Confirmed Preferences
- Prefer summarizing parameters in tables when generating technical documentation

## Active Patterns
- List the execution plan first when handling multi-step tasks

## Recent (last 7 days)
- The user recently corrected capitalization of English technical terms multiple times
```

This template shows how HOT memory separates stable rules, active patterns, and recent signals that may still need confirmation.
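Because the format is plain Markdown, parsing it needs no tooling beyond string handling. A minimal section parser for the template above might look like this (a sketch, not the skill's own code):

```python
def parse_memory(md: str) -> dict[str, list[str]]:
    """Split memory.md into {section heading: [bullet items]}."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in md.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current and line.startswith("- "):
            sections[current].append(line[2:].strip())
    return sections
```

This is the "machine compatibility" half of the design: the same file a developer edits by hand round-trips cleanly through a ten-line parser.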

Compared with other memory approaches, its strengths are transparency, low cost, and strict constraints

Traditional Agent memory systems often depend on external databases or vector indexes, which makes deployment more complex and auditing more difficult. This skill is completely file-system-based, making it a strong fit for local-first, privacy-sensitive, or manually reviewed environments.

More importantly, it explicitly forbids storing credentials, financial information, medical data, third-party private data, or location patterns, and it supports a "forget everything" command as a reset switch.

Integration with SOUL.md and AGENTS.md completes the operational loop

The recommended integration model is straightforward: in SOUL.md, require the Agent to load HOT memory and related memory files before non-trivial tasks; in AGENTS.md, distinguish factual memory from execution-improvement memory; and in HEARTBEAT.md, periodically check promotion and archival status.

```markdown
## Self-Improving Check
- [ ] Review corrections.md for patterns ready to graduate
- [ ] Check memory.md line count (should be ≤100)
- [ ] Archive patterns unused >90 days
```

This checklist institutionalizes maintenance work and prevents the memory system from being left unmanaged over time.
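The line-count item in the checklist is easy to automate. A hypothetical check matching the ≤100-line HOT budget:

```python
from pathlib import Path

# HOT budget from the text: memory.md should stay at or under 100 lines.
HOT_LINE_BUDGET = 100

def hot_over_budget(memory_path: str) -> bool:
    """True if memory.md has exceeded the HOT line budget and needs compression."""
    lines = Path(memory_path).read_text(encoding="utf-8").splitlines()
    return len(lines) > HOT_LINE_BUDGET
```

Wired into a heartbeat task, this turns the checklist item from a manual review into a deterministic signal to compress or archive.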

Frequently asked questions

1. What fundamentally distinguishes this skill from ordinary chat history memory?

Answer: Chat history stores what was said, while this skill stores which practices work better. It functions more like an execution strategy library than a conversation replay log.

2. Why split memory into HOT, WARM, and COLD layers?

Answer: Keeping all memory resident slows loading and adds noise. With layering, high-frequency rules stay resident, project knowledge loads on demand, and old patterns move into archives, improving both efficiency and maintainability.

3. Which Agent scenarios is it best suited for?

Answer: It is a strong fit for coding assistants, personal productivity Agents, documentation generation Agents, and long-running task systems that need stable output style and process quality across sessions.

Core Summary: This article reconstructs and analyzes OpenClaw’s self-improving-with-reflection skill, focusing on its zero-dependency file system, HOT/WARM/COLD layered memory, correction detection, self-reflection, lifecycle management, and integration patterns for AGENTS.md, SOUL.md, and HEARTBEAT.md.