Luogu P11196 COTS 2021 Sudoku String Encoding Solution: Reversible String Encoding with Completed Sudoku Grids - Devuly | Smart Analytics for Developers & Projects

This problem’s core idea is to reversibly map short strings to valid completed Sudoku grids. The key challenge is to reliably locate one Sudoku solution within a massive solution space while keeping decoding fully reversible. The approach uses “string ranking to integer IDs + diagonal block prefill + DFS enumeration of the remaining cells.” Keywords: Sudoku encoding, reversible mapping, DFS construction.

Table of Contents

The technical specification is straightforward

Parameter	Description
Problem	Luogu P11196 / COTS 2021 Sudoku String Encoding
Implementation Language	C++
Core Methods	Counting-based construction, permutation preprocessing, DFS backtracking
Sudoku Size	9×9
Key Constant	9! = 362880
Character Set	26 lowercase letters
Encoding Length	1 to 15
Core Dependencies	`<bits/stdc++.h>`, `next_permutation`, `__int128`
Original Platform	CNBlogs

This problem is best viewed as a reversible encoding problem over completed Sudoku grids

The essence of the task is not to solve a standard Sudoku, but to build a bidirectional mapping between strings and valid completed Sudoku boards. During encoding, the input is a string and the output is a completed Sudoku grid. During decoding, the input is a completed Sudoku grid and the output is the original string.

The main difficulty is that the mapping must exist, be computable, and remain strictly reversible. If any intermediate step uses an unstable strategy, decoding will fail.

Existence follows from comparing the sizes of the two spaces

The total number of valid completed Sudoku grids is about 6.67×10^21, while the total number of lowercase strings of lengths 1 through 15 is less than 1.75×10^21. So from a counting perspective, the full set of strings can be injected into the set of Sudoku solutions.

This step is critical because it proves that such a mapping can be designed. The remaining task is only to find a construction that is convenient to implement.

__int128 pw[16], sm[16];

void init_count() {
    pw[0] = 1;
    for (int i = 1; i <= 15; ++i) {
        pw[i] = pw[i - 1] * 26;      // Precompute powers of 26
        sm[i] = sm[i - 1] + pw[i];   // Count all strings with length at most i
    }
}

This preprocessing converts a string into an integer inside one continuous ID range.

Converting the string to a lexicographic rank makes the problem easier to handle

Working directly with strings is inconvenient because strings of different lengths do not live in one uniform space. A more natural approach is to assign IDs to all strings in cumulative order, then convert the current string into an integer x.

The original solution uses a base-26 expansion. It treats the string as a digit sequence from a to z, then uses the prefix count sm[n-1] to merge strings of different lengths onto one global numbering axis.

The numbering process essentially maps the string to an integer x

__int128 encode_string_to_rank(const string& s) {
    __int128 x = 0;
    for (int i = (int)s.size() - 1; i >= 0; --i) {
        x = x * 26 + (s[i] - 'a');   // Expand in reverse as a base-26 integer
    }
    return x;
}

This step gives the relative rank of the string among strings of the same length. After that, add the total number of shorter strings to obtain the global rank.

Filling the three diagonal 3×3 blocks first greatly reduces construction difficulty

If you greedily fill all 9 blocks from left to right and top to bottom, each block may have many local choices, but row and column constraints accumulate quickly. That often leads to a state where the early part looks valid but the later part has no solution.

A much more stable strategy is to fill the top-left, center, and bottom-right diagonal blocks first. These three blocks are independent of each other because they do not share rows or columns, so you can choose their permutations separately without creating immediate conflicts.

Precomputing all 9! block permutations is the foundation of the construction

For each 3×3 block, preprocess all permutations of 0...8, for a total of exactly 9!. During encoding, use x mod 9! to choose the current block permutation, then set x /= 9! and continue to the next block.

const int F = 362880;
char p[3][3][F];

void init_blocks() {
    char t[9];
    iota(t, t + 9, 0);
    int k = 0;
    do {
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j)
                p[i][j][k] = t[i * 3 + j]; // Record one 3×3 block permutation
        ++k;
    } while (next_permutation(t, t + 9));
}

This code compresses all in-block permutations into a random-access index table, making it easy to map integer weights to block states.

After filling the diagonal blocks, the remaining state space is already small enough

The clever observation in the original problem is that after filling the three diagonal blocks, the upper bound of the remaining rank is about

1.75×10^21 / (9!)^3 < 37000

That means the unconsumed part of the ID has already become very small. For the remaining 54 cells, there is no need to design another complicated mixed-radix mapping. You can simply run DFS in a fixed order and search for the x-th valid completion.

Use DFS over the remaining 54 cells to enumerate the x-th solution in order

void dfs(int id) {
    if (id == 54) {
        if (cnt == x) success = true; // Found the x-th valid completion
        ++cnt;
        return;
    }

    int i = ord[id].first, j = ord[id].second;
    int bi = i / 3, bj = j / 3;
    for (int c = 0; c < 9; ++c) {
        if (!(row[i] >> c & 1) && !(col[j] >> c & 1) && !(box[bi][bj] >> c & 1)) {
            row[i] |= 1 << c;         // Mark digit as used in the row
            col[j] |= 1 << c;         // Mark digit as used in the column
            box[bi][bj] |= 1 << c;    // Mark digit as used in the block
            grid[i][j] = c;
            dfs(id + 1);
            if (success) return;
            row[i] ^= 1 << c;         // Backtrack and restore the state
            col[j] ^= 1 << c;
            box[bi][bj] ^= 1 << c;
        }
    }
}

This DFS does not enumerate every solution. It only locates the x-th solution, so it fits naturally into the earlier ranking system.

The decoding process is exactly the inverse of encoding

During decoding, first identify which 9! permutation each of the three diagonal blocks corresponds to, and recover that high-order information. Then keep the entire final board fixed and use the same DFS order to count which completion it is in the remaining solution space.

That lets you reconstruct the integer x, then derive the string length and every character from x. Because encoding and decoding use exactly the same state order, reversibility holds.

Recovering the string in reverse also uses base-26 digit extraction

string decode_rank_to_string(__int128 x, int n) {
    string s(n, 'a');
    for (int i = 0; i < n; ++i) {
        s[i] = char(x % 26 + 'a');   // Recover one character at a time
        x /= 26;
    }
    return s;
}

This final step restores the original string from the integer.

The image and page elements only provide source context and do not affect the algorithm

WeChat share prompt AI Visual Insight: This GIF shows a sharing guidance overlay from the CNBlogs page. It is a site-level interaction prompt and does not contain algorithm flow, state diagrams, or code execution results, so it adds no technical value to understanding the Sudoku encoding scheme.

The engineering considerations in this implementation are very clear

First, the solution uses __int128 to hold large integer ranks and avoid 64-bit overflow. Second, it fixes the traversal order of the non-diagonal region in advance so that encoding and decoding use identical counting rules. Third, it uses bitmasks to maintain row, column, and block states, which keeps the DFS constant factors within an acceptable range.

From an algorithm-design perspective, this problem is not pure search. It is a combined model of counting proof, locally free construction, and globally ordered enumeration. The idea is especially suitable for reversible mapping problems in competitive programming.

FAQ answers the most important implementation questions

Q1: Why not assign a direct lexicographic rank to the entire completed Sudoku grid set?

Directly ranking all Sudoku solution grids would require you to efficiently compute how many valid completions remain under a given prefix. That is extremely complex and expensive to implement. Filling independent diagonal blocks first and then using DFS on the rest is a much more practical compromise.

Q2: Why is brute-force search feasible after the three diagonal blocks are filled?

Because after dividing the total string count by (9!)^3, the upper bound of the remaining rank drops below 37000. At that point, you only need to find the x-th valid completion among the remaining legal completions, so the search space becomes manageable.

Q3: What is the easiest mistake to make in this problem?

The easiest mistake is using inconsistent enumeration orders between encoding and decoding. That includes the order of diagonal block selection, the traversal order of the remaining 54 cells, and the candidate digit order inside DFS. Any mismatch breaks reversibility.

Core Summary: This article reconstructs the core idea behind the problem. It first proves by counting that a mapping from strings to valid completed Sudoku grids must exist, then converts each string into a lexicographic rank, fills the three diagonal 3×3 blocks first, and completes the rest with DFS so that encoding and decoding remain fully reversible.