Reverse Engineering Baidu Translate Acs-Token: JavaScript Stack Tracing, AES-CBC, and Python Reproduction

This article breaks down the generation mechanism behind Baidu Translate’s Acs-Token parameter. It focuses on DevTools packet inspection, async stack tracing, instrumentation-based log analysis, and a Python reproduction strategy built around AES-CBC and SHA1. It addresses the core challenges of heavily obfuscated frontend code, long parameter construction chains, and the difficulty of locating the actual cryptographic logic.

This reverse engineering path focuses on reproducing Acs-Token

The primary goal is to locate the logic that generates Acs-Token for Baidu Translate endpoints such as https://fanyi.baidu.com/ait/text/translateIncognitoAi and related translation APIs, then reproduce verifiable requests in Python.

Unlike conventional signatures, this parameter is not a simple hash. It is composed of a timestamp, environment fields, cookies, SHA1-derived values, and AES-CBC ciphertext. The real challenge is not the encryption algorithm itself, but how to identify the true dispatch path inside an asynchronous call stack.

The technical specification snapshot clarifies the target surface

| Parameter | Value |
| --- | --- |
| Target site | fanyi.baidu.com |
| Key endpoints | /ait/text/translateIncognitoAi, /ait/text/translate |
| Key parameter | Acs-Token |
| Primary languages | JavaScript, Python |
| Core protocols | HTTPS / JSON / SSE-related headers |
| Reverse engineering clues | DevTools request stack, async call chain, log instrumentation |
| Identified algorithms | AES-CBC, SHA1 |
| Core dependencies | curl_cffi, pycryptodome, hashlib |
| Source platform info | Blog post, no project star count provided |

The first packet capture step should start from the request stack, not a full-source keyword search

The method is practical: enter any word on the page to trigger a translation request, then inspect the request call stack in DevTools instead of globally searching keywords across a large body of obfuscated code.

Figure: DevTools request stack. The screenshot shows the translation request entry point in the browser Network panel. Tracing the JavaScript call stack directly from the request object narrows the search space for the parameter generation logic and avoids blind full-text searches through obfuscated scripts.

After placing a breakpoint at the critical call site and sending the request again, you can see that Acs-Token has already been generated before the request is sent. This indicates that the real analysis target is not the network layer, but the pre-send parameter assembly function.

Figure: Parameter value at the breakpoint. The screenshot captures the runtime context after the breakpoint is hit. In practice, the final Acs-Token value is often visible directly in local variables or in the request header object, which is decisive evidence of when the parameter is generated.

The real key is identifying dispatch nodes in the async stack

The original analysis emphasizes that this chain has clear asynchronous stack characteristics and contains many repeated execution fragments. If you step through layer by layer mechanically, you can easily get trapped in irrelevant paths. A more efficient strategy is to identify scheduler-style structures such as for-case dispatch logic first.

Figure: for-case dispatch logic. The screenshot shows the control-flow dispatch pattern common in obfuscated code, especially in VM-style executors and state-machine schedulers. Focus on case branch IDs, jump variables, and value write-back locations instead of reading every repeated line in sequence.

By placing breakpoints on these dispatch nodes and combining them with step execution, you can reach the function body that actually assembles fields and generates ciphertext much faster.

Figure: Breakpoint placement. The screenshot shows breakpoints set at control-flow dispatch points. Their purpose is not to extract the algorithm directly, but to determine which case branch eventually writes the signature field.

Figure: Stepping through the stack. The screenshot illustrates the transition from the dispatch layer to the concrete execution function. During step-through, focus on argument pushes, return value write-backs, and closure variable changes.

Figure: Entering the target function. The screenshot shows the analysis moving from the outer scheduling logic into the target function. At this point the code usually looks much closer to plaintext concatenation, environment inspection, and cryptographic calls, which makes it the best stage for instrumentation.
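To make the dispatch-node idea concrete, the following is a minimal Python model of a flattened for-case scheduler. Everything in it is hypothetical (the state numbers, the `run_dispatch` and `find_writer` names, the `sig` field); it is not Baidu's code, only a sketch of why recording which case writes the target field is cheaper than reading every branch.

```python
from typing import Callable, Dict, List


def run_dispatch(on_write: Callable[[int, str, object], None]) -> Dict[str, object]:
    """Toy control-flow-flattened routine: a loop driven by a state variable."""
    ctx: Dict[str, object] = {}
    state = 3  # hypothetical entry state
    while state != -1:
        if state == 3:
            ctx["ts"] = 1700000000000  # fixed placeholder timestamp
            on_write(state, "ts", ctx["ts"])
            state = 7
        elif state == 7:
            ctx["noise"] = "decoy"  # stands in for repeated, irrelevant fragments
            on_write(state, "noise", ctx["noise"])
            state = 1
        elif state == 1:
            ctx["sig"] = f"{ctx['ts']}_demo"  # the write we actually care about
            on_write(state, "sig", ctx["sig"])
            state = -1
    return ctx


def find_writer(field: str) -> List[int]:
    """Return the case IDs that wrote `field`, like a conditional breakpoint would."""
    hits: List[int] = []

    def hook(case_id: int, name: str, value: object) -> None:
        if name == field:
            hits.append(case_id)

    run_dispatch(hook)
    return hits


print(find_writer("sig"))
```

In a real session the hook corresponds to a conditional breakpoint or logpoint on the dispatcher, and the returned case ID tells you which branch to step into.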

The instrumentation strategy should target result sinks, not every branch

The original article proposes a strategy of instrumenting only special points, and that is critical. With VM-style obfuscation, instrumenting every branch causes the logs to explode almost immediately. A better approach is to watch only statements where the execution result is assigned to something like e.V.

# Sketch: log only at the critical assignment point to avoid drowning out the real path
def traced_assign(target, expr, run):
    is_sink = target == "e.V"  # The variable that receives the final result
    if is_sink:
        print("Before assignment:", expr)
    value = run(expr)  # Always evaluate; only the logging is conditional
    if is_sink:
        print("After assignment:", value)  # Record the final generated value
    return value

This snippet demonstrates the idea of instrumenting around the result sink so that fewer logs reveal more critical data flow.
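The same sink-only idea can be modelled with an object that logs writes to one attribute and ignores the rest. In the browser this would be done in JavaScript with a logpoint or a patched assignment; the Python below only illustrates the filtering principle, and the `SinkWatcher` class and the attribute names are hypothetical.

```python
class SinkWatcher:
    """Register-like object that records writes to selected attributes only."""

    def __init__(self, watched):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, "_watched", set(watched))
        object.__setattr__(self, "log", [])

    def __setattr__(self, name, value):
        if name in self._watched:
            self.log.append((name, value))  # record only writes to the sink
        object.__setattr__(self, name, value)


e = SinkWatcher(watched={"V"})
e.A = "intermediate"      # not logged
e.V = "final-token-demo"  # logged: the hypothetical result sink
e.B = 123                 # not logged
print(e.log)
```

Three writes happen, but only the one that lands in the watched attribute reaches the log, which is the property that keeps VM-style traces readable.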

Logs expose the real algorithm combination more clearly than source code

After resuming execution and observing the logs, the original analysis confirms that the generated result contains AES-CBC ciphertext that can be validated externally. At the same time, the plaintext structure also exposes two key fields: d0 and d78.

Here, d0 behaves like a prefixed salt string, while d78 is an integer derived from an environment-related string: take its SHA1 digest, keep only the first four hexadecimal characters, and parse them as a base-16 integer, so the value always falls between 0 and 65535. This indicates that Acs-Token is effectively a composite of timestamp + plaintext fields + symmetric encryption + environment digest.
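A small sketch helps when comparing logged values against this description. Four hexadecimal characters are exactly two bytes, so the same integer can be read from the raw digest as a big-endian value. The salt below is a made-up placeholder, not a captured value, and the function names are illustrative.

```python
from hashlib import sha1


def d78_from_hex(salt: str) -> int:
    """First four hex characters of the SHA1 digest, parsed as a base-16 integer."""
    return int(sha1(salt.encode("utf-8")).hexdigest()[:4], 16)


def d78_from_bytes(salt: str) -> int:
    """Equivalent view: the first two digest bytes read as a big-endian integer."""
    return int.from_bytes(sha1(salt.encode("utf-8")).digest()[:2], "big")


sample = "example-salt___false_0__0"  # hypothetical salt for illustration only
value = d78_from_hex(sample)
print(value, value == d78_from_bytes(sample), 0 <= value <= 0xFFFF)
```

If a logged d78 ever exceeds 65535, the four-character assumption does not hold for that build and the prefix length should be re-checked.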

The minimal Acs-Token generation logic can be summarized in three steps

import base64
from hashlib import sha1
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad

def aes_cbc_encrypt_str(plaintext: str, key: bytes, iv: bytes) -> str:
    cipher = AES.new(key, AES.MODE_CBC, iv)
    data = pad(plaintext.encode("utf-8"), AES.block_size)  # Pad to the cipher block size
    return base64.b64encode(cipher.encrypt(data)).decode("utf-8")  # Return Base64 ciphertext

def build_d78(salt: str) -> int:
    return int(sha1(salt.encode()).hexdigest()[:4], 16)  # Take the first 4 hex chars and convert to int

This code corresponds to the two core algorithm families confirmed in the original analysis: AES-CBC generates the ciphertext body, and SHA1 constructs d78.

Python reproduction depends more on environment consistency than on the request body

The original code shows that the process first visits the Baidu homepage to obtain cookies such as BAIDUID, which are then used to construct subsequent parameters. This means Acs-Token is not a purely offline signature. At least some of its dependencies are tied to the session environment.
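As a rough illustration of the cookie step, the sketch below pulls named cookies out of raw Set-Cookie header values using only the standard library. The original relies on curl_cffi's session handling; this version is an assumption-laden stand-in that works on header strings from any HTTP client, and the header values shown are placeholders rather than real responses.

```python
from http.cookies import SimpleCookie
from typing import Dict, Iterable


def collect_cookies(set_cookie_headers: Iterable[str]) -> Dict[str, str]:
    """Merge Set-Cookie header values into a simple name -> value mapping."""
    jar: Dict[str, str] = {}
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)  # attributes such as Path are stored on the morsel
        for name, morsel in parsed.items():
            jar[name] = morsel.value
    return jar


# Placeholder headers shaped like a typical homepage response.
headers = [
    "BAIDUID=PLACEHOLDER0123456789; Path=/; Domain=.baidu.com; Max-Age=31536000",
    "OTHER=1; Path=/",
]
cookies = collect_cookies(headers)
print(cookies["BAIDUID"])
```

Keeping the cookies in one mapping makes it easier to pass the same values to both the token builder and the final request, which is where mismatches tend to creep in.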

import time
from hashlib import sha1

client_ts = int(time.time() * 1000) + 3001  # Simulate the clientTs offset shown in the original analysis
base_str = format(client_ts, "x")  # Example handling only; the original uses a custom base conversion
salt_head = f"if2glnrf99c{base_str}"
salt = f"{salt_head}___false_0__0"
d78 = int(sha1(salt.encode()).hexdigest()[:4], 16)

This snippet shows how the timestamp offset and environment markers feed into d78. Together with cookie-bound fields, they determine whether the reproduction succeeds consistently.
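Because the snippet above substitutes hexadecimal for the original's custom base conversion, a generic converter is handy for trying alternatives against logged values. The sketch below is an assumption: the source does not state which base or alphabet the site uses, so the 0-9a-z alphabet and the `to_base` name are illustrative only.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 0-9a-z, supports bases 2..36


def to_base(number: int, base: int) -> str:
    """Convert a non-negative integer to a string in the given base (2..36)."""
    if not 2 <= base <= len(ALPHABET):
        raise ValueError("base must be between 2 and 36")
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


print(to_base(255, 16))  # ff
print(to_base(35, 36))   # z
```

Comparing `to_base(client_ts, b)` for a few values of `b` with the string seen in the instrumentation logs is a quick way to confirm or rule out a candidate base.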

The source code demonstrates the full chain but still needs engineering hardening

The Python example provided in the original article already covers AES-CBC encryption, cookie extraction, Acs-Token assembly, and translation request submission, which makes it a solid minimal sample for validating the core mechanism.

However, two limitations remain important. First, the prefixed timestamp is still marked as not fully understood. Second, language-direction handling, response parsing, and retry logic have not yet been engineered for production. As a result, it is better treated as a research sample than as a production-ready SDK.
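As one example of the missing hardening, retry handling can be added without touching the token logic. This is a generic sketch, not part of the original code: the `with_retries` helper, its defaults, and the simulated failure are all hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on any exception with exponential backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:  # a real client should catch narrower error types
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"all {attempts} attempts failed") from last_error


calls = {"n": 0}


def flaky() -> str:
    """Simulated request that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated failure")
    return "ok"


result = with_retries(flaky, sleep=lambda _: None)  # skip real sleeping in the demo
print(result, calls["n"])
```

Since the token embeds a timestamp, a retry would normally need to rebuild the token inside `call` rather than resend a stale one.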

The risk boundary for research code must remain explicit

This case is suitable for frontend security research, protocol analysis, and obfuscation study. When sending automated requests to live services, you should comply with the target platform’s terms, applicable laws, and the principle of minimal access. It should not be used for unauthorized bulk invocation.

FAQ structured answers

1. Why inspect the request stack first instead of searching for Acs-Token directly?

Because obfuscated code usually contains many wrapper layers and repeated logic. The request stack provides direct runtime evidence of where the parameter is generated, which is far more efficient than static full-text searching.

2. Why can d78 be identified as a SHA1-related field?

The original analysis used logging and global search to trace its origin, then verified that its value matches a specific pattern: take the SHA1 digest of a salt string, keep only the first four hexadecimal characters, and convert them into an integer. That is a common lightweight way to build an environment digest.

3. What is the easiest failure point in Python reproduction?

The most common issues are inconsistent cookies, a wrong timestamp offset, mismatched user agent or environment fields, and the wrong endpoint path. Identifying the algorithm is only the first step; stable requests still depend on full contextual consistency.
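Returning to the d78 question, the kind of check described in the second answer can be sketched as a small search: given an integer seen in the logs and a candidate input string, test which digest algorithm and prefix length reproduce it. The helper name, the algorithm list, and the sample string are assumptions for illustration; the observed value here is computed locally as a stand-in for a real log entry.

```python
import hashlib
from typing import List, Sequence, Tuple


def match_digest_prefix(observed: int, candidate: str,
                        algorithms: Sequence[str] = ("sha1", "sha256", "sha512"),
                        max_hex_len: int = 8) -> List[Tuple[str, int]]:
    """List (algorithm, prefix_length) pairs whose hex prefix equals `observed`."""
    matches: List[Tuple[str, int]] = []
    data = candidate.encode("utf-8")
    for algo in algorithms:
        hex_digest = hashlib.new(algo, data).hexdigest()
        for length in range(1, max_hex_len + 1):
            if int(hex_digest[:length], 16) == observed:
                matches.append((algo, length))
    return matches


candidate = "example-salt___false_0__0"  # hypothetical logged string
# Stand-in for a logged d78 value, derived here so the demo is self-contained.
observed = int(hashlib.sha1(candidate.encode("utf-8")).hexdigest()[:4], 16)
print(match_digest_prefix(observed, candidate))
```

A single match on one sample is weak evidence because short prefixes can collide, so the check is more convincing when the same (algorithm, length) pair matches several independently logged values.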

Core Summary: This article reconstructs a JavaScript reverse engineering note and focuses on the Acs-Token generation chain in Baidu Translate. It walks through DevTools packet inspection, async stack tracing, instrumentation logs, AES-CBC and SHA1 identification, and a minimal Python reproduction strategy to help developers quickly understand the construction logic behind this key parameter.