Build ReAct Agent Systems: From Tool Calling to Closed-Loop Reasoning in Python

This article breaks down a ReAct-based agent example: by combining a unified tool interface, LLM response parsing, and a loop execution mechanism, the model gains a Think → Act → Observe workflow that addresses the real-time and accuracy limitations of standalone large language models. Keywords: ReAct, Agent, Tool Calling.

The technical specification snapshot is straightforward

Primary Language: Python
Core Pattern: ReAct (Reasoning + Acting)
Interaction Protocol: Text prompts + JSON parameter parsing
Repository Structure: Educational single-file example
Star Count: Not provided in the original content
Core Dependencies: json, typing
Typical Tools: Weather lookup, Wikipedia search
Fault-Tolerance Points: Unknown tools, invalid parameters, parsing failures, iteration limits

ReAct upgrades agents from one-shot answers to closed-loop execution

The key to ReAct is not the answer itself but the order of operations: reason first, then call tools. The model generates a Thought, decides on an Action, receives an Observation, and continues reasoning until it produces a conclusion.

This mechanism directly addresses two common pain points: first, large language models cannot access real-time information on their own; second, complex tasks are hard to decompose reliably in a single pass. For weather, search, calculation, and workflow orchestration tasks, ReAct is a low-cost and highly controllable starting point.

The minimal ReAct state machine is easy to understand

steps = ["Thought", "Action", "Observation"]
for step in steps:
    print(step)  # Execute reasoning, action, and observation in sequence

This snippet simply illustrates the smallest execution unit in ReAct: think first, act next, and then decide what to do based on the result.

The example uses a layered and decoupled agent architecture

The original implementation can be split into three layers: the tool layer, the agent layer, and the model adapter layer. The tool layer encapsulates external capabilities, the agent layer controls the loop, and the model layer only returns text based on the prompt.

The benefit is clear separation of responsibilities. Replacing tools does not affect the main agent flow, and swapping the LLM does not require changes to the tool interface. In engineering practice, this is more stable than pushing all logic into prompt design alone.
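
As a minimal sketch of the model-adapter layer (the class name and canned reply here are illustrative assumptions, not the original code), the adapter only needs to be a callable that turns a prompt string into a text completion:

class MockLLM:
    """Model-adapter layer: maps a prompt string to a text completion."""

    def __call__(self, prompt: str) -> str:
        # A real adapter would call an LLM API here; this mock always
        # requests a weather lookup so the loop has something to execute.
        return (
            "Thought: I need current weather data.\n"
            "Action: get_weather\n"
            'Action Input: {"city": "Beijing"}'
        )

Swapping this class for a real API client touches neither the tool layer nor the agent loop, which is exactly the decoupling the layered design buys.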

The Tool class defines a unified tool protocol

from typing import Callable, Dict, Any

class Tool:
    def __init__(self, name: str, description: str, func: Callable):
        self.name = name
        self.description = description
        self.func = func

    def run(self, parameters: Dict[str, Any]) -> Any:
        return self.func(**parameters)  # Unpack the parameter dictionary and execute the actual tool

This code wraps any function into a standardized tool object so the agent can discover and invoke capabilities in a consistent way.
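
For example, a name-indexed registry (hypothetical here, though the original example likely holds tools in a similar mapping) lets the agent resolve an Action string to a capability; get_weather and search_wikipedia are the example tools shown later in the article:

tools = {
    tool.name: tool
    for tool in [
        Tool("get_weather", "Look up the current weather for a city", get_weather),
        Tool("search_wikipedia", "Search Wikipedia for a topic", search_wikipedia),
    ]
}
print(tools["get_weather"].run({"city": "Beijing"}))  # Dispatch by name with a parameter dict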

The response parser determines whether the agent is actually usable

The most important engineering detail in the example is not the tools themselves, but _parse_llm_response. It converts the model’s natural-language output into a structured action instruction.

The original code extracts Thought, Action, and Action Input using string splitting. This method is simple and effective for teaching, but in production it is vulnerable to format drift. A more robust approach is to require the model to output strict JSON.
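
To make that fragility concrete, here is a minimal sketch of such a label-splitting parser (the exact labels and return keys are assumptions; the original _parse_llm_response may differ in detail):

import json

def parse_by_labels(text: str) -> dict:
    """Extract Thought / Action / Action Input by scanning labeled lines."""
    result = {"thought": None, "action": None, "action_input": None}
    for line in text.splitlines():
        if line.startswith("Thought:"):
            result["thought"] = line[len("Thought:"):].strip()
        elif line.startswith("Action:"):
            result["action"] = line[len("Action:"):].strip()
        elif line.startswith("Action Input:"):
            result["action_input"] = json.loads(line[len("Action Input:"):].strip())
    return result

A lowercase label, a reordered field, or a multi-line Action Input silently breaks this extraction, which is exactly why a JSON-first strategy is safer.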

A more reliable parsing strategy looks like this

import json

def parse_action(text: str) -> dict:
    try:
        data = json.loads(text)  # Prefer standard JSON parsing first
        return data
    except json.JSONDecodeError:
        return {"answer": text, "action": None}  # Fall back to a direct answer when parsing fails

This snippet demonstrates a structured-first, fallback-on-failure parsing strategy that significantly reduces format fragility.
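
A quick check with two hypothetical model outputs exercises both branches:

print(parse_action('{"action": "get_weather", "action_input": {"city": "Beijing"}}'))
# -> {'action': 'get_weather', 'action_input': {'city': 'Beijing'}}
print(parse_action("The weather in Beijing is sunny."))
# -> {'answer': 'The weather in Beijing is sunny.', 'action': None}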

The run loop reveals the control skeleton of ReAct

The run() method maintains history and completes four tasks in each round: build the prompt, call the model, parse the response, and execute the tool. The history keeps growing, so the model can see the outcome of the previous action.

This kind of history-driven loop is essentially a lightweight state machine. It fits single-agent scenarios well and leaves room for future memory, reflection, and planning modules.

The control logic of the main loop can be summarized as follows (the last three lines sketch the termination and tool-execution steps, using the dict keys from the parser above):

for i in range(self.max_iterations):
    prompt = "\n".join(history) + "\nThought:"
    llm_response = self.llm_model(prompt)  # Call the model to generate the next action
    parsed = self._parse_llm_response(llm_response)  # Parse reasoning and action
    if parsed.get("action") is None:  # No tool requested: treat the text as the final answer
        return parsed.get("answer")
    observation = self.tools[parsed["action"]].run(parsed["action_input"])  # Execute the chosen tool
    history.append(f"Observation: {observation}")  # Make the result visible in the next round

This code defines the execution skeleton of the agent: every iteration generates the next step from context rather than attempting to produce the final answer in one shot.

The weather and Wikipedia examples validate the generality of tool calling

The example tools include get_weather and search_wikipedia. The first represents real-time information access, and the second represents knowledge retrieval augmentation. Together, they cover the two most common categories of external capabilities used by agents.

From an engineering perspective, this design shows that a Tool does not care about task type. It only cares about input parameters and output results. That means database queries, search engines, and code executors can be integrated later without changing the abstraction.

def get_weather(city: str) -> str:
    weather_data = {
        "Beijing": "Sunny, 25°C",
        "Shanghai": "Cloudy, 28°C",
        "Guangzhou": "Light rain, 27°C"
    }
    return weather_data.get(city, "Unknown weather")  # Return a fallback result when the city is not found

This code uses a static dictionary to simulate a weather service. The point is to show the shape of the tool interface rather than the origin of the data itself.
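
The companion search_wikipedia tool can follow exactly the same shape; this canned-response version is a hypothetical stand-in for a real API call:

def search_wikipedia(topic: str) -> str:
    articles = {
        "Python": "Python is a high-level, general-purpose programming language.",
        "ReAct": "ReAct is an agent pattern that interleaves reasoning with tool use.",
    }
    return articles.get(topic, "No article found")  # Fallback when the topic is unknown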

The original output exposes a common flaw: the system does not produce a final answer

The sample output returns Thought + Action + Parameters, rather than the answer the user actually wants. That means the mock LLM only knows how to initiate tool calls, but it does not summarize a conclusion after receiving the Observation.

This is a common issue in many beginner ReAct implementations: tool calling works, but the final result synthesis step is missing. The fix is to explicitly require a Final Answer after observation in the prompt, or add a termination branch in the loop.
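
A minimal sketch of that termination branch, assuming the prompt instructs the model to emit a Final Answer: line once it has gathered enough observations (the marker is a common convention, not part of the original code):

def extract_final_answer(llm_response: str):
    """Return the final answer if the model produced one, otherwise None."""
    marker = "Final Answer:"
    if marker in llm_response:
        return llm_response.split(marker, 1)[1].strip()
    return None

Checking this before parsing an action gives the loop a reliable exit: the agent keeps calling tools only until the model signals it can answer.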

[Process diagram from the original article: user input enters the LLM, which selects a tool, executes the external function, receives the observation result, and continues into the next round of reasoning. In the code, this corresponds to the interaction chain among history, llm_model, _parse_llm_response, and Tool.run().]

Four capabilities are required to make this example production-ready

First, the output protocol must be structured, ideally with JSON Schema. Second, tool calling needs an allowlist, timeouts, and retries. Third, observation results should be summarized and compressed to prevent the context from growing without bound. Fourth, the final answer must be decoupled from the action trace.
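
As one illustration of the second point, here is a hedged sketch of a guarded tool call; the allowlist contents, timeout, and retry count are assumed values, not taken from the original example:

import concurrent.futures

ALLOWED_TOOLS = {"get_weather", "search_wikipedia"}  # Only these names may execute

def call_tool_safely(tool, parameters, timeout_s=5.0, retries=2):
    """Run a tool with an allowlist check, a timeout, and simple retries."""
    if tool.name not in ALLOWED_TOOLS:
        return f"Error: tool '{tool.name}' is not allowed"
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        for _ in range(retries + 1):
            try:
                return executor.submit(tool.run, parameters).result(timeout=timeout_s)
            except concurrent.futures.TimeoutError:
                continue  # Retry; a stuck worker thread is abandoned, not killed
        return "Error: tool call timed out"
    finally:
        executor.shutdown(wait=False)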

If you keep evolving the system, you can also integrate function-calling protocols, an MCP tool layer, vector retrieval, and long-term memory modules. At that point, ReAct is no longer just a teaching example. It becomes an extensible foundation for agent engineering.

FAQ in structured Q&A format

1. What is the core difference between ReAct and standard prompt-based Q&A?

ReAct explicitly separates thinking from acting and allows the model to call external tools, which makes it better suited for tasks that require real-time data, external computation, or multi-step decision-making.

2. Why can this example call tools but still not count as a complete agent?

Because it lacks the closing step that generates a Final Answer from the observation results. Tool execution is only an intermediate capability. A truly complete agent must also integrate results and end the loop reliably.

3. What should you optimize first in production?

Start with response parsing and tool safety. Unstable formats can distort actions, while missing permission controls on tools can lead to incorrect calls, runaway costs, or even security risks.

Core summary

This article reconstructs a ReAct-based agent example and explains tool abstraction, the reasoning-action-observation loop, response parsing, and fault-tolerance design. It also uses Python to demonstrate two representative scenarios—weather lookup and Wikipedia search—making it a practical starter template for agent engineering.