[AI Readability Summary]
Spring AI gives Java teams a unified way to build AI applications on top of the Spring ecosystem. Instead of wiring model APIs, vector databases, memory, and external tools separately, you use a consistent programming model across them all. This significantly reduces integration cost when you need model portability, Retrieval-Augmented Generation (RAG), tool execution, and agent-style orchestration in enterprise systems.
Technical Specification Snapshot
| Parameter | Description |
|---|---|
| Core Language | Java 17+ |
| Framework Foundation | Spring Boot |
| Primary Protocols / Interfaces | HTTP API, SSE, MCP |
| Model Integrations | OpenAI, DashScope, Ollama, and more |
| Vector Capabilities | PGVector, Milvus, Redis, Chroma |
| Core Dependencies | spring-ai-bom, ChatClient, VectorStore |
Spring AI has become the unified entry point for Java developers building AI applications.
Spring AI is not a single SDK. It is an AI development framework that extends the familiar Spring style. It abstracts model invocation, prompt construction, conversation memory, vector retrieval, and tool execution behind a consistent set of interfaces.
This abstraction solves the most common enterprise pain points: model vendors change frequently, business systems already run on the Spring stack, and AI capabilities must integrate deeply with databases and APIs. Developers only need to learn one programming model.
Spring AI’s capability matrix covers mainstream enterprise delivery scenarios.
It supports chat completion, embeddings, multimodal workflows, function calling, vector databases, and MCP integration. For Java teams, that means moving from “connecting a model” to “building a complete AI system.”
| Capability | Typical Use Cases | Representative Implementations |
|---|---|---|
| Chat | Q&A, generation, summarization | OpenAI, Qwen, Claude |
| Embedding | Retrieval, recall, clustering | OpenAI, DashScope |
| RAG | Enterprise knowledge base Q&A | PGVector, Milvus |
| Tool Calling | Invoke external APIs | @Tool |
| Multimodal | Image understanding | GPT-4o, Qwen-VL |
| Agent | Multi-step decision making | ReactAgent, Graph |
Environment setup should start with dependency management and model configuration.
Spring AI recommends using a BOM to unify versions, then importing the starter for your model provider. This avoids dependency conflicts and makes future model switching a configuration change rather than a business code rewrite.
<properties>
<!-- Unify the Spring AI version to avoid component version drift -->
<spring-ai.version>1.1.2</spring-ai.version>
</properties>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
</dependencies>
This configuration unifies the Spring AI version and enables OpenAI model capabilities.
Model configuration determines system portability and deployment strategy.
Cloud-hosted models are suitable for stable production rollouts. Local models are better for low-cost experimentation or private deployment. The key advantage of Spring AI is its unified configuration entry point, which makes it easy to switch from OpenAI to DashScope or Ollama.
spring:
ai:
openai:
api-key: ${OPENAI_API_KEY}
chat:
options:
model: gpt-4o
This configuration defines the OpenAI API key and the default chat model.
# Pull and start a local model for offline experimentation
ollama pull qwen2.5:latest
ollama serve
These commands start an Ollama model service locally.
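If you switch the starter to the Ollama one (spring-ai-starter-model-ollama in the 1.x naming scheme), the endpoint and default model are likewise set through configuration rather than code. A minimal sketch, assuming the standard spring.ai.ollama properties and an Ollama instance running on its default port:
spring:
  ai:
    ollama:
      base-url: http://localhost:11434 # Default local Ollama endpoint
      chat:
        options:
          model: qwen2.5:latest # The model pulled in the previous step
This configuration points Spring AI at the local Ollama service and sets its default chat model.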
ChatClient is the recommended conversation entry point because it unifies prompts, streaming output, and extensibility.
Compared with the lower-level ChatModel, ChatClient provides a more business-oriented DSL. It is naturally suited for wrapping system prompts, integrating Advisors, supporting streaming responses, and enabling tool calling.
@RestController
@RequestMapping("/api/ai")
public class ChatController {
private final ChatClient chatClient;
public ChatController(ChatClient.Builder builder) {
this.chatClient = builder
.defaultSystem("You are a professional Java technical expert") // Set the system role
.build();
}
@GetMapping("/chat")
public String chat(@RequestParam String message) {
return chatClient.prompt()
.user(message) // Inject the user question
.call()
.content(); // Return the model text output
}
}
This code shows a minimal viable Spring AI chat endpoint.
Streaming output fits chat windows and real-time generation scenarios.
SSE is the easiest output format for front-end applications to consume. For long-form generation, code explanation, and customer support responses, streaming significantly reduces perceived wait time.
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String message) {
return chatClient.prompt()
.user(message) // Accept user input
.stream()
.content(); // Return content incrementally
}
This code returns streaming model output over SSE.
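To verify the streaming behavior from a terminal, a quick check is enough (assuming the application runs on Spring Boot's default port 8080):
# -N disables curl's output buffering so chunks appear as they arrive
curl -N "http://localhost:8080/api/ai/stream?message=Summarize+Spring+AI"
This command prints the SSE events incrementally as the model generates them.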
Conversation memory is the key step that moves multi-turn systems from demos to production.
Many customer service bots, copilots, and Q&A systems fail not because the model is weak, but because context management is chaotic. Spring AI uses the Advisor mechanism to inject memory into the invocation chain and avoids manually stitching historical messages together.
@Service
public class ChatMemoryService {
private final ChatClient chatClient;
private final ChatMemory chatMemory = MessageWindowChatMemory.builder().build(); // In-memory, message-window-based memory
public ChatMemoryService(ChatClient.Builder builder) {
this.chatClient = builder
.defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build()) // Inject the conversation memory advisor
.build();
}
public String chat(String sessionId, String message) {
return chatClient.prompt()
.user(message)
.advisors(spec -> spec.param("chat_memory_conversation_id", sessionId)) // Isolate context by session
.call()
.content();
}
}
This code implements multi-turn context memory based on a session ID.
RAG is the most cost-effective deployment pattern for enterprise knowledge base Q&A.
The core of RAG is not to make the model “smarter.” It is to make the model answer from private enterprise knowledge. By decoupling retrieval from generation, RAG can significantly reduce hallucinations and improve answer controllability.
spring:
datasource:
url: jdbc:postgresql://localhost:5432/vectordb
username: postgres
password: password
ai:
vectorstore:
pgvector:
index-type: HNSW
distance-type: COSINE_DISTANCE
dimensions: 1536
This configuration connects PGVector and defines the vector index and dimensionality parameters.
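Besides the data source, the PGVector store needs its own starter on the classpath. A sketch of the Maven coordinates, assuming the Spring AI 1.x artifact naming and the BOM shown earlier managing the version:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
</dependency>
This dependency auto-configures a VectorStore bean backed by PostgreSQL.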
@Service
public class RagService {
@Autowired
private VectorStore vectorStore;
@Autowired
private ChatClient chatClient;
public String ask(String question) {
return chatClient.prompt()
.user(question)
.advisors(new QuestionAnswerAdvisor(vectorStore)) // Automatically retrieve relevant documents and augment the prompt
.call()
.content();
}
}
This code implements a minimal RAG question-answering pipeline.
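Retrieval can only find what has been written into the vector store, so ingestion comes first. A minimal ingestion sketch using the VectorStore API; the document texts and metadata here are illustrative:
@Service
public class IngestionService {
    @Autowired
    private VectorStore vectorStore;

    public void ingest() {
        // Each Document carries text plus optional metadata that can later be used for filtering
        List<Document> documents = List.of(
            new Document("Spring AI unifies chat, embedding, and tool calling behind one programming model",
                    Map.of("source", "handbook")),
            new Document("PGVector stores embeddings inside PostgreSQL and supports HNSW indexes",
                    Map.of("source", "handbook"))
        );
        // Embeds the documents with the configured embedding model and persists them to PGVector
        vectorStore.add(documents);
    }
}
This code writes documents into the vector store so the QuestionAnswerAdvisor has something to retrieve.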
Tool Calling gives the model the ability to execute actions in external systems.
If a model can only “talk,” it is still just a Q&A engine. Once it can call APIs, check the weather, or trigger business actions, it enters the agentic stage. Spring AI simplifies this step through @Tool.
@Component
public class WeatherTools {
@Tool(description = "Query the current weather by city name")
public String getWeather(@ToolParam(description = "City name, for example Beijing") String city) {
// In a real project, replace this with a third-party weather API
Map<String, String> weatherMap = Map.of(
"北京", "Sunny, 25°C, north wind level 2",
"上海", "Cloudy, 22°C, east wind level 3",
"深圳", "Showers, 28°C, south wind level 2"
);
return weatherMap.getOrDefault(city, "Weather data is currently unavailable");
}
}
This code defines a weather query tool that the model can call automatically.
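Declaring the tool is only half of the step; it also has to be attached to a request so the model can decide when to invoke it. A minimal sketch of wiring it into a ChatClient call; the controller and endpoint names here are illustrative:
@RestController
@RequestMapping("/api/ai")
public class WeatherChatController {
    private final ChatClient chatClient;
    private final WeatherTools weatherTools;

    public WeatherChatController(ChatClient.Builder builder, WeatherTools weatherTools) {
        this.chatClient = builder.build();
        this.weatherTools = weatherTools;
    }

    @GetMapping("/weather-chat")
    public String weatherChat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .tools(weatherTools) // Expose the @Tool methods of this bean to the model for this request
                .call()
                .content();
    }
}
This code lets the model call getWeather automatically when the user’s question requires weather data; tools can also be registered once via defaultTools on the builder.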
Multimodal workflows and MCP extend Spring AI from a chat framework into AI infrastructure.
Multimodal workflows are ideal for receipt recognition, quality inspection, and image-text moderation. MCP enables models to connect to a standardized tool ecosystem and reduces the cost of external tool integration. Together, they allow Spring AI to serve as an enterprise AI gateway.
@Service
public class MultimodalService {
private final ChatClient chatClient;
public MultimodalService(ChatClient.Builder builder) {
this.chatClient = builder.build();
}
public String analyzeReceipt(byte[] imageBytes) {
return chatClient.prompt()
.user(user -> user
.text("Please analyze this receipt image and extract the merchant name, total amount, and itemized list. Return the result in JSON format")
.media(MimeTypeUtils.IMAGE_JPEG, new ByteArrayResource(imageBytes)) // Submit text and image together
)
.call()
.content();
}
}
This code shows how to send an image and text to a multimodal model in the same request.
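On the MCP side, connecting the application to an external MCP server is also configuration-driven. A hedged sketch, assuming the spring-ai-starter-mcp-client dependency and an SSE-transport server; the connection name and URL are placeholders:
spring:
  ai:
    mcp:
      client:
        sse:
          connections:
            my-tools: # Arbitrary connection name
              url: http://localhost:8081 # Base URL of the MCP server
This configuration registers the tools exposed by the MCP server so they can participate in tool calling.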
Agent development is pushing Spring AI toward a complex task orchestration layer.
When a business workflow includes multi-step reasoning, tool collaboration, state transitions, and multi-role coordination, a single prompt is no longer enough. Agent frameworks use workflow and tool orchestration to upgrade AI applications from “Q&A systems” to “execution systems.”
Engineering consistency should be the key criterion when choosing Spring AI in a Java stack.
If your team already uses Spring Boot, Spring Cloud, PostgreSQL, and Redis extensively, Spring AI has a much lower integration cost than introducing a heterogeneous Python stack. It is especially well suited for enterprise knowledge bases, intelligent customer service, internal copilots, and process automation.
FAQ
What is the fundamental difference between Spring AI and calling the OpenAI SDK directly?
Spring AI provides a unified abstraction layer that covers model switching, prompt construction, memory management, RAG, tool calling, and multimodal workflows. It is better suited to long-term enterprise evolution rather than one-off API integration.
Which three modules should a Java team implement first when adopting Spring AI?
Start with basic chat, conversation memory, and RAG. The first two validate the interaction pipeline, while RAG determines knowledge controllability. Once these three are stable, add Tool Calling and Agent capabilities.
Should production environments choose cloud models or local models first?
If you prioritize stability, elasticity, and model quality, choose cloud models first. If you prioritize cost, privacy, and offline deployment, choose Ollama. Local models fit internal network scenarios, while cloud models fit fast production launch.
Core Summary
This article systematically reconstructs the core capabilities and implementation path of Spring AI. It covers environment setup, ChatClient-based conversations, conversation memory, RAG retrieval augmentation, Tool Calling, multimodal workflows, MCP, and agent development, helping Java teams quickly build enterprise-grade AI applications.