Build an enterprise AI customer service system with Spring Boot 3 and LangChain4j. The core capabilities include multi-model integration, conversational memory, RAG retrieval, and tool calling, helping Java teams solve real-world challenges in AI adoption, production integration, and knowledge base operations. Keywords: Spring Boot 3, LangChain4j, RAG.
Technical Specifications Snapshot
| Parameter | Value |
|---|---|
| Language | Java 21 |
| Framework | Spring Boot 3.2.x |
| AI Framework | LangChain4j 0.35.0 |
| Model Protocols | OpenAI API, Ollama |
| Vector Store / Cache | Redis 7.x |
| Business Database | PostgreSQL 16 |
| Document Parsing | Apache Tika |
| Core Dependencies | langchain4j, langchain4j-open-ai, langchain4j-ollama, langchain4j-redis |
This architecture targets real AI adoption scenarios for Java teams
Traditional AI tutorials have long focused on Python. Java teams often get stuck on model integration, knowledge retrieval, session persistence, and business tool orchestration. The value of LangChain4j lies in its JVM-friendly abstractions that unify these capabilities.
This solution uses an intelligent customer service system as its delivery vehicle, combining Spring Boot’s engineering strengths with the power of large language models. The goal is not merely to “get a demo running,” but to build an enterprise AI service that is scalable, auditable, and production-ready.
The technology stack already covers the key dimensions of enterprise deployment
The technology choices in the original article are highly pragmatic: JDK 21 provides the foundation for virtual threads, Spring Boot 3 handles service orchestration, Redis manages caching and memory storage, PostgreSQL stores business metadata, and Ollama plus OpenAI cover both local and cloud deployment paths.
curl https://start.spring.io/starter.zip \
  -d dependencies=web,lombok,data-jpa,postgresql,redis \
  -d javaVersion=21 \
  -d bootVersion=3.2.5 \
  -d baseDir=ai-service \
  -o ai-service.zip
This command quickly initializes a Spring Boot project skeleton suitable for AI services.
Core dependencies define the boundaries of AI capability assembly
LangChain4j is not a single dependency, but a set of capability modules. The core package provides abstractions, open-ai and ollama handle model adapters, redis supports vector- or memory-related capabilities, and Tika parses documents in multiple formats.
<properties>
    <langchain4j.version>0.35.0</langchain4j.version>
</properties>

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-open-ai</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
These dependencies separate the model abstraction layer from vendor-specific adapter layers.
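The ollama, redis, and Tika modules mentioned above follow the same pattern. A sketch of those declarations, with artifact IDs matching LangChain4j's usual module naming (verify them against the 0.35.0 release you actually pull):

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-ollama</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-redis</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-document-parser-apache-tika</artifactId>
    <version>${langchain4j.version}</version>
</dependency>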
Configuration should support both cloud models and local models
In production, the common requirement is not an either-or choice, but coexistence between two model paths. Cloud models handle high-quality, complex reasoning, while local models support low-sensitivity, low-cost, and offline tasks.
langchain4j:
  open-ai:
    chat-model:
      api-key: ${OPENAI_API_KEY:sk-your-key}
      model-name: gpt-4o
      temperature: 0.7
  ollama:
    chat-model:
      base-url: http://localhost:11434
      model-name: qwen2.5:14b
This configuration enables parallel access to both OpenAI and Ollama within the same service.
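If you prefer explicit wiring over a starter, the same two-path setup can be expressed as Spring beans. A minimal sketch, where the bean names and the use of an environment variable are my assumptions rather than the original article's code:

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ModelConfig {

    // Cloud model for high-quality, complex reasoning
    @Bean
    public ChatLanguageModel openAiModel() {
        return OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o")
                .temperature(0.7)
                .build();
    }

    // Local model for low-sensitivity, low-cost, and offline tasks
    @Bean
    public ChatLanguageModel ollamaModel() {
        return OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("qwen2.5:14b")
                .build();
    }
}

Exposing both models as ChatLanguageModel beans is what makes the per-request routing shown later possible.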
The system architecture must solve conversation, retrieval, and tool execution together
The original solution has a clear structure: the Controller layer exposes APIs, the AI Service orchestrates conversations, the Knowledge Base Service handles ingestion and retrieval, and the lower layer connects models with vector storage.
Domain model design reflects clear engineering boundaries
ChatSession manages sessions, users, and model types, while KnowledgeDocument manages files, content types, and vectorization status. The advantage is that the chat pipeline and the knowledge pipeline can evolve independently.
@Entity
@Table(name = "chat_sessions")
public class ChatSession {

    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private String id;

    @Column(name = "user_id", nullable = false)
    private String userId; // Bind to the business user
}
This entity persists the business primary key for AI sessions and the user association.
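The knowledge side mirrors this. A minimal sketch of a KnowledgeDocument entity consistent with the description above (the field and enum names are assumptions):

@Entity
@Table(name = "knowledge_documents")
public class KnowledgeDocument {

    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private String id;

    @Column(name = "file_name", nullable = false)
    private String fileName;      // Original file name at upload

    @Column(name = "content_type")
    private String contentType;   // MIME type detected during ingestion

    @Enumerated(EnumType.STRING)
    @Column(name = "status", nullable = false)
    private EmbeddingStatus status; // e.g. PENDING / EMBEDDED / FAILED (hypothetical enum)
}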
The conversation service must support both memory and retrieval augmentation
A truly usable customer service system cannot rely on a single model call. It must include historical context and prioritize knowledge base references when answering technical or product questions. LangChain4j’s AiServices is well suited for this orchestration layer.
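AiServices builds its implementation from a plain Java interface. The original article does not show the CustomerSupportAgent declaration; a minimal sketch, with the system prompt text being an assumption, looks like this:

import dev.langchain4j.service.SystemMessage;

public interface CustomerSupportAgent {

    @SystemMessage("You are a customer support assistant. Prefer retrieved knowledge base content when answering.")
    String chat(String userMessage);
}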
ChatMemory chatMemory = MessageWindowChatMemory.builder()
        .id(sessionId)
        .maxMessages(20) // Keep only the latest 20 messages (roughly 10 turns) to control token cost
        .chatMemoryStore(memoryStore)
        .build();

CustomerSupportAgent agent = AiServices.builder(CustomerSupportAgent.class)
        .chatLanguageModel(chatModel)
        .chatMemory(chatMemory)
        .contentRetriever(contentRetriever) // Attach RAG retrieval
        .build();
This code assembles the model, memory, and retriever into an AI agent capable of continuous conversation.
The RAG service is the key infrastructure for enterprise knowledge control
The knowledge base pipeline is not complicated, but every step matters: upload documents, parse content, split text into chunks, vectorize, write to storage, and then perform similarity-based recall at query time. The original article uses Apache Tika plus a recursive splitter, which is a mainstream approach.
DocumentSplitter splitter = DocumentSplitters.recursive(
        500, // Max chunk size, in tokens
        50,  // Overlap between adjacent chunks, in tokens
        new dev.langchain4j.model.openai.OpenAiTokenizer("gpt-4o")
);

EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
        .documentSplitter(splitter)
        .embeddingModel(embeddingModel)
        .embeddingStore(embeddingStore)
        .build();
This code defines the document chunking and vector ingestion strategy, which is central to RAG retrieval quality.
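Upstream of the ingestor, Apache Tika converts an uploaded file into a LangChain4j Document. A hedged usage sketch (the file path is illustrative):

// Parse any Tika-supported format (PDF, DOCX, HTML, ...) into a Document
Document document = FileSystemDocumentLoader.loadDocument(
        Path.of("/data/uploads/product-manual.pdf"), // Hypothetical upload location
        new ApacheTikaDocumentParser()
);

// Split, embed, and write to the embedding store in one call
ingestor.ingest(document);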
Function calling turns AI from a responder into an executor
The defining line for enterprise AI applications is whether the model can call real business tools. Order lookup, refund requests, and shipment tracking are all good candidates to expose as @Tool methods, allowing the model to invoke them autonomously based on user intent.
@Tool("Query logistics tracking information for an order")
public LogisticsInfo queryLogistics(
@dev.langchain4j.agent.tool.P("Order number") String orderNo) {
return new LogisticsInfo("SF Express", "SF1234567890", List.of(
new TrackNode("2024-05-01 10:30", "Shenzhen", "Package collected")
));
}
This tool method wraps external business capabilities as a structured interface that the model can invoke.
The agent gains execution ability only after tool registration
Defining a tool class alone is not enough. You must explicitly inject it during agent construction. Only then can the model automatically choose to invoke it during conversation instead of merely generating verbal suggestions.
CustomerSupportAgent agent = AiServices.builder(CustomerSupportAgent.class)
        .chatLanguageModel(chatModel)
        .chatMemory(chatMemory)
        .contentRetriever(contentRetriever)
        .tools(customerServiceTools) // Register business tools
        .build();
This configuration upgrades AI from a “Q&A model” to an “executable agent.”
API design should support synchronous, streaming, and knowledge ingestion requests
The original solution exposes endpoints for standard chat, streaming chat, document upload, status queries, and session cleanup. This already covers the basic interaction surface of an intelligent customer service system. The API design is concise and easy for frontend or external system integration.
@PostMapping("/chat")
public ResponseEntity<ChatService.ChatResponse> chat(@RequestBody ChatRequest request) {
    String sessionId = request.sessionId() != null
            ? request.sessionId()
            : java.util.UUID.randomUUID().toString();
    return ResponseEntity.ok(chatService.chat(sessionId, request.userId(), request.message()));
}
This controller code provides a synchronous chat entry point and automatically fills in the session identifier.
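For the streaming endpoint, a LangChain4j agent method can return a TokenStream, which maps naturally onto Server-Sent Events. A sketch, assuming the agent interface declares a TokenStream chatStream(String message) method and the AiServices builder was given a StreamingChatLanguageModel:

@PostMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public SseEmitter chatStream(@RequestBody ChatRequest request) {
    SseEmitter emitter = new SseEmitter(60_000L); // 60s timeout for the SSE connection
    agent.chatStream(request.message())
            .onNext(token -> {
                try {
                    emitter.send(token); // Push each generated token as an SSE event
                } catch (IOException e) {
                    emitter.completeWithError(e);
                }
            })
            .onComplete(response -> emitter.complete())
            .onError(emitter::completeWithError)
            .start();
    return emitter;
}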
Production optimization must focus on performance, security, and routing
Performance priorities include model warm-up, Redis retrieval optimization, virtual-thread concurrency improvements, and summarization-based compression to reduce token usage. On the security side, you should add sensitive-term filtering, prompt injection detection, and output auditing.
public String sanitizeInput(String input) {
    // 1. Mask sensitive information
    // 2. Detect prompt injection
    // 3. Enforce input length limits
    return input;
}
This filter code shows the minimum viable skeleton for AI input governance.
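A hedged filling-in of that skeleton might look like the following; the regex, keyword list, and length limit are illustrative and must be tuned to the business domain:

private static final int MAX_INPUT_LENGTH = 4_000;
private static final Pattern PHONE = Pattern.compile("\\b\\d{11}\\b");
private static final List<String> INJECTION_MARKERS =
        List.of("ignore previous instructions", "system prompt");

public String sanitizeInput(String input) {
    // 1. Mask sensitive information (example: mainland China phone numbers)
    String cleaned = PHONE.matcher(input).replaceAll("***");

    // 2. Detect prompt injection with a simple keyword blacklist
    String lower = cleaned.toLowerCase();
    if (INJECTION_MARKERS.stream().anyMatch(lower::contains)) {
        throw new IllegalArgumentException("Potential prompt injection detected");
    }

    // 3. Enforce input length limits to cap token cost
    return cleaned.length() > MAX_INPUT_LENGTH
            ? cleaned.substring(0, MAX_INPUT_LENGTH)
            : cleaned;
}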
Multi-model routing directly affects cost and SLA
Not every request deserves the most expensive model. Simple FAQs, local summarization, and structured extraction can go to a local model, while complex reasoning, code generation, and critical decision scenarios should route to a stronger cloud model.
public ChatLanguageModel route(AiTaskType taskType) {
    return switch (taskType) {
        case SIMPLE_QA -> localModel;          // Route simple Q&A to the local model
        case COMPLEX_REASONING -> openAiModel; // Route complex reasoning to the cloud model
        default -> localModel;
    };
}
This routing logic dynamically balances quality, latency, and cost.
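The route method presupposes an AiTaskType classification that the original article does not show. One cheap option is a heuristic pre-classifier; the enum definition and keyword rules below are assumptions:

enum AiTaskType { SIMPLE_QA, COMPLEX_REASONING }

public AiTaskType classify(String userMessage) {
    // Heuristic pre-classification; a small local model could replace this later
    String lower = userMessage.toLowerCase();
    if (userMessage.length() > 500
            || lower.contains("why")
            || lower.contains("compare")) {
        return AiTaskType.COMPLEX_REASONING;
    }
    return AiTaskType.SIMPLE_QA;
}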
The full project structure already provides a solid foundation for extensibility
The project directory should be layered by configuration, controllers, services, tools, entities, and repositories. Externalized prompts, Docker orchestration, and a dedicated router class also indicate that this is not a one-off demo, but a maintainable backend AI service.
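As one illustration of that layering (directory names are assumptions, not copied from the original repository):

ai-service/
├── src/main/java/com/example/aiservice/
│   ├── config/        # Model beans, Redis, router wiring
│   ├── controller/    # Chat, streaming, and knowledge endpoints
│   ├── service/       # ChatService, KnowledgeBaseService
│   ├── tool/          # @Tool-annotated business capabilities
│   ├── entity/        # ChatSession, KnowledgeDocument
│   └── repository/    # Spring Data JPA repositories
├── src/main/resources/
│   ├── application.yml
│   └── prompts/       # Externalized prompt templates
└── docker-compose.yml # Redis, PostgreSQL, Ollama orchestration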
FAQ
1. Why should Java teams prioritize LangChain4j?
Because it packages model invocation, RAG, memory, and tool calling in a Java-native way. It integrates directly into the Spring Boot engineering ecosystem and reduces the maintenance cost of working across multiple language stacks.
2. What is the easiest pitfall in enterprise RAG?
The challenge is not whether retrieval works at all, but chunking strategy, metadata design, index updates, and recall quality evaluation. Chunks that are too large hurt precision, while chunks that are too small lose context.
3. What should you add before taking function calling to production?
Add permission checks, idempotency control, audit logs, and human fallback mechanisms. For tools involving refunds, price changes, or privacy-related queries in particular, you must fully separate model decisions from business authorization.
Core Summary: This article reconstructs an enterprise AI application architecture based on Spring Boot 3, LangChain4j, Redis, PostgreSQL, and Ollama/OpenAI. It covers multi-model integration, conversational memory, RAG knowledge bases, function calling, REST APIs, and production optimization essentials.