Build an enterprise AI customer service system with Spring Boot 3 and LangChain4j. The core capabilities include multi-model integration, conversational memory, RAG retrieval, and tool calling, helping Java teams solve real-world challenges in AI adoption, production integration, and knowledge base operations. Keywords: Spring Boot 3, LangChain4j, RAG.
Technical Specifications Snapshot
| Parameter | Value |
|---|---|
| Language | Java 21 |
| Framework | Spring Boot 3.2.x |
| AI Framework | LangChain4j 0.35.0 |
| Model Protocols | OpenAI API, Ollama |
| Vector Store / Cache | Redis 7.x |
| Business Database | PostgreSQL 16 |
| Document Parsing | Apache Tika |
| Core Dependencies | langchain4j, langchain4j-open-ai, langchain4j-ollama, langchain4j-redis |
This architecture targets real AI adoption scenarios for Java teams
Traditional AI tutorials have long focused on Python. Java teams often get stuck on model integration, knowledge retrieval, session persistence, and business tool orchestration. The value of LangChain4j lies in its JVM-friendly abstractions that unify these capabilities.
This solution uses an intelligent customer service system as its delivery vehicle, combining Spring Boot’s engineering strengths with the power of large language models. The goal is not merely to “get a demo running,” but to build an enterprise AI service that is scalable, auditable, and production-ready.
The technology stack already covers the key dimensions of enterprise deployment
The technology choices in the original article are highly pragmatic: JDK 21 provides the foundation for virtual threads, Spring Boot 3 handles service orchestration, Redis manages caching and memory storage, PostgreSQL stores business metadata, and Ollama plus OpenAI cover both local and cloud deployment paths.
curl https://start.spring.io/starter.zip \
  -d dependencies=web,lombok,data-jpa,postgresql,redis \
  -d javaVersion=21 \
  -d bootVersion=3.2.5 \
  -d baseDir=ai-service \
  -o ai-service.zip
This command quickly initializes a Spring Boot project skeleton suitable for AI services.
Core dependencies define the boundaries of AI capability assembly
LangChain4j is not a single dependency, but a set of capability modules. The core package provides abstractions, open-ai and ollama handle model adapters, redis supports vector- or memory-related capabilities, and Tika parses documents in multiple formats.
<properties>
    <langchain4j.version>0.35.0</langchain4j.version>
</properties>

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-open-ai</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
These dependencies separate the model abstraction layer from vendor-specific adapter layers.
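The ollama, redis, and Tika modules mentioned above follow the same pattern. A sketch of those declarations, with artifact IDs matching LangChain4j's usual module naming (verify them against the 0.35.0 release you actually pull):

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-ollama</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-redis</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-document-parser-apache-tika</artifactId>
    <version>${langchain4j.version}</version>
</dependency>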
Configuration should support both cloud models and local models
In production, the common requirement is not an either-or choice, but coexistence between two model paths. Cloud models handle high-quality, complex reasoning, while local models support low-sensitivity, low-cost, and offline tasks.
langchain4j:
  open-ai:
    chat-model:
      api-key: ${OPENAI_API_KEY:sk-your-key}
      model-name: gpt-4o
      temperature: 0.7
  ollama:
    chat-model:
      base-url: http://localhost:11434
      model-name: qwen2.5:14b
This configuration enables parallel access to both OpenAI and Ollama within the same service.
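If you prefer explicit wiring over a starter, the same two-path setup can be expressed as Spring beans. A minimal sketch, where the bean names and the use of an environment variable are my assumptions rather than the original article's code:

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ModelConfig {

    // Cloud model for high-quality, complex reasoning
    @Bean
    public ChatLanguageModel openAiModel() {
        return OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o")
                .temperature(0.7)
                .build();
    }

    // Local model for low-sensitivity, low-cost, and offline tasks
    @Bean
    public ChatLanguageModel ollamaModel() {
        return OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("qwen2.5:14b")
                .build();
    }
}

Exposing both models as ChatLanguageModel beans is what makes the per-request routing shown later possible.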
The system architecture must solve conversation, retrieval, and tool execution together
The original solution has a clear structure: the Controller layer exposes APIs, the AI Service orchestrates conversations, the Knowledge Base Service handles ingestion and retrieval, and the lower layer connects models with vector storage.
Domain model design reflects clear engineering boundaries
ChatSession manages sessions, users, and model types, while KnowledgeDocument manages files, content types, and vectorization status. The advantage is that the chat pipeline and the knowledge pipeline can evolve independently.
@Entity
@Table(name = "chat_sessions")
public class ChatSession {

    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private String id;

    @Column(name = "user_id", nullable = false)
    private String userId; // Bind to the business user
}
This entity persists the business primary key for AI sessions and the user association.
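The knowledge side mirrors this. A minimal sketch of a KnowledgeDocument entity consistent with the description above (the field and enum names are assumptions):

@Entity
@Table(name = "knowledge_documents")
public class KnowledgeDocument {

    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private String id;

    @Column(name = "file_name", nullable = false)
    private String fileName;      // Original file name at upload

    @Column(name = "content_type")
    private String contentType;   // MIME type detected during ingestion

    @Enumerated(EnumType.STRING)
    @Column(name = "status", nullable = false)
    private EmbeddingStatus status; // e.g. PENDING / EMBEDDED / FAILED (hypothetical enum)
}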
The conversation service must support both memory and retrieval augmentation
A truly usable customer service system cannot rely on a single model call. It must include historical context and prioritize knowledge base references when answering technical or product questions. LangChain4j’s AiServices is well suited for this orchestration layer.
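AiServices builds its implementation from a plain Java interface. The original article does not show the CustomerSupportAgent declaration; a minimal sketch, with the system prompt text being an assumption, looks like this:

import dev.langchain4j.service.SystemMessage;

public interface CustomerSupportAgent {

    @SystemMessage("You are a customer support assistant. Prefer retrieved knowledge base content when answering.")
    String chat(String userMessage);
}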
ChatMemory chatMemory = MessageWindowChatMemory.builder()
        .id(sessionId)
        .maxMessages(20) // Keep only the latest 20 messages (roughly 10 turns) to control token cost
        .chatMemoryStore(memoryStore)
        .build();

CustomerSupportAgent agent = AiServices.builder(CustomerSupportAgent.class)
        .chatLanguageModel(chatModel)
        .chatMemory(chatMemory)
        .contentRetriever(contentRetriever) // Attach RAG retrieval
        .build();
This code assembles the model, memory, and retriever into an AI agent capable of continuous conversation.
The RAG service is the key infrastructure for enterprise knowledge control
The knowledge base pipeline is not complicated, but every step matters: upload documents, parse content, split text into chunks, vectorize, write to storage, and then perform similarity-based recall at query time. The original article uses Apache Tika plus a recursive splitter, which is a mainstream approach.
DocumentSplitter splitter = DocumentSplitters.recursive(
        500, // Max chunk size, in tokens
        50,  // Overlap between adjacent chunks, in tokens
        new dev.langchain4j.model.openai.OpenAiTokenizer("gpt-4o")
);

EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
        .documentSplitter(splitter)
        .embeddingModel(embeddingModel)
        .embeddingStore(embeddingStore)
        .build();
This code defines the document chunking and vector ingestion strategy, which is central to RAG retrieval quality.
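Upstream of the ingestor, Apache Tika converts an uploaded file into a LangChain4j Document. A hedged usage sketch (the file path is illustrative):

// Parse any Tika-supported format (PDF, DOCX, HTML, ...) into a Document
Document document = FileSystemDocumentLoader.loadDocument(
        Path.of("/data/uploads/product-manual.pdf"), // Hypothetical upload location
        new ApacheTikaDocumentParser()
);

// Split, embed, and write to the embedding store in one call
ingestor.ingest(document);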
Function calling turns AI from a responder into an executor
The defining line for enterprise AI applications is whether the model can call real business tools. Order lookup, refund requests, and shipment tracking are all good candidates to expose as @Tool methods, allowing the model to invoke them autonomously based on user intent.
@Tool("Query logistics tracking information for an order")
public LogisticsInfo queryLogistics(
@dev.langchain4j.agent.tool.P("Order number") String orderNo) {
return new LogisticsInfo("SF Express", "SF1234567890", List.of(
new TrackNode("2024-05-01 10:30", "Shenzhen", "Package collected")
));
}
This tool method wraps external business capabilities as a structured interface that the model can invoke.
The agent gains execution ability only after tool registration
Defining a tool class alone is not enough. You must explicitly inject it during agent construction. Only then can the model automatically choose to invoke it during conversation instead of merely generating verbal suggestions.
CustomerSupportAgent agent = AiServices.builder(CustomerSupportAgent.class)
        .chatLanguageModel(chatModel)
        .chatMemory(chatMemory)
        .contentRetriever(contentRetriever)
        .tools(customerServiceTools) // Register business tools
        .build();
This configuration upgrades AI from a “Q&A model” to an “executable agent.”
API design should support synchronous, streaming, and knowledge ingestion requests
The original solution exposes endpoints for standard chat, streaming chat, document upload, status queries, and session cleanup. This already covers the basic interaction surface of an intelligent customer service system. The API design is concise and easy for frontend or external system integration.
@PostMapping("/chat")
public ResponseEntity<ChatService.ChatResponse> chat(@RequestBody ChatRequest request) {
    String sessionId = request.sessionId() != null
            ? request.sessionId()
            : java.util.UUID.randomUUID().toString();
    return ResponseEntity.ok(chatService.chat(sessionId, request.userId(), request.message()));
}
This controller code provides a synchronous chat entry point and automatically fills in the session identifier.
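For the streaming endpoint, a LangChain4j agent method can return a TokenStream, which maps naturally onto Server-Sent Events. A sketch, assuming the agent interface declares a TokenStream chatStream(String message) method and the AiServices builder was given a StreamingChatLanguageModel:

@PostMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public SseEmitter chatStream(@RequestBody ChatRequest request) {
    SseEmitter emitter = new SseEmitter(60_000L); // 60s timeout for the SSE connection
    agent.chatStream(request.message())
            .onNext(token -> {
                try {
                    emitter.send(token); // Push each generated token as an SSE event
                } catch (IOException e) {
                    emitter.completeWithError(e);
                }
            })
            .onComplete(response -> emitter.complete())
            .onError(emitter::completeWithError)
            .start();
    return emitter;
}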
Production optimization must focus on performance, security, and routing
Performance priorities include model warm-up, Redis retrieval optimization, virtual-thread concurrency improvements, and summarization-based compression to reduce token usage. On the security side, you should add sensitive-term filtering, prompt injection detection, and output auditing.
public String sanitizeInput(String input) {
    // 1. Mask sensitive information
    // 2. Detect prompt injection
    // 3. Enforce input length limits
    return input;
}
This filter code shows the minimum viable skeleton for AI input governance.
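A hedged filling-in of that skeleton might look like the following; the regex, keyword list, and length limit are illustrative and must be tuned to the business domain:

private static final int MAX_INPUT_LENGTH = 4_000;
private static final Pattern PHONE = Pattern.compile("\\b\\d{11}\\b");
private static final List<String> INJECTION_MARKERS =
        List.of("ignore previous instructions", "system prompt");

public String sanitizeInput(String input) {
    // 1. Mask sensitive information (example: mainland China phone numbers)
    String cleaned = PHONE.matcher(input).replaceAll("***");

    // 2. Detect prompt injection with a simple keyword blacklist
    String lower = cleaned.toLowerCase();
    if (INJECTION_MARKERS.stream().anyMatch(lower::contains)) {
        throw new IllegalArgumentException("Potential prompt injection detected");
    }

    // 3. Enforce input length limits to cap token cost
    return cleaned.length() > MAX_INPUT_LENGTH
            ? cleaned.substring(0, MAX_INPUT_LENGTH)
            : cleaned;
}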
Multi-model routing directly affects cost and SLA
Not every request deserves the most expensive model. Simple FAQs, local summarization, and structured extraction can go to a local model, while complex reasoning, code generation, and critical decision scenarios should route to a stronger cloud model.
public ChatLanguageModel route(AiTaskType taskType) {
    return switch (taskType) {
        case SIMPLE_QA -> localModel;          // Route simple Q&A to the local model
        case COMPLEX_REASONING -> openAiModel; // Route complex reasoning to the cloud model
        default -> localModel;
    };
}
This routing logic dynamically balances quality, latency, and cost.
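The route method presupposes an AiTaskType classification that the original article does not show. One cheap option is a heuristic pre-classifier; the enum definition and keyword rules below are assumptions:

enum AiTaskType { SIMPLE_QA, COMPLEX_REASONING }

public AiTaskType classify(String userMessage) {
    // Heuristic pre-classification; a small local model could replace this later
    String lower = userMessage.toLowerCase();
    if (userMessage.length() > 500
            || lower.contains("why")
            || lower.contains("compare")) {
        return AiTaskType.COMPLEX_REASONING;
    }
    return AiTaskType.SIMPLE_QA;
}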
The full project structure already provides a solid foundation for extensibility
The project directory should be layered by configuration, controllers, services, tools, entities, and repositories. Externalized prompts, Docker orchestration, and a dedicated router class also indicate that this is not a one-off demo, but a maintainable backend AI service.
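As one illustration of that layering (directory names are assumptions, not copied from the original repository):

ai-service/
├── src/main/java/com/example/aiservice/
│   ├── config/        # Model beans, Redis, router wiring
│   ├── controller/    # Chat, streaming, and knowledge endpoints
│   ├── service/       # ChatService, KnowledgeBaseService
│   ├── tool/          # @Tool-annotated business capabilities
│   ├── entity/        # ChatSession, KnowledgeDocument
│   └── repository/    # Spring Data JPA repositories
├── src/main/resources/
│   ├── application.yml
│   └── prompts/       # Externalized prompt templates
└── docker-compose.yml # Redis, PostgreSQL, Ollama orchestration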
FAQ
1. Why should Java teams prioritize LangChain4j?
Because it packages model invocation, RAG, memory, and tool calling in a Java-native way. It integrates directly into the Spring Boot engineering ecosystem and reduces the maintenance cost of working across multiple language stacks.
2. What is the easiest pitfall in enterprise RAG?
The challenge is not whether retrieval works at all, but chunking strategy, metadata design, index updates, and recall quality evaluation. Chunks that are too large hurt precision, while chunks that are too small lose context.
3. What should you add before taking function calling to production?
Add permission checks, idempotency control, audit logs, and human fallback mechanisms. For tools involving refunds, price changes, or privacy-related queries in particular, you must fully separate model decisions from business authorization.
Core Summary: This article reconstructs an enterprise AI application architecture based on Spring Boot 3, LangChain4j, Redis, PostgreSQL, and Ollama/OpenAI. It covers multi-model integration, conversational memory, RAG knowledge bases, function calling, REST APIs, and production optimization essentials.