# Spring AI Alibaba Guide: Integrate Qwen with Spring Boot, Streaming Responses, and Multimodal AI

Spring AI Alibaba gives Java developers a unified abstraction for integrating large language models, reducing the cost of calling model APIs directly, lowering model-switching friction, and improving engineering readiness. This article focuses on quick setup, ChatClient, streaming responses, and multimodal implementation. Keywords: Spring AI, DashScope, ChatClient.

## Technical specifications at a glance

| Parameter | Description |
| --- | --- |
| Language | Java 17+ |
| Framework | Spring Boot 3.x |
| AI Integration Layer | Spring AI Alibaba |
| Model Service | Alibaba Cloud Bailian / DashScope / Qwen family |
| Interaction Protocol | HTTP, streaming response (Flux) |
| Typical Capabilities | Chat, structured output, system prompt, multimodal |
| Core Dependencies | spring-ai-alibaba-starter, spring-ai-alibaba-starter-dashscope |

## Spring AI Alibaba is a practical LLM integration solution for Java engineering

Spring AI Alibaba is built on top of Spring AI. Its core value is not simply “wrapping another SDK,” but bringing model invocation into Spring’s unified programming model. For Java teams, this means they can continue to reuse dependency injection, configuration management, auto-configuration, and cloud-native integration.

The real pain points it addresses are clear: mainstream AI frameworks such as LangChain and LlamaIndex are centered around the Python ecosystem, while Java developers integrating LLMs into enterprise systems often face fragmented APIs, high model-switching costs, and scattered prompt management.

## The framework has clearly defined capability boundaries

It supports chat, text-to-image, audio transcription, and text-to-speech model capabilities, while also supporting synchronous calls, streaming output, structured results, function calling, conversational memory, and RAG extensions. That makes it better suited to enterprise backends rather than simple demo assembly.


```xml
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>spring-ai-alibaba-starter</artifactId>
    <version>1.0.0-M5.1</version>
</dependency>
```

This dependency integrates Spring AI Alibaba into a standard Maven project.

## The quick-start process can be reduced to three steps

The first step is to apply for an API key on the Alibaba Cloud Bailian platform. The second step is to create a Spring Boot 3.x project. The third step is to add the AI Starter and complete the configuration, after which you can inject ChatModel or ChatClient directly to make requests.

**AI Visual Insight:** This screenshot shows the API key application entry on the Alibaba Cloud Bailian platform. It highlights that account verification and callable credentials are prerequisites for model access, forming the identity authentication starting point of the entire invocation chain.

**AI Visual Insight:** This image shows the initialization screen for creating a new module with Spring Boot. It reflects a standard scaffold-based project setup, which simplifies later integration with auto-configuration, configuration files, and web controllers.

```yaml
server:
  port: 8082
spring:
  application:
    name: spring-alibaba-demo
  ai:
    dashscope:
      api-key: your-api-key  # Configure the Bailian platform key
```

This configuration defines the service port, application name, and DashScope authentication parameters.
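The same property tree can also pin a default chat model and sampling behavior. The sketch below extends the configuration above; `qwen-plus` is one commonly referenced DashScope model name, and the `chat.options` keys follow the same pattern this article later uses for the vision model, but treat the exact option names and values as assumptions to verify against your starter version.

```yaml
server:
  port: 8082
spring:
  application:
    name: spring-alibaba-demo
  ai:
    dashscope:
      api-key: your-api-key    # Bailian platform key
      chat:
        options:
          model: qwen-plus     # Assumed default chat model for text endpoints
          temperature: 0.7     # Sampling temperature; lower values are more deterministic
```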

## Using ChatModel is the fastest way to build your first chat endpoint

ChatModel is closer to the underlying capability and is ideal for validating the request pipeline first. By creating a simple controller, you can send an HTTP request parameter directly to the model for processing.

```java
@RestController
@RequestMapping("/ali")
public class AliController {

    private final ChatModel chatModel;

    public AliController(ChatModel chatModel) {
        this.chatModel = chatModel; // Inject the model client
    }

    @RequestMapping("/chat")
    public String chat(String message) {
        return chatModel.call(message); // Call the model directly for a single-turn conversation
    }
}
```

This code demonstrates the smallest runnable text chat endpoint.
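The decoupling that makes this endpoint so small can be shown without Spring at all. The sketch below is a dependency-free illustration, not the real Spring AI `ChatModel` interface (which has richer overloads): the "controller" depends only on a narrow contract, so any implementation can be injected behind it.

```java
// Dependency-free sketch: SimpleChatModel is a hypothetical stand-in for the
// ChatModel contract, mirroring only the single-turn String call(String) shape.
public class ChatModelSketch {

    public interface SimpleChatModel {
        String call(String message); // single-turn request/response
    }

    // The endpoint logic depends only on the interface, never on a vendor SDK.
    public static String chatEndpoint(SimpleChatModel model, String message) {
        return model.call(message);
    }

    public static void main(String[] args) {
        // Swap in any implementation (here a canned echo stub) without touching the endpoint.
        SimpleChatModel stub = msg -> "echo: " + msg;
        System.out.println(chatEndpoint(stub, "hello"));
    }
}
```

In the real application, Spring's dependency injection plays the role of the hand-wired stub above: the auto-configured DashScope implementation is injected wherever a `ChatModel` is declared.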

## ChatClient provides a higher-level way to orchestrate prompts and results

Compared with ChatModel, ChatClient is better suited to business development. It wraps prompt construction, role messages, and response extraction into a fluent API, which significantly reduces boilerplate code in the controller layer.

```java
@RestController
@RequestMapping("/chat")
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build(); // Build the default client from the Builder
    }

    @GetMapping("/call")
    public String call(String input) {
        return this.chatClient.prompt()
                .user(input) // Add the user message
                .call()
                .content(); // Extract the model text output
    }
}
```

This code shows the higher-level conversational invocation style recommended by Spring AI.
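Why the fluent style cuts boilerplate becomes clearer with a toy version of it. The sketch below imitates the shape of the real API (`prompt()`, `user()`, `call()`) but is purely illustrative; Spring AI's actual `ChatClient` handles message roles, options, and response parsing internally.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative toy, not Spring AI's implementation: each chaining method
// returns the spec itself, so prompt assembly reads as one expression.
public class FluentPromptSketch {

    public interface Model { String complete(List<String> messages); }

    public static class PromptSpec {
        private final Model model;
        private final List<String> messages = new ArrayList<>();

        PromptSpec(Model model) { this.model = model; }

        public PromptSpec user(String text) {  // accumulate the user message
            messages.add("user: " + text);
            return this;                        // returning this enables chaining
        }

        public String call() {                  // execute and unwrap the result
            return model.complete(messages);
        }
    }

    public static PromptSpec prompt(Model model) { return new PromptSpec(model); }

    public static void main(String[] args) {
        Model fake = msgs -> "answered " + msgs.size() + " message(s)";
        System.out.println(prompt(fake).user("hello").call());
    }
}
```

Without the fluent spec, every endpoint would rebuild message lists and unwrap responses by hand; the chain moves that ceremony into one reusable object.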

## Streaming responses fit chat windows and long-form generation scenarios

When a response is long, waiting for the full result increases time to first token. Using `stream()` with `Flux<String>` lets the application return content as it is generated, which is ideal for token-by-token rendering in the frontend.

```java
@GetMapping(value = "/stream", produces = "text/html;charset=utf-8")
public Flux<String> stream(String input) {
    return this.chatClient.prompt()
            .user(input) // Inject the current user input
            .stream()
            .content(); // Return chunked text as a stream
}
```

This code implements a server-side streaming output endpoint.

## Structured output makes model results easier to route into business workflows

Enterprise systems often do not need free-form text. They need structured objects that applications can consume directly. Spring AI supports mapping model output directly to a Java Record or POJO.

```java
record ActorFilms(String actor, List<String> movies) {
}

@GetMapping("/entity")
public String entity(String actor) {
    ActorFilms actorFilms = chatClient.prompt()
            .user(String.format("Generate the filmography of actor %s", actor)) // Constrain the output topic
            .call()
            .entity(ActorFilms.class); // Deserialize into a structured object
    return actorFilms.toString();
}
```

This code converts the model response directly into a strongly typed object, making it easier to persist or orchestrate downstream.

![](https://i-blog.csdnimg.cn/direct/6a7f63c829174862bef730a3903bd898.png)

**AI Visual Insight:** This screenshot shows the execution result of the structured output endpoint. It demonstrates that the model response has been mapped into an object containing an actor name and a movie list, rather than loose natural language text.

## A default System Message moves role definition into the infrastructure layer

In most business scenarios, the system role should not be duplicated across every endpoint. A better approach is to configure the default System Prompt during bean initialization so that all requests share a consistent persona and tone.
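Spring AI's system prompts are templates: `{placeholder}` tokens can be filled in per request. The substitution itself can be pictured with a dependency-free sketch; `renderTemplate` is a hypothetical helper standing in for what Spring AI's `PromptTemplate` does internally.

```java
import java.util.Map;

// Illustrative only: renderTemplate is a hypothetical helper, not Spring AI API.
// It shows the {placeholder} substitution that system templates rely on.
public class SystemTemplateSketch {

    public static String renderTemplate(String template, Map<String, String> params) {
        String result = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            result = result.replace("{" + e.getKey() + "}", e.getValue()); // fill each token
        }
        return result;
    }

    public static void main(String[] args) {
        String system = "Assume you are David Ogilvy. Every reply must include the word {word}.";
        System.out.println(renderTemplate(system, Map.of("word", "luxury")));
    }
}
```

In Spring AI the template lives in the default system message, and the endpoint supplies only the parameter values at call time.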
```java
@Configuration
public class ChatClientConfig {

    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                .defaultSystem("Assume you are David Ogilvy, capable of writing advertisements for many brands") // Preset the system role
                .build();
    }
}
```

This configuration hardcodes the default system prompt as a global client behavior.

If you need dynamic variables, you can also reserve parameters in the system template and inject them at call time. This preserves a unified setup while still allowing request-level customization.

```java
@GetMapping("/word")
public String word(String input, String word) {
    return chatClient.prompt()
            .system(sp -> sp.param("word", word)) // Fill template parameters dynamically
            .user(input)
            .call()
            .content();
}
```

This code injects request-specific variables into the default system prompt.

![](https://i-blog.csdnimg.cn/direct/d158906b25644e129b6f0c85cadfd601.png)

**AI Visual Insight:** This screenshot reflects the generated result after the default System Message takes effect. It shows that the model output is now constrained by the predefined advertising copywriter persona, demonstrating how prompt engineering can stabilize output style.

## Multimodal support extends Spring AI from text Q&A to image-text understanding

The key to multimodal support is not just "uploading an image." It is packaging text and media together into a unified message model. Spring AI uses the `UserMessage + Media` abstraction to send an image URL, MIME type, and text instruction together to the underlying vision model.

When enabling multimodal support, the original article recommends using the dedicated DashScope Starter and configuring a vision model together with `multi-model: true`.

```xml
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>spring-ai-alibaba-starter-dashscope</artifactId>
    <version>1.1.2.0</version>
</dependency>
```

This dependency adds the DashScope starter that supports multimodal capabilities.
```yaml
spring:
  ai:
    dashscope:
      api-key: your-api-key
      chat:
        options:
          model: qwen-vl-max-latest  # Specify the vision understanding model
          multi-model: true          # Enable multimodal support
```

This configuration declares the selected vision model and the multimodal switch.

```java
@GetMapping("/image")
public String image(String input) throws Exception {
    String url = "https://lcshelter.org/wp-content/uploads/2024/11/lewis-clark-animal-shelter-lewiston-idaho-cat.png";
    List<Media> mediaList = List.of(
            new Media(MimeTypeUtils.IMAGE_PNG, new URI(url).toURL().toURI())
    );
    UserMessage prompt = UserMessage.builder()
            .text(input) // Text question
            .media(mediaList) // Image input
            .build();
    return this.chatClient.prompt(new Prompt(prompt))
            .call() // Call the vision model
            .content(); // Return the image understanding result
}
```

This code implements a multimodal endpoint that lets you ask questions about an image.

![](https://i-blog.csdnimg.cn/direct/eb5d15cd7c7845659e428bd992372d60.png)

**AI Visual Insight:** This screenshot shows the return value of the image recognition endpoint. It demonstrates that the model has completed visual understanding based on the input cat image and produced a natural-language judgment about its breed or appearance.

## The practical value of Spring AI Alibaba lies in replaceability and governability

From an engineering perspective, the most important outcome is not that a call succeeds once, but that invocation logic becomes standardized. Controllers no longer need to be aware of a specific vendor SDK, and prompts, model selection, and default roles can all be managed through configuration. That greatly reduces application-layer changes when switching the underlying model later.

For teams already using Spring Boot 3.x and JDK 17+, this is a low-friction path for building Java-native AI applications.
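The replaceability argument can be made concrete without any framework code. In the dependency-free sketch below (purely illustrative, not Spring AI internals), a configuration string selects the vendor adapter, mirroring how changing a model name in `application.yml` swaps the backend without touching call sites.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustrative sketch: a registry keyed by a configuration value stands in for
// Spring's auto-configuration choosing a model implementation from properties.
public class ModelSwitchSketch {

    // UnaryOperator<String> plays the role of a single-turn chat call.
    public static final Map<String, UnaryOperator<String>> REGISTRY = Map.of(
            "vendor-a", msg -> "[A] " + msg,
            "vendor-b", msg -> "[B] " + msg
    );

    public static String call(String configuredVendor, String message) {
        return REGISTRY.get(configuredVendor).apply(message); // dispatch by config key
    }

    public static void main(String[] args) {
        // Changing one string switches the vendor; the call site is unchanged.
        System.out.println(call("vendor-a", "hello"));
        System.out.println(call("vendor-b", "hello"));
    }
}
```

This is the governability payoff in miniature: the "which model" decision lives in one configurable place rather than being scattered across controllers.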
A safer evolution path is to validate the pipeline with `ChatModel`, consolidate capabilities with `ChatClient`, and then extend into structured output and multimodal support.

## FAQ with structured answers

### What is the difference between Spring AI Alibaba and calling Alibaba Cloud model APIs directly?

Direct API calls offer more flexibility, but they also create more repetitive engineering work. Spring AI Alibaba provides a unified abstraction, auto-configuration, and prompt orchestration, making it a better fit for projects that require long-term maintenance, model switching, and deep integration with the Spring ecosystem.

### How should you choose between ChatModel and ChatClient?

`ChatModel` is lower level and works well for quickly validating the request pipeline or building deep customizations. `ChatClient` is higher level and better suited to business development, especially for prompt composition, streaming output, and structured result extraction.

### What are the most common pitfalls in multimodal integration?

The most common issues fall into three categories: selecting the wrong dependency, using a model name that does not support vision, and failing to enable `multi-model: true`. In addition, the image URI must be reachable and the MIME type must be correct, or the model will not be able to read the media content.

**AI Readability Summary:** This article reconstructs both beginner and advanced Spring AI Alibaba practices, covering dependency setup, ChatModel and ChatClient usage, streaming responses, structured output, default System Message configuration, and image-based multimodal integration. It helps Java developers quickly build enterprise-grade LLM applications with Spring Boot.