Qwen-4 72B Open-Source Multimodal Model Beats GPT-5o on 12 Benchmarks

Alibaba's open-source multimodal model Qwen-4 72B sets state-of-the-art results on 12 benchmarks and rivals GPT-5o in image and video understanding.

Alibaba's Qwen team has released Qwen-4 72B, an open-source multimodal flagship model that sets new state-of-the-art results on 12 benchmarks and directly challenges GPT-5o in native image and video understanding. Its results on visual question answering, image captioning, and video comprehension indicate that open-source models can compete with leading proprietary systems on these tasks, giving the open-source AI community an accessible alternative to closed models.

For overseas developers and technical founders, the release offers access to these multimodal capabilities without vendor lock-in, for applications ranging from content moderation to autonomous systems. Because the model is open source, it can also be fine-tuned and customized for both commercial and research projects.