Technology · AI · Innovation · Machine Learning

Alibaba's New Qwen 3.5 397B-A17B Model Outperforms Trillion-Parameter Predecessor at Significantly Lower Cost, Revolutionizing Enterprise AI Procurement

Alibaba has launched Qwen 3.5, a new open-weight AI model, Qwen3.5-397B-A17B, featuring 397 billion total parameters but activating only 17 billion per token. The model has scored benchmark wins against Alibaba's previous flagship, Qwen3-Max, which had over one trillion parameters. Qwen 3.5 represents a significant shift for enterprise AI: a powerful model that organizations can own, run, and control, yet one that rivals the performance of rented API models. Built on the architecture of Qwen3-Next, it scales aggressively to 512 experts, delivering dramatically lower inference latency. The model decodes 19 times faster than Qwen3-Max at 256K context lengths, is claimed to be 60% cheaper to run, and handles eight times the concurrent workload. These cost and speed efficiencies are crucial for IT leaders evaluating AI infrastructure.

VentureBeat

Alibaba recently unveiled its new open-weight AI model, Qwen 3.5, coinciding with the Lunar New Year. The model, specifically Qwen3.5-397B-A17B, has 397 billion total parameters but activates only 17 billion parameters per token. This architectural design has enabled it to achieve benchmark wins against Alibaba's own previous flagship model, Qwen3-Max, which the company acknowledged had exceeded one trillion parameters.

This release marks a pivotal moment for enterprise AI procurement. For IT leaders planning AI infrastructure for 2026, Qwen 3.5 presents a compelling argument: the ability to own, run, and control a model that can rival the performance of larger, rented models.

The engineering foundation of Qwen 3.5 traces back to Qwen3-Next, an experimental ultra-sparse Mixture-of-Experts (MoE) model previewed last September. Qwen 3.5 significantly scales this architectural direction, increasing from 128 experts in previous Qwen3 MoE models to 512 experts in the new release. This, combined with an improved attention mechanism, results in substantially lower inference latency.

Practically, because only 17 billion of the 397 billion parameters are active during any given forward pass, the compute footprint is much closer to that of a 17B dense model rather than a 400B one. Despite this, the model can leverage the full depth of its expert pool for specialized reasoning tasks.
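The sparse-activation idea described above can be illustrated with a toy top-k routing sketch. This is a minimal, hypothetical example, not Alibaba's implementation: the dimensions are tiny for readability, and the number of active experts per token (`top_k`) is an assumption, since the article only states the 512-expert total and the 17B-active figure.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256      # toy sizes; the real model is far larger
num_experts, top_k = 512, 4  # 512 experts as in Qwen 3.5; top_k here is an assumption

# Each "expert" is a small two-layer feed-forward network.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(num_experts)
]
router = rng.standard_normal((d_model, num_experts)) * 0.02

def moe_forward(x):
    """Route one token vector to its top-k experts; the rest stay idle."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]     # indices of the k highest-scoring experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                 # softmax over the chosen experts only
    out = np.zeros_like(x)
    for w, idx in zip(weights, chosen):
        w1, w2 = experts[idx]
        out += w * (np.maximum(x @ w1, 0) @ w2)  # only k experts do any compute
    return out

token = rng.standard_normal(d_model)
y = moe_forward(token)
print(y.shape)  # (64,)
```

The key point the sketch makes concrete: all 512 experts' weights exist in memory, but per-token compute scales with `top_k`, not `num_experts`, which is why the forward-pass cost tracks a 17B dense model rather than a 397B one.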

The speed enhancements are considerable. At 256K context lengths, Qwen 3.5 demonstrates a decoding speed 19 times faster than Qwen3-Max and 7.2 times faster than Qwen 3's 235B-A22B model. Alibaba also claims that Qwen 3.5 is 60% cheaper to operate than its predecessor and can handle eight times the concurrent workload. These figures are highly significant for organizations mindful of their inference expenditures. Furthermore, Alibaba positions the model at approximately 1/18th the cost of Google's comparable offering.

Related News

Project N.O.M.A.D: A Self-Sufficient Offline Survival Computer with AI and Essential Tools for Anytime, Anywhere Access
Technology

Project N.O.M.A.D is a self-sufficient, offline survival computer designed to provide users with critical tools, knowledge, and AI capabilities. The system aims to ensure users can access information and maintain an advantage regardless of their location or connectivity status, emphasizing self-reliance and preparedness through its integrated features.

MiroFish: A Concise and Universal Swarm Intelligence Engine for Predicting Everything
Technology

MiroFish, an innovative project by 666ghj, has emerged as a trending repository on GitHub. Described as a concise and universal swarm intelligence engine, MiroFish aims to predict a wide array of phenomena. The project's core concept revolves around leveraging collective intelligence to offer predictive capabilities across various domains. Further details regarding its specific applications or underlying technology are not provided in the initial description.

GitNexus: Zero-Server Code Smart Engine Transforms GitHub Repos and ZIP Files into Interactive Knowledge Graphs with Built-in Graph RAG Agent for Enhanced Code Exploration
Technology

GitNexus is a client-side knowledge graph creator that operates entirely within the browser, requiring no server-side code. Users can input GitHub repositories or ZIP files to generate an interactive knowledge graph, which includes a built-in Graph RAG agent. This tool is designed to significantly enhance code exploration by providing a visual and interactive way to understand codebases.