Technology · AI · Innovation · Machine Learning

Alibaba's New Qwen 3.5 397B-A17 Model Outperforms Trillion-Parameter Predecessor at Significantly Lower Cost, Revolutionizing Enterprise AI Procurement

Alibaba has launched Qwen 3.5, a new open-weight AI model, Qwen3.5-397B-A17B, featuring 397 billion total parameters but activating only 17 billion per token. The model has scored benchmark wins against Alibaba's previous flagship, Qwen3-Max, which had over one trillion parameters. Qwen 3.5 marks a significant shift for enterprise AI: a model powerful enough to rival rented, closed models, yet one organizations can own, run, and control. Built on the architecture of Qwen3-Next, it scales aggressively to 512 experts, yielding dramatically lower inference latency. The model decodes 19 times faster than Qwen3-Max at 256K context lengths, is claimed to be 60% cheaper to run, and handles large concurrent workloads eight times more effectively. These cost and speed efficiencies are central for IT leaders evaluating AI infrastructure.

VentureBeat

Alibaba recently unveiled its new open-weight AI model, Qwen 3.5, coinciding with the Lunar New Year. The model, specifically Qwen3.5-397B-A17B, boasts 397 billion total parameters but uniquely activates only 17 billion parameters per token. This architectural design has enabled it to achieve benchmark wins against Alibaba's own previous flagship model, Qwen3-Max, which the company acknowledged had exceeded one trillion parameters.

This release marks a pivotal moment for enterprise AI procurement. For IT leaders planning AI infrastructure for 2026, Qwen 3.5 presents a compelling argument: the ability to own, run, and control a model that can rival the performance of larger, rented models.

The engineering foundation of Qwen 3.5 traces back to Qwen3-Next, an experimental ultra-sparse Mixture-of-Experts (MoE) model previewed last September. Qwen 3.5 significantly scales this architectural direction, increasing from 128 experts in previous Qwen3 MoE models to 512 experts in the new release. This, combined with an improved attention mechanism, results in substantially lower inference latency.
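As a rough illustration of how a model with hundreds of experts touches only a few of them per token, here is a minimal top-k routing sketch. The expert count of 512 comes from the article; the `TOP_K` value and the random scores are invented for the sketch, and a real router uses a learned gating network rather than random scores.

```python
# Minimal sketch of top-k expert routing, the mechanism behind running
# 512 experts while activating only a few per token.
# NOTE: TOP_K and the scoring below are toy assumptions, not Qwen's
# actual configuration; real MoE routers use learned gating networks.
import math
import random

NUM_EXPERTS = 512   # expert count reported for Qwen 3.5's MoE layers
TOP_K = 8           # hypothetical number of experts routed per token

def route(scores):
    """Select the TOP_K highest-scoring experts; softmax their scores."""
    top = sorted(range(len(scores)), key=lambda i: -scores[i])[:TOP_K]
    exps = [math.exp(scores[i]) for i in top]
    z = sum(exps)
    return {i: e / z for i, e in zip(top, exps)}

scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route(scores)   # only these TOP_K experts run for this token
```

Every token still sees the full pool of 512 experts as candidates, but only the handful the router selects contribute any compute, which is what keeps latency low despite the large total parameter count.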

Practically, because only 17 billion of the 397 billion parameters are active during any given forward pass, the compute footprint is much closer to that of a 17B dense model rather than a 400B one. Despite this, the model can leverage the full depth of its expert pool for specialized reasoning tasks.
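A back-of-envelope calculation makes the footprint claim concrete. The 397B/17B parameter figures are from the article; the rule of roughly 2 FLOPs per parameter per decoded token is a standard rough estimate for transformer inference, not a published Qwen number.

```python
# Back-of-envelope compute comparison for sparse activation.
# 397B / 17B are the reported figures; the ~2 FLOPs-per-parameter
# rule per decoded token is a common rough estimate (assumption).
TOTAL_PARAMS = 397e9    # all parameters held in memory
ACTIVE_PARAMS = 17e9    # parameters actually touched per token

flops_moe = 2 * ACTIVE_PARAMS     # roughly a 17B dense model's cost
flops_dense = 2 * TOTAL_PARAMS    # if all 397B ran densely

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS   # ~0.043, i.e. ~4.3%
compute_saving = flops_dense / flops_moe         # ~23x fewer FLOPs/token
```

In other words, only about 4% of the weights do work on any given token, so per-token compute lands near a 17B dense model while memory still holds all 397B parameters.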

The speed enhancements are considerable. At 256K context lengths, Qwen 3.5 decodes 19 times faster than Qwen3-Max and 7.2 times faster than Qwen 3's 235B-A22B model. Alibaba also claims that Qwen 3.5 is 60% cheaper to operate than its predecessor and can handle eight times as many large concurrent workloads. These figures matter for organizations watching their inference expenditures; Alibaba further claims the model costs roughly 1/18th as much to run as Google's comparable model.

Related News

Superpowers: A Proven Agent Skill Framework and Software Development Methodology for Coding Agents
Technology


Superpowers is presented as a proven agent skill framework and a full software development methodology for coding agents. It is built on a foundation of composable 'skills', ships with a set of initial skills, and lays out a complete, structured workflow for agent-based software development.

OpenViking: An Open-Source Context Database for AI Agents, Designed for Hierarchical Context Management and Self-Evolution
Technology


OpenViking, an open-source context database developed by volcengine, is specifically designed for AI agents like openclaw. It unifies the management of agent context, including memory, resources, and skills, through a file system paradigm. This innovative approach enables hierarchical context passing and supports the self-evolution of AI agents, streamlining how agents access and utilize necessary information for their operations and development.

dimos: A New Proxy Operating System Built on the Dimensional Framework Emerges on GitHub Trending
Technology


dimos, described as a 'Proxy Operating System' and built upon a 'Dimensional Framework,' has recently appeared on GitHub Trending. Developed by dimensionalOS, this project was published on March 16, 2026. The limited information available suggests it is a foundational system, with its core components rooted in a dimensional architecture, aiming to provide a new approach to operating system design.