Ornith-1.0: SOTA Self-Improving Models for Agentic Coding

Ornith-1.0 is a suite of self-improving open-source models built for agentic coding. Developed by deepreinforce-ai, the models range from 9B-Dense to 397B-MoE and are post-trained on top of Gemma 4 and Qwen 3.5. Using a Reinforcement Learning (RL) framework that jointly optimizes solution rollouts and the scaffolds that drive them, Ornith-1.0 achieves state-of-the-art performance on major benchmarks such as SWE-bench and Terminal-Bench 2.1. The project is released under the MIT license, making it globally accessible and free from regional limitations. The models improve on existing baselines in complex coding tasks, repository-level understanding, and multilingual support, a notable advance for open-source AI agents in software engineering.

Key Takeaways

Diverse Model Architectures: Ornith-1.0 is available in multiple configurations, including 9B-Dense, 31B-Dense, 35B-MoE, and 397B-MoE, catering to different computational needs.
Self-Improving RL Framework: The models utilize Reinforcement Learning to optimize not just the final code solutions (rollouts) but also the underlying scaffolds that guide the generation process.
State-of-the-Art Performance: Ornith-1.0 models outperform comparable open-source models like Qwen 3.5 and Gemma 4 across critical benchmarks such as SWE-bench and Terminal-Bench 2.1.
MIT Licensed: The project is fully open-source, globally accessible, and free from regional restrictions, encouraging widespread adoption and community contribution.
Advanced Post-Training: The models are post-trained on top of two high-performance base model families, Gemma 4 and Qwen 3.5.

In-Depth Analysis

The Self-Improving Training Framework and Scaffold Optimization

At the core of Ornith-1.0 is its self-improving training framework. Rather than training only on the final output, Ornith-1.0 uses Reinforcement Learning (RL) to learn to generate both the solution rollouts and the scaffolds that drive those rollouts. This dual optimization allows the model to discover better search trajectories. By jointly refining the scaffold (the structural logic or steps taken to reach a solution) and the resulting code, the model produces higher-quality solutions.

This method addresses a common bottleneck in agentic coding: the quality of the reasoning path. By treating the scaffold as a learnable component, Ornith-1.0 can adapt its internal logic to better handle complex, multi-step coding problems. This results in a model that doesn't just "guess" the code but follows a learned, optimized trajectory to solve repository-level issues.
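The idea of jointly optimizing a scaffold and the rollout it drives can be illustrated with a toy two-level policy. The sketch below is not deepreinforce-ai's algorithm, and the article gives no implementation details; the reward table, the two-scaffold setup, and every function name are hypothetical. To stay deterministic it applies the exact expected policy gradient instead of sampling rollouts.

```python
import math

# Hypothetical reward table: REWARD[s][a] is the payoff of rollout choice `a`
# under scaffold `s`. These numbers are invented for illustration only.
REWARD = [
    [0.1, 0.2, 0.1],  # scaffold 0: weak whichever rollout is chosen
    [0.3, 0.9, 0.2],  # scaffold 1: strong when paired with rollout 1
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(scaffold_logits, rollout_logits):
    """Return the expected reward J and the per-scaffold values V_s."""
    p_s = softmax(scaffold_logits)
    values = []
    for s, row in enumerate(REWARD):
        p_a = softmax(rollout_logits[s])
        values.append(sum(p * r for p, r in zip(p_a, row)))
    return sum(p * v for p, v in zip(p_s, values)), values


def train(steps=2000, lr=1.0):
    """Gradient ascent on J over scaffold and rollout logits together."""
    scaffold_logits = [0.0] * len(REWARD)
    rollout_logits = [[0.0] * len(row) for row in REWARD]
    for _ in range(steps):
        j, values = expected_reward(scaffold_logits, rollout_logits)
        p_s = softmax(scaffold_logits)
        for s, row in enumerate(REWARD):
            p_a = softmax(rollout_logits[s])
            # Rollout policy: dJ/dphi[s][a] = p(s) * p(a|s) * (R[s][a] - V_s)
            for a, r in enumerate(row):
                rollout_logits[s][a] += lr * p_s[s] * p_a[a] * (r - values[s])
            # Scaffold policy: dJ/dtheta[s] = p(s) * (V_s - J)
            scaffold_logits[s] += lr * p_s[s] * (values[s] - j)
    return scaffold_logits, rollout_logits


theta, phi = train()
best_scaffold = max(range(len(theta)), key=theta.__getitem__)
best_rollout = max(range(len(phi[best_scaffold])), key=phi[best_scaffold].__getitem__)
final_j, _ = expected_reward(theta, phi)
print(best_scaffold, best_rollout, round(final_j, 3))
```

Because the scaffold logits are updated from the value of the rollouts they lead to, the two policies improve together: a scaffold gains probability only when the rollouts under it earn more than average.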

Benchmarking Performance: 9B and 35B Model Comparisons

The performance of Ornith-1.0 is validated through benchmarking against size-appropriate baselines. In the 9B category, Ornith-1.0-9B shows a clear lead over Qwen3.5-9B and Gemma4-12B. For instance, on the Terminal-Bench 2.1 (Terminus-2) benchmark, Ornith-1.0-9B achieved a score of 43.1, roughly double Qwen3.5-9B's 21.3 and Gemma4-12B's 21. On SWE-bench Verified, Ornith-1.0-9B reached 69.4, outperforming Qwen3.5-9B (53.2) and Gemma4-12B (44.2).

Moving to the larger 35B models, Ornith-1.0-35B continues to lead. On SWE-bench Verified, it scored 75.6, surpassing Qwen3.5-35B (70.0) and Qwen3.6-35B (73.4). Notably, Ornith-1.0-35B is also competitive with the much larger Qwen3.5-397B: it trails slightly on NL2Repo (34.6 vs 36.8) but is well ahead on the SWE Atlas metrics. In the SWE Atlas - QnA category, Ornith-1.0-35B scored 37.1, nearly triple the score of Qwen3.5-35B (13.2) and nearly double that of the 397B variant (20.4).
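The margins quoted above can be checked with a few lines of arithmetic. The sketch below uses only the scores stated in this article; the dictionary layout and the `margins` helper are illustrative choices, not part of any Ornith tooling.

```python
# Scores copied from the article text. Benchmark labels are shortened
# for use as dictionary keys.
SCORES = {
    "Terminal-Bench 2.1 (Terminus-2)": {
        "Ornith-1.0-9B": 43.1, "Qwen3.5-9B": 21.3, "Gemma4-12B": 21.0,
    },
    "SWE-bench Verified (9B class)": {
        "Ornith-1.0-9B": 69.4, "Qwen3.5-9B": 53.2, "Gemma4-12B": 44.2,
    },
    "SWE-bench Verified (35B class)": {
        "Ornith-1.0-35B": 75.6, "Qwen3.5-35B": 70.0, "Qwen3.6-35B": 73.4,
    },
    "NL2Repo": {
        "Ornith-1.0-35B": 34.6, "Qwen3.5-397B": 36.8,
    },
    "SWE Atlas - QnA": {
        "Ornith-1.0-35B": 37.1, "Qwen3.5-35B": 13.2, "Qwen3.5-397B": 20.4,
    },
}


def margins(scores):
    """Compare the Ornith entry with every other model on one benchmark.

    Returns {baseline: (point difference, score ratio)}, where a positive
    difference or a ratio above 1 means Ornith is ahead.
    """
    reference = next(name for name in scores if name.startswith("Ornith"))
    ref = scores[reference]
    return {
        name: (round(ref - value, 1), round(ref / value, 2))
        for name, value in scores.items()
        if name != reference
    }


for benchmark, scores in SCORES.items():
    for baseline, (diff, ratio) in margins(scores).items():
        print(f"{benchmark}: vs {baseline}: {diff:+.1f} points, {ratio:.2f}x")
```

Reporting both the point difference and the ratio makes the distinction between the benchmarks visible: the SWE Atlas - QnA gap is a multiple of the baseline score, while on NL2Repo the 35B model sits slightly below the 397B model.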

These results suggest that RL-based scaffold optimization yields a substantial efficiency gain, allowing smaller Ornith models to match or exceed much larger conventionally trained models on some benchmarks. The consistency across Terminal-Bench, SWE-bench (Pro and Multilingual), and Claw-eval highlights the models' versatility across programming languages and complex agentic tasks.

Industry Impact

The release of Ornith-1.0 represents a pivotal moment for the open-source AI community, particularly in the niche of agentic coding. By providing models that achieve state-of-the-art results on benchmarks like SWE-bench, deepreinforce-ai is narrowing the gap between open-source and proprietary coding assistants.

The use of the MIT license is particularly significant. It removes barriers to entry for developers and enterprises globally, allowing for the integration of high-performance coding agents into various workflows without the concerns of regional limitations or restrictive licensing fees. This could accelerate the development of autonomous software engineering tools, where AI agents can independently navigate repositories, fix bugs, and implement features.

Furthermore, the focus on "agentic" coding—where the model acts as an agent capable of using terminals and navigating complex file structures—moves the industry beyond simple code completion. Ornith-1.0's ability to optimize its own search trajectories suggests a future where AI models are not static but continuously improve their problem-solving methodologies through specialized training frameworks.

Frequently Asked Questions

Question: What makes Ornith-1.0 different from other coding models?

Ornith-1.0 distinguishes itself through its self-improving RL framework. Instead of just learning to produce code, it learns to optimize the "scaffold" or the search trajectory that leads to the code. This joint optimization results in higher-quality solutions and better performance on complex, multi-step agentic tasks compared to standard post-trained models.

Question: Which base models were used to develop Ornith-1.0?

Ornith-1.0 models are post-trained on top of two high-performance base model families: Gemma 4 and Qwen 3.5. This allows the Ornith suite to build on the foundational strengths of these models while adding specialized agentic coding capabilities through reinforcement learning.

Question: Is Ornith-1.0 free to use for commercial purposes?

Yes. Ornith-1.0 is released under the MIT license. This means it is globally accessible, free from regional limitations, and can be used, modified, and distributed for both private and commercial projects without the restrictions often found in proprietary or more restrictive open-source licenses.
