Leanstral 1.5: Mistral AI's Open Source Proof Engineering Model

Mistral AI has announced the release of Leanstral 1.5, a specialized open-source model designed to advance formal verification in the Lean 4 programming language. Released under the Apache-2.0 license, the model features 6 billion active parameters out of a total 119 billion, balancing computational efficiency with high-level reasoning. Leanstral 1.5 has demonstrated exceptional performance, saturating the miniF2F benchmark and solving 587 out of 672 PutnamBench problems. Beyond theoretical benchmarks, the model has proven its practical utility in agentic proof engineering by identifying five previously unknown bugs in real-world open-source repositories. Trained through a rigorous three-stage process including reinforcement learning with CISPO, Leanstral 1.5 is now available via Hugging Face and a free API, aiming to democratize access to rigorous formal methods for developers and researchers.

Key Takeaways

Open Source Accessibility: Leanstral 1.5 is released under the Apache-2.0 license, featuring 6B active parameters (119B total), making high-tier formal verification tools accessible to the broader community.
State-of-the-Art Performance: The model saturates the miniF2F benchmark and solves 587 of 672 problems (about 87%) on PutnamBench.
Advanced Training Methodology: The model was developed using a three-stage pipeline: mid-training, supervised fine-tuning (SFT), and reinforcement learning (RL) with CISPO.
Real-World Impact: Leanstral 1.5 successfully identified five previously unknown bugs across 57 tested open-source repositories, demonstrating its capability in practical code verification.
Agentic Proof Engineering: Through a multiturn RL environment, the model interacts with the Lean compiler to refine proofs based on real-time feedback.

In-Depth Analysis

Technical Architecture and Performance Benchmarks

Leanstral 1.5 represents a significant milestone in the evolution of AI-driven formal verification. By utilizing a Mixture-of-Experts (MoE) style architecture with 6 billion active parameters out of a total 119 billion, Mistral AI has created a model that is both powerful and efficient. This architecture allows the model to handle the complex logical structures required for formal proof engineering without the prohibitive computational costs typically associated with massive dense models.
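To illustrate why an MoE model runs only part of its weights per token, here is a minimal sketch of top-k expert routing in plain Python. The announcement does not disclose Leanstral 1.5's expert count, routing scheme, or k; top-k gating is simply a common MoE design and is assumed here, and the gate scores are made up. Only the 6B and 119B figures come from the article.

```python
import math

def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Hypothetical helper: shows the general top-k gating idea, not
    Leanstral's actual router.
    """
    order = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    chosen = order[:k]
    exps = [math.exp(gate_scores[i]) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]

def active_fraction(active_params, total_params):
    """Share of the model's parameters used for a single token."""
    return active_params / total_params

# Made-up gate scores for four experts; only two experts run for this token.
experts, weights = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(experts)  # [1, 3]

# Figures from the article: 6B active out of 119B total.
print(round(active_fraction(6e9, 119e9), 3))  # 0.05
```

With the article's numbers, roughly 5% of the parameters are active for any given token, which is where the efficiency claim comes from.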

The performance metrics released with Leanstral 1.5 are particularly noteworthy. The model has "saturated" miniF2F, a standard benchmark for automated theorem proving, meaning its score leaves little headroom on that benchmark. On PutnamBench it solves 587 out of 672 problems (about 87%), placing it at the forefront of mathematical reasoning AI. On the FATE (Formal Algebra Theorem Evaluation) benchmarks, Leanstral 1.5 achieved state-of-the-art scores of 87% on FATE-H and 34% on FATE-X. These figures suggest that the model handles not only competition-style mathematics but also harder, research-level formal proofs, the kind of sustained reasoning that formal software verification also demands.
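The PutnamBench figure can be turned into a solve rate with a few lines of Python. This is only arithmetic on the numbers quoted above; the helper name is illustrative.

```python
def solve_rate(solved, total):
    """Fraction of benchmark problems solved."""
    if total <= 0:
        raise ValueError("total must be positive")
    return solved / total

# PutnamBench figures quoted in the article.
putnam = solve_rate(587, 672)
unsolved = 672 - 587

print(f"{putnam:.1%}")  # 87.4%
print(unsolved)         # 85
```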

Training Methodology and RL Environments

The development of Leanstral 1.5 followed a three-stage training process designed to hone its logical reasoning capabilities. The process began with mid-training, followed by supervised fine-tuning (SFT) to align the model with the syntax and logic of Lean 4. The final and perhaps most critical stage involved reinforcement learning (RL) using CISPO (Clipped IS-weight Policy Optimization).
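For readers unfamiliar with CISPO, the objective below is the form given in the MiniMax-M1 technical report, where the method was introduced. Whether Leanstral 1.5 uses exactly this form, and with which clipping bounds, is not stated in the announcement, so treat it as background rather than a description of Mistral AI's setup.

```latex
% CISPO objective as presented in the MiniMax-M1 report (background, not Leanstral-specific).
% q: prompt; o_i: i-th sampled response; o_{i,t}: its t-th token;
% \hat{A}_{i,t}: group-relative advantage; sg(.): stop-gradient.
\[
J_{\mathrm{CISPO}}(\theta)
= \mathbb{E}\!\left[
\frac{1}{\sum_{i=1}^{G} |o_i|}
\sum_{i=1}^{G} \sum_{t=1}^{|o_i|}
\mathrm{sg}\!\big(\hat{r}_{i,t}(\theta)\big)\,
\hat{A}_{i,t}\,
\log \pi_\theta\big(o_{i,t} \mid q, o_{i,<t}\big)
\right]
\]
\[
\hat{r}_{i,t}(\theta)
= \mathrm{clip}\!\big(r_{i,t}(\theta),\; 1-\epsilon^{\mathrm{IS}}_{\mathrm{low}},\; 1+\epsilon^{\mathrm{IS}}_{\mathrm{high}}\big),
\qquad
r_{i,t}(\theta)
= \frac{\pi_\theta(o_{i,t} \mid q, o_{i,<t})}{\pi_{\theta_{\mathrm{old}}}(o_{i,t} \mid q, o_{i,<t})}
\]
```

The key design choice is that the importance-sampling weight is clipped and detached, rather than the token update itself being clipped, so every token still contributes a gradient through the log-probability term.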

Training used two distinct RL environments; the multiturn environment is the one described here. In it, the model is presented with a theorem statement and tasked with either proving or disproving it. The environment functions as a closed-loop system: the model submits a proof, receives immediate feedback from the Lean compiler, and uses that feedback to refine its approach in subsequent attempts. This iterative process allows the model to learn from its mistakes and understand the nuances of the Lean 4 compiler, mimicking the workflow of a human proof engineer. Because every accepted proof has passed the compiler, this "agentic" approach yields proofs that are not just statistically likely but formally correct and compilable.
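The submit, check, refine loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Mistral AI's environment: the names `multiturn_prove`, `CheckResult`, `toy_prover`, and `toy_checker` are hypothetical, the turn limit is arbitrary, and the toy checker stands in for a real call to the Lean compiler (its messages are not real Lean output).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class CheckResult:
    ok: bool       # True if the proof was accepted
    message: str   # feedback returned to the model on failure

# Hypothetical interfaces for the two sides of the loop.
Prover = Callable[[str, List[str]], str]      # (theorem, feedback so far) -> proof attempt
Checker = Callable[[str, str], CheckResult]   # (theorem, proof) -> result

def multiturn_prove(theorem: str, prover: Prover, checker: Checker,
                    max_turns: int = 4) -> Optional[str]:
    """Return the first accepted proof, or None if the turn budget runs out."""
    feedback: List[str] = []
    for _ in range(max_turns):
        attempt = prover(theorem, feedback)
        result = checker(theorem, attempt)
        if result.ok:
            return attempt
        feedback.append(result.message)  # the next attempt can see this failure
    return None

# Toy stand-ins so the sketch runs without Lean installed.
def toy_checker(theorem: str, proof: str) -> CheckResult:
    if proof == "by decide":
        return CheckResult(True, "")
    return CheckResult(False, f"rejected: {proof}")

def toy_prover(theorem: str, feedback: List[str]) -> str:
    candidates = ["by rfl_typo", "by decide"]  # first guess is deliberately wrong
    return candidates[min(len(feedback), len(candidates) - 1)]

proof = multiturn_prove("theorem t : 2 + 2 = 4", toy_prover, toy_checker)
print(proof)  # by decide
```

In a real setup the checker would invoke the Lean toolchain and the prover would be the model conditioned on the accumulated feedback; the control flow stays the same.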

Practical Application and Bug Discovery

While many formal verification models remain confined to theoretical or academic exercises, Leanstral 1.5 has demonstrated immediate practical utility. Mistral AI tested the model across 57 different repositories to evaluate its performance in real-world code verification. During this process, Leanstral 1.5 uncovered five previously unknown bugs, evidence that rigorous formal methods can be applied to existing open-source software to improve reliability and security.

This capability highlights the shift toward "agentic proof engineering," where AI models act as active participants in the software development lifecycle. By verifying complex code properties and identifying logical inconsistencies that traditional testing might miss, Leanstral 1.5 provides a bridge between high-level mathematical reasoning and practical software engineering. The model's availability via Hugging Face and a free API further lowers the barrier to entry, allowing developers to integrate formal verification into their standard workflows.

Industry Impact

The release of Leanstral 1.5 is poised to have a significant impact on the AI and software development industries. By providing a high-performance, open-source tool for formal verification, Mistral AI is challenging the notion that rigorous software proofing is too costly or complex for mainstream use. The Apache-2.0 licensing is a critical factor here, as it allows for widespread adoption and integration into commercial and open-source projects alike.

Furthermore, the success of the CISPO-based reinforcement learning approach provides a blueprint for future models focused on logical reasoning. As AI continues to move beyond simple text generation toward complex problem-solving and code synthesis, the ability to verify the correctness of output through formal methods will become increasingly vital. Leanstral 1.5 sets a new standard for how AI can be used to enhance the security and stability of the global software ecosystem.

Frequently Asked Questions

Question: What makes Leanstral 1.5 different from previous versions?

Leanstral 1.5 introduces a significant performance upgrade through a three-stage training process (mid-training, SFT, and RL with CISPO). It features 6B active parameters and has achieved state-of-the-art results on benchmarks like PutnamBench and FATE, while also demonstrating the ability to find real-world bugs in open-source code.

Question: How does the model use the Lean compiler during training?

During the reinforcement learning phase, the model operates in a multiturn environment. It submits a proof for a given theorem to the Lean compiler, receives feedback on whether the proof compiled or failed, and then uses that feedback to refine and resubmit its proof until it succeeds or the loop ends.
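To make "compiled or failed" concrete, here is a minimal Lean 4 sketch (not taken from Leanstral's training data) of the two outcomes a prove-or-disprove task can have: a true statement with a proof the compiler accepts, and a false statement handled by proving its negation. Treating "disprove" as "prove the negation" is an assumption about how the environment is set up, and the theorem names are illustrative.

```lean
-- A true statement: the compiler accepts this proof.
theorem two_add_two : 2 + 2 = 4 := by
  decide

-- A false statement (2 + 2 = 5) cannot be proved,
-- so the task is completed by proving its negation instead.
theorem two_add_two_ne_five : ¬ (2 + 2 = 5) := by
  decide
```

If a submitted proof does not type-check, the compiler reports an error instead of accepting the theorem, and that error is the feedback the model sees on its next turn.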

Question: Is Leanstral 1.5 available for public use?

Yes, Leanstral 1.5 is fully open-sourced under the Apache-2.0 license. It is available for download on Hugging Face and can also be accessed through a free API provided by Mistral AI, making it accessible for both research and practical proof engineering in Lean 4.
