
Refactoring 310,000 Lines of Code: How Meituan Manages AI Coding with Agent Evaluation Thinking
Meituan's technical team has shared a comprehensive case study on refactoring 310,000 lines of code using AI. The core insight is that when AI generates over 90% of a system's code, the primary challenge shifts from development speed to the implementation of effective constraints. Without a unified framework, AI-driven development can lead to significant technical debt and system chaos. Meituan addressed this by adopting an "Agent Evaluation" mindset, focusing on technical debt sorting, rule establishment, a standardized refactoring SOP, and a Pre-PR mechanism. This shift has allowed the team to move away from high-cost, one-off refactoring projects toward a model of continuous, daily iterative improvement, ensuring that code quality remains high even as AI takes over the majority of the writing process.
Key Takeaways
- Constraint over Speed: When AI generates more than 90% of code, the system's success depends on the constraints placed on the AI rather than the speed of generation.
- Agent Evaluation Framework: Meituan utilizes an "Agent Evaluation" mindset to manage AI coding, ensuring that AI outputs align with technical standards.
- Four Pillars of Management: The strategy relies on technical debt sorting, rule construction, a standardized refactoring SOP, and a Pre-PR mechanism.
- Continuous Iteration: The goal is to transform refactoring from a high-cost, specialized project into a sustainable, daily iterative action.
In-Depth Analysis
The Challenge of AI-Generated Code at Scale
As AI becomes the primary author of software, reaching a threshold where over 90% of a system is AI-generated, the fundamental dynamics of software engineering shift. Meituan's account highlights a paradox: AI can write code faster than any human, but that speed can become a liability. Without unified specifications and strict management, AI does not just create code; it amplifies existing chaos and technical debt as quickly as it generates code. The focus of technical management must therefore pivot from "how to write faster" to "how to constrain the AI's capabilities" so that the resulting system remains maintainable and robust.
Implementing the Agent Evaluation Framework
To manage the refactoring of 310,000 lines of code, Meituan adopted a methodology rooted in "Agent Evaluation." This approach treats the AI as an autonomous agent that requires a structured environment to function correctly. The framework is built upon several key components:
- Technical Debt Sorting: Before AI can effectively refactor, the existing technical debt must be identified and categorized. This provides a roadmap for the AI to follow.
- Rule Construction: Establishing clear, machine-readable rules is essential. These rules act as the boundaries within which the AI operates, limiting the "hallucinations" and stylistic inconsistencies that often plague AI-generated outputs.
- Refactoring SOP (Standard Operating Procedure): By standardizing the steps of refactoring, the team ensures that the AI follows a predictable and verifiable path, reducing the risk of introducing new bugs during the cleanup process.
- Pre-PR Mechanism: A critical gatekeeping step, the Pre-PR (Pull Request) mechanism allows for automated and human checks before AI-generated code is even considered for integration. This ensures that only code meeting the predefined constraints moves forward.
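The rule and Pre-PR components above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Rule` type, the three example rules, and the `pre_pr_gate` function are hypothetical and are not taken from Meituan's tooling. The sketch only shows the general idea of expressing rules as machine-checkable predicates and blocking a change when any rule fails.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical machine-readable rule: a name plus a check on source text."""
    name: str
    check: Callable[[str], bool]  # returns True when the code satisfies the rule


# Illustrative rules only; a real rule set would come from the team's own standards.
RULES: Sequence[Rule] = (
    Rule(
        "no-print-debugging",
        lambda src: re.search(r"^\s*print\(", src, re.MULTILINE) is None,
    ),
    Rule(
        "max-line-length-100",
        lambda src: all(len(line) <= 100 for line in src.splitlines()),
    ),
    Rule("no-todo-markers", lambda src: "TODO" not in src),
)


def pre_pr_gate(source: str, rules: Sequence[Rule] = RULES) -> List[str]:
    """Return the names of violated rules; an empty list means the change may proceed."""
    return [rule.name for rule in rules if not rule.check(source)]


# Usage: a candidate change that still contains a debugging print.
candidate = "def total(xs):\n    print(xs)\n    return sum(xs)\n"
violations = pre_pr_gate(candidate)
if violations:
    print("Blocked before PR:", ", ".join(violations))
else:
    print("Ready for PR")
```

In a setup like this, the gate would run before a pull request is opened, so human reviewers only see changes that already satisfy the mechanical rules.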
From Special Projects to Daily Iteration
Traditionally, large-scale code refactoring is treated as a "special project": a high-cost, time-consuming endeavor that often disrupts regular development cycles. Meituan’s practice suggests that, with the right AI management mechanisms, refactoring can be integrated into the daily workflow. Because AI handles the bulk of the labor under strict constraints, the cost of each refactoring pass drops. This allows the process to become a "continuous daily action," in which the codebase is improved alongside regular feature iterations rather than waiting for technical debt to become unmanageable.
Industry Impact
The methodology shared by Meituan points to a broader shift in how the industry approaches AI-assisted software development. As AI coding tools such as GitHub Copilot become ubiquitous, the industry is moving toward a "Reviewer-Centric" model. In this model, the human developer's role is no longer to write every line of code but to design the constraints and evaluation systems that govern AI agents. Meituan's experience with 310,000 lines of code offers a reference for other large enterprises looking to harness AI without sacrificing system integrity. It emphasizes that the future of AI in software engineering lies in governance and systematic evaluation rather than raw generative power alone.
Frequently Asked Questions
Question: Why is speed no longer the most important metric in AI-driven coding?
When AI can generate code almost instantaneously, the bottleneck is no longer production but quality control. If AI generates code without constraints, it can create massive amounts of technical debt and inconsistent logic very quickly. Therefore, the ability to constrain and guide the AI becomes the more valuable capability for maintaining a healthy system.
Question: What is the purpose of the Pre-PR mechanism in Meituan's refactoring practice?
The Pre-PR mechanism serves as an automated quality gate. It evaluates AI-generated code against established rules and standards before it enters the formal review process. This reduces the burden on human reviewers and ensures that the AI's output is consistent with the project's architectural requirements and coding standards.
Question: How does the "Agent Evaluation" mindset differ from traditional code review?
Traditional code review is often a manual, human-led process focused on individual changes. The "Agent Evaluation" mindset treats the AI as a system that needs to be monitored and tuned. It focuses on creating the rules, SOPs, and automated checks that allow the AI to function as a reliable agent, making the management of code quality a systematic rather than a purely manual task.
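One way to read "monitored and tuned" is to aggregate check results across many AI-generated changes instead of judging each change in isolation. The sketch below is a hypothetical illustration, not Meituan's implementation. It assumes each change has already been checked against a rule set and is represented by a list of violated rule names, and the rule names themselves are invented for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List


def summarize_runs(violations_per_run: Iterable[List[str]]) -> Dict[str, object]:
    """Aggregate per-change rule violations into system-level evaluation metrics."""
    runs = list(violations_per_run)
    total = len(runs)
    clean = sum(1 for violations in runs if not violations)
    # Count each rule at most once per run, so one noisy change does not dominate.
    by_rule = Counter(name for violations in runs for name in set(violations))
    return {
        "total_runs": total,
        "pass_rate": clean / total if total else 0.0,
        "violations_by_rule": dict(by_rule),
    }


# Usage: four hypothetical AI-generated changes, two of which violate rules.
runs = [
    [],
    ["no-print-debugging"],
    [],
    ["no-print-debugging", "no-todo-markers"],
]
report = summarize_runs(runs)
print(report["pass_rate"])           # 0.5
print(report["violations_by_rule"])
```

A rule that fails repeatedly across runs suggests a gap in the instructions or constraints given to the AI, which is a different remedy from fixing individual changes one at a time in review.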


