
Managing AI Coding with Agent Evaluation Strategies: A Practice of Refactoring 310,000 Lines of Code
The Meituan technical team has shared a comprehensive approach to managing AI-driven development, based on a large-scale project involving the refactoring of 310,000 lines of code. As AI now generates over 90% of code in certain environments, the team argues that the critical factor for system stability is no longer the speed of generation, but the ability to effectively constrain AI capabilities. Without unified standards, AI-generated code can significantly amplify technical chaos. To address this, Meituan implemented an 'Agent evaluation' framework, which includes technical debt assessment, rule construction, standardized operating procedures (SOPs), and a Pre-PR mechanism. This strategy successfully transformed code refactoring from a high-cost, specialized effort into a continuous, daily activity integrated into the standard development lifecycle.
Key Takeaways
- Constraint Over Speed: When AI generates more than 90% of a system's code, the primary challenge shifts from how fast code is written to how well the AI's output is constrained and governed.
- Risk of Amplified Chaos: Without unified standards and strict rules, AI-driven development can significantly amplify technical debt and system disorder.
- Agent Evaluation Framework: Meituan utilizes an 'Agent evaluation' mindset to manage AI coding, focusing on technical debt sorting and the establishment of clear rules.
- Operationalizing Refactoring: By implementing standardized SOPs and a Pre-PR mechanism, the team turned a massive 310,000-line refactoring project into a sustainable, daily iterative process.
In-Depth Analysis
The Shift from Generation Speed to AI Constraint
In the current landscape of software engineering, Artificial Intelligence can generate the vast majority of a codebase, exceeding 90% in certain environments. However, the Meituan technical team highlights a critical realization: the speed of AI code generation is no longer the bottleneck or the primary metric for success. Instead, the focus must shift toward the "constraint of AI capabilities."
The core issue is that AI, while efficient, does not adhere to a project's architecture or long-term maintenance standards unless explicitly guided. Without a unified framework or set of constraints, the sheer volume of AI-generated code can amplify existing chaos within a system. The more code an AI writes, the more vital a robust management system becomes for keeping the output aligned with the system's intended architecture and quality standards.
The Agent Evaluation Framework for Code Management
To manage the complexities of 310,000 lines of code, the team adopted a methodology rooted in "Agent evaluation thinking." This approach treats the AI coding tool as an autonomous agent that requires constant benchmarking and boundary-setting. The management process is broken down into several key components:
- Technical Debt Sorting: Before refactoring can begin, there must be a clear understanding of the existing technical debt. This involves identifying areas where the code deviates from best practices or architectural requirements.
- Rule Construction: Establishing a set of "Rules" is essential to provide the AI with the necessary boundaries. These rules act as the primary constraints that prevent the AI from generating disorganized or non-standard code.
- Refactoring SOP (Standard Operating Procedure): By creating a standardized procedure for refactoring, the team ensures that the process is repeatable and consistent, regardless of which part of the 310,000-line codebase is being addressed.
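The first two components above can be pictured as data plus a scan. The following is a minimal sketch, not Meituan's implementation: the article does not describe how its rules are encoded, so the `Rule` and `Violation` types, the two example rules, and the regex-per-line matching are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical machine-checkable rule (not from the source article)."""
    rule_id: str
    description: str
    pattern: str  # regex that flags a violation when it matches a line


@dataclass(frozen=True)
class Violation:
    rule_id: str
    path: str
    line_no: int


# Illustrative rules only; a real rule set would encode project conventions.
RULES = [
    Rule("no-print", "Use the project logger instead of print()", r"^\s*print\("),
    Rule("no-bare-except", "Catch specific exceptions", r"^\s*except\s*:"),
]


def check_source(path: str, source: str, rules=RULES) -> list[Violation]:
    """Scan one file line by line and record every rule match."""
    violations = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for rule in rules:
            if re.search(rule.pattern, line):
                violations.append(Violation(rule.rule_id, path, line_no))
    return violations


def debt_inventory(files: dict[str, str], rules=RULES) -> dict[str, int]:
    """Count violations per rule across a set of files (path -> content)."""
    counts = {rule.rule_id: 0 for rule in rules}
    for path, source in files.items():
        for violation in check_source(path, source, rules):
            counts[violation.rule_id] += 1
    return counts
```

Running `debt_inventory` over a codebase gives a per-rule count that could serve as a starting inventory of technical debt, while the same `RULES` list could be handed to the AI tool as explicit boundaries. Real rules would likely need syntax-aware checks rather than regular expressions.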
Integrating Refactoring into Daily Iterations
A significant breakthrough in this practice is the transition of refactoring from a "high-cost special project" to a "daily action." Traditionally, large-scale refactoring (such as a 310,000-line project) is viewed as a resource-intensive, one-time event that disrupts regular development.
Meituan's approach changes this dynamic through the "Pre-PR (Pull Request) mechanism." By integrating refactoring checks and AI constraints into the Pre-PR stage, the team ensures that code quality is maintained continuously. This mechanism allows refactoring to happen incrementally alongside regular feature iterations. This shift not only reduces the overhead associated with massive refactoring efforts but also ensures that the system remains healthy and manageable as it evolves through AI-assisted development.
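One way to picture such a gate is as a ratchet: a change may proceed to a pull request only if it does not increase the number of rule violations in any file it touches. The sketch below assumes this ratchet policy, which the article does not confirm; `pre_pr_gate` and the line-length rule `count_long_lines` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict

# A check takes file content and returns the number of violations it finds.
Check = Callable[[str], int]


def count_long_lines(source: str, limit: int = 100) -> int:
    """Hypothetical rule: each line longer than `limit` characters is a violation."""
    return sum(1 for line in source.splitlines() if len(line) > limit)


def pre_pr_gate(before: Dict[str, str], after: Dict[str, str], check: Check) -> Dict[str, int]:
    """Compare violation counts for each changed file before and after the change.

    Returns a mapping of path -> increase in violations. An empty mapping
    means no file got worse, so the change may proceed to a pull request.
    New files are compared against empty content.
    """
    regressions = {}
    for path, new_source in after.items():
        old_count = check(before.get(path, ""))
        new_count = check(new_source)
        if new_count > old_count:
            regressions[path] = new_count - old_count
    return regressions
```

Under this policy, existing debt does not block unrelated work, yet every change either holds the line or reduces the count, which is how refactoring can proceed incrementally alongside feature iterations.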
Industry Impact
The methodology presented by Meituan offers a blueprint for the future of AI-assisted software engineering. As the industry moves toward a reality where AI handles the bulk of coding tasks, the role of the human developer is shifting toward that of a "system architect" and "rule setter."
The significance of this practice lies in its scalability. By demonstrating that 310,000 lines of code can be managed and refactored through automated constraints and standardized SOPs, it provides a path forward for other large-scale enterprises facing the "chaos amplification" of AI. This approach prioritizes long-term system health over short-term generation metrics, suggesting that the next phase of the AI revolution in coding will be defined by governance and quality control rather than just raw productivity.
Frequently Asked Questions
Question: Why is speed no longer the most important factor in AI coding?
When AI can generate over 90% of the code, the volume of output is so high that the speed of writing is no longer a constraint. The new challenge is ensuring that this massive volume of code is organized, standard-compliant, and maintainable. Without constraints, AI simply produces a larger amount of disorganized code faster.
Question: What is the purpose of the Pre-PR mechanism in this context?
The Pre-PR mechanism is designed to catch issues and enforce rules before code is merged into the main branch. This allows refactoring and quality control to become a continuous, daily part of the development cycle rather than a separate, high-cost project that happens only occasionally.
Question: How does 'Agent evaluation thinking' differ from traditional code review?
'Agent evaluation thinking' treats the AI as an active agent whose performance and output must be measured against specific rules and benchmarks. It focuses on setting the parameters and constraints within which the AI operates, rather than just manually reviewing the final output for errors.
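As a rough illustration of measuring an agent against rules and benchmarks, the sketch below scores a set of agent outputs by the fraction that satisfy every check attached to them. The `EvalCase` type, the `pass_rate` function, and the all-checks-must-pass scoring are assumptions for illustration; the article does not specify how Meituan scores its agents.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One hypothetical benchmark task and the code the agent produced for it."""
    name: str
    agent_output: str
    checks: List[Callable[[str], bool]]  # each returns True when the output complies


def pass_rate(cases: List[EvalCase]) -> float:
    """Fraction of cases whose output satisfies all of its checks."""
    if not cases:
        return 0.0
    passed = sum(
        1 for case in cases
        if all(check(case.agent_output) for check in case.checks)
    )
    return passed / len(cases)


# Example checks (illustrative only).
def has_no_print(code: str) -> bool:
    return "print(" not in code


def has_docstring(code: str) -> bool:
    return '"""' in code
```

Tracking a score like this over time, as rules are added or tightened, shifts attention from reviewing individual outputs to checking whether the constraints themselves are working.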


