
Managing AI Coding Through Agent Evaluation: A 310,000-Line Code Refactoring Case Study
As AI-generated code begins to account for over 90% of system development, the primary challenge shifts from increasing coding speed to managing and constraining AI output. Meituan's technical team has described how it refactored 310,000 lines of code using an 'Agent evaluation' mindset. The framework has four parts: technical debt sorting, rule construction, a refactoring standard operating procedure (SOP), and a Pre-PR (Pull Request) mechanism. With it, the team turned code refactoring from a high-cost, specialized project into a sustainable part of daily iteration. The approach addresses the risk that AI-driven development amplifies system chaos, and it emphasizes the need for unified standards in the era of AI-native programming.
Key Takeaways
- Shift in Focus: When AI generates more than 90% of a system's code, the bottleneck is no longer coding speed but the ability to constrain and govern AI behavior.
- Agent Evaluation Logic: Managing AI coding requires an evaluation-centric approach to ensure that automated generation aligns with system architecture and quality standards.
- Four-Pillar Framework: The practice utilizes technical debt sorting, rule construction, refactoring SOPs, and Pre-PR mechanisms to maintain code health.
- Sustainable Refactoring: The methodology transforms refactoring from an expensive, one-time effort into a continuous, low-cost daily activity integrated into the development lifecycle.
- Scale of Practice: The effectiveness of this management strategy was demonstrated through the successful refactoring of 310,000 lines of code.
In-Depth Analysis
The Paradox of AI Coding Speed and System Chaos
AI can now generate the vast majority of a system's codebase. However, the Meituan technical team highlights a paradox: AI writes code faster than human developers, but that speed can cut both ways. Without a unified set of standards and constraints, AI does not just create code; it amplifies existing chaos.
The core issue identified is that unmanaged AI has no inherent understanding of long-term system maintainability or architectural integrity. When 90% of the code is machine-generated, the system's trajectory is determined not by the speed of production but by the rigor of the constraints placed on the AI. The challenge for modern engineering teams is to move beyond using AI as a productivity tool and instead treat it as a managed agent within a strictly defined governance framework.
Implementing the Agent Evaluation Framework
To manage the complexities of 310,000 lines of code, the team adopted an "Agent evaluation" mindset. This approach treats the AI as an autonomous agent whose outputs must be constantly measured against predefined benchmarks. The management strategy is built on four critical technical components:
- Technical Debt Sorting: Before refactoring can begin, existing technical debt must be identified and categorized. This gives the AI a clear map of what needs improvement and prevents the "blind" generation of new code on top of old, inefficient structures.
- Rule Construction: Establishing a set of explicit rules is essential. These rules act as the boundaries for the AI, ensuring that the generated code adheres to specific architectural patterns, security standards, and performance requirements.
- Refactoring SOP (Standard Operating Procedure): By standardizing the refactoring process, the team ensures consistency. An SOP provides a repeatable workflow that the AI (and the human supervisors) can follow, reducing the likelihood of errors during large-scale code transformations.
- Pre-PR Mechanism: The Pre-PR (Pull Request) mechanism serves as a gate ahead of the formal pull request. It allows automated and manual evaluation of AI-generated refactoring before a pull request is opened, so that only code meeting the established criteria moves toward the main codebase.
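The rule construction step can be pictured as rules written down as executable checks rather than as prose guidelines. The following is a minimal sketch under stated assumptions: the source does not describe Meituan's actual rules or tooling, so the two rules, the limit `MAX_FUNCTION_LINES`, the `BANNED_MODULES` set, and the `evaluate` function are all hypothetical. It uses Python's standard `ast` module to check a piece of generated source code against each rule.

```python
import ast
from dataclasses import dataclass
from typing import Callable

# Hypothetical limits for illustration only; not taken from the source.
MAX_FUNCTION_LINES = 40
BANNED_MODULES = {"pickle"}


@dataclass(frozen=True)
class Violation:
    rule: str
    line: int
    message: str


def rule_function_length(tree: ast.AST) -> list[Violation]:
    """Flag functions that span more than MAX_FUNCTION_LINES lines."""
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > MAX_FUNCTION_LINES:
                found.append(Violation("function-length", node.lineno,
                                       f"{node.name} spans {length} lines"))
    return found


def rule_banned_imports(tree: ast.AST) -> list[Violation]:
    """Flag imports whose top-level module is in BANNED_MODULES."""
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name.split(".")[0] in BANNED_MODULES:
                found.append(Violation("banned-import", node.lineno,
                                       f"import of {name}"))
    return found


# The rule set is plain data, so adding a rule means adding a function here.
RULES: list[Callable[[ast.AST], list[Violation]]] = [
    rule_function_length,
    rule_banned_imports,
]


def evaluate(source: str) -> list[Violation]:
    """Run every rule over the parsed source and return sorted violations."""
    tree = ast.parse(source)
    violations = [v for rule in RULES for v in rule(tree)]
    return sorted(violations, key=lambda v: (v.line, v.rule))
```

Keeping each rule as a small function makes the boundary explicit and testable: an empty list from `evaluate` means the generated code stayed inside the rules, and each `Violation` can be handed back to the agent as concrete feedback.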
From Special Projects to Daily Iterations
One of the most significant outcomes of this practice is that code refactoring becomes routine work. Traditionally, refactoring 310,000 lines of code would be treated as a high-cost, high-risk "special project" that requires dedicated time and resources and often stalls feature development.
By applying Agent evaluation logic and the four-pillar framework, Meituan has demonstrated that refactoring can become a "daily action." Because the constraints are built into the AI management process, the system can continuously identify and fix issues as part of the regular development iteration. This shift significantly reduces the overhead associated with maintaining large-scale systems and ensures that technical debt is addressed incrementally rather than allowed to accumulate to a breaking point.
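One way to picture incremental debt reduction is a sorted backlog from which each iteration takes only what fits a small budget. This is a sketch under assumptions, not the team's method: the `DebtItem` fields, the 1 to 5 severity scale, the category labels, and the line budget are all hypothetical, and the selection is a simple greedy heuristic rather than an optimal packing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    path: str
    category: str   # hypothetical labels, e.g. "duplication", "dead-code"
    severity: int   # assumed scale: 1 (minor) to 5 (urgent)
    est_lines: int  # estimated lines touched by the fix


def daily_batch(backlog: list[DebtItem], line_budget: int) -> list[DebtItem]:
    """Greedily pick the most severe items that fit one iteration's budget."""
    # Most severe first; among equals, prefer smaller fixes, then stable by path.
    ordered = sorted(backlog, key=lambda d: (-d.severity, d.est_lines, d.path))
    batch: list[DebtItem] = []
    used = 0
    for item in ordered:
        if used + item.est_lines <= line_budget:
            batch.append(item)
            used += item.est_lines
    return batch


backlog = [
    DebtItem("a.py", "duplication", 5, 300),
    DebtItem("b.py", "dead-code", 3, 100),
    DebtItem("c.py", "naming", 4, 250),
    DebtItem("d.py", "dead-code", 2, 50),
]
# With a 400-line budget, a.py (300) is taken, c.py (250) does not fit,
# b.py (100) fills the remainder, and d.py waits for a later iteration.
selected = [item.path for item in daily_batch(backlog, line_budget=400)]
```

Items that do not fit stay in the backlog for the next iteration, which is the sense in which debt is paid down a little at a time instead of in one large project.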
Industry Impact
The methodology shared by Meituan represents a pivotal shift in how the industry views AI-assisted software development. As AI becomes the primary author of code, the role of the human developer evolves into that of a "System Architect" and "AI Governor."
This practice sets a precedent for "AI-Native Governance," suggesting that the future of software engineering lies in the development of sophisticated evaluation systems that can guide AI agents. For the broader AI industry, this emphasizes that the value of AI in coding is not just in the generation of text, but in the integration of that generation into a controlled, high-quality engineering lifecycle. It provides a blueprint for other organizations to handle massive codebases without succumbing to the "chaos amplification" that unconstrained AI can cause.
Frequently Asked Questions
Question: Why is speed no longer the most important metric in AI-driven coding?
When AI can generate 90% of the code, the volume of output is so high that any lack of quality or consistency is magnified. If the AI writes code that is inconsistent or ignores system architecture, it creates more work for human developers in the long run. Therefore, the ability to constrain the AI and ensure it follows specific rules becomes more valuable than the sheer speed of generation.
Question: How does the Pre-PR mechanism help in managing AI-generated code?
The Pre-PR mechanism acts as a quality control layer. It evaluates the AI's proposed changes against the established rules and SOPs before the code is even submitted for a formal Pull Request. This prevents low-quality or non-compliant code from entering the development pipeline, ensuring that the refactoring process remains stable and predictable.
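As an illustration of such a quality control layer, the sketch below runs a list of checks over a proposed change and blocks the pull request if any check fails. It is a minimal sketch with assumed details: the source does not say which checks Meituan runs, so `check_diff_size`, `check_no_todo`, the 500-line limit, and `pre_pr_gate` are hypothetical names, and the change is represented as unified diff text for simplicity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


# A check receives the proposed change as unified diff text.
Check = Callable[[str], CheckResult]


def _changed_lines(diff: str) -> list[str]:
    """Return added/removed lines, skipping the '+++' and '---' file headers.

    Simplification: a content line that itself begins with '++' or '--'
    after the diff marker would be mistaken for a header.
    """
    return [line for line in diff.splitlines()
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]


def check_diff_size(diff: str, max_lines: int = 500) -> CheckResult:
    """Hypothetical check: keep each change small enough to review."""
    changed = len(_changed_lines(diff))
    return CheckResult("diff-size", changed <= max_lines,
                       f"{changed} changed lines")


def check_no_todo(diff: str) -> CheckResult:
    """Hypothetical check: reject changes that add new TODO markers."""
    added = [l for l in _changed_lines(diff)
             if l.startswith("+") and "TODO" in l]
    return CheckResult("no-new-todo", not added,
                       f"{len(added)} new TODO markers")


def pre_pr_gate(diff: str, checks: list[Check]) -> tuple[bool, list[str]]:
    """Run all checks; return (may_open_pr, feedback for failed checks)."""
    results = [check(diff) for check in checks]
    feedback = [f"{r.name}: {r.detail}" for r in results if not r.passed]
    return (not feedback, feedback)
```

Running every check instead of stopping at the first failure gives the agent, or a human reviewer, the complete list of problems in one pass, so a rejected change can be corrected before a formal pull request is ever opened.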
Question: What does it mean to turn refactoring into a "daily action"?
Traditionally, refactoring is a separate, intensive project. By using AI agents and automated evaluation, the process of cleaning up and improving code becomes so efficient and integrated into the workflow that it happens alongside regular feature updates. This prevents the build-up of technical debt and makes system maintenance a continuous, low-effort process.

