
Managing AI Coding with Agent Evaluation Thinking: Meituan's Practice in Refactoring 310,000 Lines of Code
As AI-generated code now accounts for over 90% of development in certain environments, the primary challenge has shifted from generation speed to managing and constraining what the AI produces. Meituan's technical team recently shared their experience refactoring 310,000 lines of code using a strategy centered on "Agent evaluation thinking." By combining technical debt sorting, standardized rules, a dedicated Refactoring SOP, and a Pre-PR (Pull Request) mechanism, they turned large-scale refactoring from a high-cost, periodic project into a continuous, daily task. The aim is for AI-driven development to adhere to unified technical standards instead of amplifying systemic chaos, preserving long-term code quality and system stability as AI writes most of the code.
Key Takeaways
- Shift in Focus: When AI generates more than 90% of code, the bottleneck is no longer how fast code is written, but how effectively AI is constrained by standards.
- Agent Evaluation Thinking: Meituan uses an evaluation-centric approach to manage AI coding, checking automated outputs against specific quality benchmarks.
- Standardized Mechanisms: Technical debt sorting, rule construction, and a Refactoring SOP (Standard Operating Procedure) keep AI-driven changes orderly and consistent.
- Pre-PR Integration: A Pre-PR mechanism acts as a critical gatekeeper, allowing refactoring to become a seamless part of the daily iterative development process rather than a standalone effort.
In-Depth Analysis
The Challenge of AI-Generated Chaos
In the current landscape of software engineering, code generation is no longer the scarce resource. With AI capable of producing over 90% of a system's code, the traditional metrics of developer productivity are being redefined. However, Meituan's technical team points out a significant risk: without a unified framework and strict constraints, AI can compound technical debt and systemic chaos as quickly as it produces code. Speed becomes a liability when the generated code is inconsistent or violates the architecture of the existing codebase. The focus of engineering management must therefore shift from facilitating speed to establishing robust constraints that guide AI behavior.
Implementing Agent Evaluation Thinking
To address the complexities of managing AI at scale, Meituan adopted what they term "Agent evaluation thinking." This methodology treats the AI coding assistant as an autonomous agent whose output must be continuously evaluated and guided. The practice involved refactoring 310,000 lines of code, a task that would be prohibitively expensive and time-consuming if done manually. The team built the practice on four pillars:
- Technical Debt Sorting: Identifying and categorizing existing issues to provide the AI with a clear roadmap of what needs improvement.
- Rule Construction: Establishing a set of non-negotiable technical standards that the AI must follow during the coding and refactoring process.
- Refactoring SOP: Creating a Standard Operating Procedure that defines the step-by-step interaction between human engineers and AI agents during code transformation.
- Pre-PR Mechanism: Introducing a verification layer before code reaches the Pull Request stage, ensuring that AI-generated refactors are validated against the established rules and logic requirements.
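The article does not show what Meituan's rules look like in practice. As a minimal sketch of the rule-construction idea, assuming rules are expressed as small functions that inspect a candidate file and report violations, the example below evaluates Python source with the standard `ast` module. The two rules, their names, and the 40-line limit are hypothetical and chosen only for illustration.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    rule: str      # identifier of the rule that fired (hypothetical names)
    line: int      # 1-based line number in the candidate source
    message: str


def check_no_bare_except(tree: ast.AST) -> list[Violation]:
    """Hypothetical rule: every except clause must name an exception type."""
    return [
        Violation("no-bare-except", node.lineno, "bare 'except:' is not allowed")
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]


def check_function_length(tree: ast.AST, max_lines: int = 40) -> list[Violation]:
    """Hypothetical rule: a function may not span more than max_lines lines."""
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                found.append(Violation(
                    "function-length", node.lineno,
                    f"'{node.name}' spans {length} lines (limit {max_lines})"))
    return found


# The rule set is plain data, so engineers can extend it without touching the evaluator.
RULES = [check_no_bare_except, check_function_length]


def evaluate(source: str) -> list[Violation]:
    """Run every rule against one candidate file and return violations in line order."""
    tree = ast.parse(source)
    violations = [v for rule in RULES for v in rule(tree)]
    return sorted(violations, key=lambda v: (v.line, v.rule))
```

Calling `evaluate()` on AI-generated source returns an empty list when every rule is satisfied. A non-empty list can be handed back to the agent as concrete feedback, which is one way to read "evaluation" in this context.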
From Special Projects to Daily Iterations
One of the most significant outcomes of this practice is the normalization of code refactoring. Historically, refactoring hundreds of thousands of lines of code was viewed as a high-cost, high-risk "special project" that often disrupted regular feature development. By leveraging AI agents within a structured management framework, Meituan has successfully integrated refactoring into the daily workflow. This transition allows for the continuous improvement of the codebase, where technical debt is addressed incrementally during every iteration. This sustainable model ensures that the system evolves healthily alongside new feature additions, preventing the accumulation of unmanageable complexity.
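One way to picture incremental debt reduction is a sorted backlog from which each iteration takes a bounded slice. The sketch below is an assumption about how that could work, not a description of Meituan's tooling: the `DebtItem` fields, the 1 to 5 severity scale, and the per-iteration line budget are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    path: str       # file the debt lives in (illustrative paths only)
    category: str   # e.g. "duplication" or "dead-code"; labels are illustrative
    severity: int   # 1 (minor) to 5 (critical), a hypothetical scale
    lines: int      # estimated number of lines the refactor would touch


def pick_batch(backlog: list[DebtItem], line_budget: int) -> list[DebtItem]:
    """Greedily pick the most severe items that still fit in this iteration's budget."""
    batch, used = [], 0
    # Most severe first; among equals, prefer smaller changes, then a stable path order.
    for item in sorted(backlog, key=lambda d: (-d.severity, d.lines, d.path)):
        if used + item.lines <= line_budget:
            batch.append(item)
            used += item.lines
    return batch
```

Capping the lines touched per iteration keeps each refactor small enough to review alongside feature work, which is the property that separates daily refactoring from a one-off special project.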
Industry Impact
Meituan's approach signals a major shift in how large-scale technology companies handle the lifecycle of software. As AI agents become the primary authors of code, the role of the human software engineer is evolving into that of a "System Architect" and "AI Controller." The industry is moving toward a future where the quality of a software system is determined by the quality of the constraints and evaluation metrics placed upon AI agents.
Furthermore, this practice suggests that the "AI-native" development era requires more than code completion tools; it requires a comprehensive ecosystem of SOPs and automated gatekeeping mechanisms. By showing that 310,000 lines of code can be refactored through continuous daily work, Meituan offers a blueprint for other organizations maintaining massive codebases in the age of generative AI, potentially lowering the long-term maintenance costs of complex software systems.
Frequently Asked Questions
Question: Why is speed no longer the most important factor in AI coding?
When AI can generate the vast majority of a system's code, the volume of output is no longer the bottleneck. The primary risk becomes the lack of a unified standard, which can lead to "amplified chaos." Managing the quality and consistency of that output through constraints is now more critical than the speed of generation itself.
Question: What is the purpose of the Pre-PR mechanism in this context?
The Pre-PR mechanism serves as an automated or semi-automated checkpoint that evaluates AI-generated code before it is officially submitted for review. It ensures that the code adheres to the predefined "Rules" and "SOPs," catching errors or inconsistencies early and making refactoring a manageable part of the daily development cycle.
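The source does not describe how the Pre-PR checkpoint is built. A minimal sketch, assuming the gate runs a list of checks over a candidate diff and opens a pull request only when no blocking check fails, might look like the following. The class names, the blocking/advisory split, and both example checks are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


@dataclass(frozen=True)
class Check:
    name: str
    run: Callable[[str], CheckResult]   # receives the candidate diff as text
    blocking: bool = True               # advisory checks only produce warnings


@dataclass(frozen=True)
class GateDecision:
    open_pr: bool
    failures: list[CheckResult]
    warnings: list[CheckResult]


def pre_pr_gate(diff: str, checks: list[Check]) -> GateDecision:
    """Run every check; the PR may be opened only if no blocking check failed."""
    failures, warnings = [], []
    for check in checks:
        result = check.run(diff)
        if result.passed:
            continue
        (failures if check.blocking else warnings).append(result)
    return GateDecision(open_pr=not failures, failures=failures, warnings=warnings)


def diff_size_check(diff: str, max_added: int = 400) -> CheckResult:
    """Hypothetical check: keep each refactor small enough to review."""
    added = sum(1 for line in diff.splitlines()
                if line.startswith("+") and not line.startswith("+++"))
    return CheckResult("diff-size", added <= max_added, f"{added} added lines")


def no_todo_check(diff: str) -> CheckResult:
    """Hypothetical check: flag TODO markers introduced by the change."""
    hits = [line for line in diff.splitlines()
            if line.startswith("+") and "TODO" in line]
    return CheckResult("no-new-todo", not hits, f"{len(hits)} new TODO markers")
```

When `open_pr` is false, the `failures` list can be returned to the agent for another attempt or escalated to an engineer, so human reviewers only see changes that already satisfy the agreed rules.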
Question: How does "Agent evaluation thinking" change the refactoring process?
It shifts the process from a manual, labor-intensive task to a managed, automated workflow. Instead of doing all the heavy lifting themselves, engineers design the rules and evaluation criteria that the AI (the Agent) must satisfy. This allows massive tasks, such as refactoring 310,000 lines of code, to be handled continuously and systematically.

