Managing AI Coding with Agent Evaluation Strategies: A Practice of Refactoring 310,000 Lines of Code
Tags: Industry News, AI Coding, Software Engineering, Refactoring

The Meituan technical team has shared a comprehensive approach to managing AI-driven development, based on a large-scale project involving the refactoring of 310,000 lines of code. As AI now generates over 90% of code in certain environments, the team argues that the critical factor for system stability is no longer the speed of generation, but the ability to effectively constrain AI capabilities. Without unified standards, AI-generated code can significantly amplify technical chaos. To address this, Meituan implemented an 'Agent evaluation' framework, which includes technical debt assessment, rule construction, standardized operating procedures (SOPs), and a Pre-PR mechanism. This strategy successfully transformed code refactoring from a high-cost, specialized effort into a continuous, daily activity integrated into the standard development lifecycle.

Meituan Technical Team

Key Takeaways

  • Constraint Over Speed: When AI generates more than 90% of a system's code, the primary challenge shifts from how fast code is written to how well the AI's output is constrained and governed.
  • Risk of Amplified Chaos: Without unified standards and strict rules, AI-driven development can rapidly increase technical debt and amplify the disorder already present in a system.
  • Agent Evaluation Framework: Meituan utilizes an 'Agent evaluation' mindset to manage AI coding, focusing on technical debt sorting and the establishment of clear rules.
  • Operationalizing Refactoring: By implementing standardized SOPs and a Pre-PR mechanism, the team turned a massive 310,000-line refactoring project into a sustainable, daily iterative process.

In-Depth Analysis

The Shift from Generation Speed to AI Constraint

In the current landscape of software engineering, Artificial Intelligence can generate the vast majority of a codebase, in some environments more than 90%. However, the Meituan technical team highlights a critical realization: the speed of AI code generation is no longer the bottleneck or the primary metric for success. Instead, the focus must shift toward the "constraint of AI capabilities."

The core issue identified is that AI, while efficient, lacks inherent adherence to specific project architectures or long-term maintenance standards unless explicitly guided. Without a unified framework or set of constraints, the sheer volume of AI-generated code can amplify existing chaos within a system. The practice suggests that the more code an AI writes, the more vital it becomes to have a robust management system to ensure that the output aligns with the desired system trajectory and quality standards.

The Agent Evaluation Framework for Code Management

To manage the complexities of 310,000 lines of code, the team adopted a methodology rooted in "Agent evaluation thinking." This approach treats the AI coding tool as an autonomous agent that requires constant benchmarking and boundary-setting. Alongside the Pre-PR mechanism covered in the next section, the management process rests on three components:

  1. Technical Debt Sorting: Before refactoring can begin, there must be a clear understanding of the existing technical debt. This involves identifying areas where the code deviates from best practices or architectural requirements.
  2. Rule Construction: Establishing a set of "Rules" is essential to provide the AI with the necessary boundaries. These rules act as the primary constraints that prevent the AI from generating disorganized or non-standard code.
  3. Refactoring SOP (Standard Operating Procedure): By creating a standardized procedure for refactoring, the team ensures that the process is repeatable and consistent, regardless of which part of the 310,000-line codebase is being addressed.
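
The source does not show how Meituan encodes its rules, so the following is only a minimal sketch of one way the first two components could be made machine-checkable. The `Rule` class, the two example rules, and the helper names are all hypothetical and chosen for illustration: each rule is a regex that flags a violation on a single line, and the debt inventory is simply the violation count per rule.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical machine-checkable rule (illustrative, not from the source)."""
    name: str
    pattern: str      # regex that matches a violation on a single line
    description: str


# Illustrative rules only; a real project would derive these from its own architecture.
RULES = [
    Rule("no-print", r"^\s*print\(", "use the logging module instead of print"),
    Rule("no-bare-except", r"^\s*except\s*:", "catch specific exception types"),
]


def find_violations(source: str, rules=RULES):
    """Return (line_number, rule_name) pairs for every rule violation in source."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule in rules:
            if re.search(rule.pattern, line):
                violations.append((lineno, rule.name))
    return violations


def debt_inventory(files: dict[str, str], rules=RULES) -> dict[str, int]:
    """Count violations per rule across a mapping of path -> source text."""
    counts = {rule.name: 0 for rule in rules}
    for source in files.values():
        for _, rule_name in find_violations(source, rules):
            counts[rule_name] += 1
    return counts
```

Keeping rules as data rather than prose means the same list can be handed to the AI as guidance and run against its output as a check. Line-level regexes are a deliberate simplification; architectural rules would likely need a parser or a linter plugin.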

Integrating Refactoring into Daily Iterations

A significant breakthrough in this practice is the transition of refactoring from a "high-cost special project" to a "daily action." Traditionally, large-scale refactoring (such as a 310,000-line project) is viewed as a resource-intensive, one-time event that disrupts regular development.

Meituan's approach changes this dynamic through the "Pre-PR (Pull Request) mechanism." By integrating refactoring checks and AI constraints into the Pre-PR stage, the team ensures that code quality is maintained continuously. This mechanism allows refactoring to happen incrementally alongside regular feature iterations. This shift not only reduces the overhead associated with massive refactoring efforts but also ensures that the system remains healthy and manageable as it evolves through AI-assisted development.
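
The article does not describe how the Pre-PR check is implemented. One common way to make refactoring incremental is a "ratchet": record the current number of violations per rule as a baseline, reject any change that raises a count, and lower the baseline whenever a change reduces one. The sketch below assumes that approach; the function names and the rule names in the usage note are hypothetical.

```python
def pre_pr_gate(baseline: dict[str, int], current: dict[str, int]) -> tuple[bool, list[str]]:
    """Pass only if no rule has more violations than the recorded baseline."""
    regressions = []
    for rule, count in sorted(current.items()):
        allowed = baseline.get(rule, 0)  # rules without a baseline allow zero violations
        if count > allowed:
            regressions.append(f"{rule}: {count} violations (baseline {allowed})")
    return (not regressions, regressions)


def tighten_baseline(baseline: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """After a passing change, lower each limit to the new count so debt can only shrink."""
    return {rule: min(limit, current.get(rule, 0)) for rule, limit in baseline.items()}
```

For example, with a baseline of `{"no-print": 3}`, a change that leaves two `no-print` violations passes and tightens the baseline to 2, while a change that introduces a fourth is blocked before the pull request is opened.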

Industry Impact

The methodology presented by Meituan offers a blueprint for the future of AI-assisted software engineering. As the industry moves toward a reality where AI handles the bulk of coding tasks, the role of the human developer is shifting toward that of a "system architect" and "rule setter."

The significance of this practice lies in its scalability. By demonstrating that 310,000 lines of code can be managed and refactored through automated constraints and standardized SOPs, it provides a path forward for other large-scale enterprises facing the "chaos amplification" of AI. This approach prioritizes long-term system health over short-term generation metrics, suggesting that the next phase of the AI revolution in coding will be defined by governance and quality control rather than just raw productivity.

Frequently Asked Questions

Question: Why is speed no longer the most important factor in AI coding?

When AI can generate over 90% of the code, the volume of output is so high that the speed of writing is no longer a constraint. The new challenge is ensuring that this massive volume of code is organized, standard-compliant, and maintainable. Without constraints, AI simply produces a larger amount of disorganized code faster.

Question: What is the purpose of the Pre-PR mechanism in this context?

The Pre-PR mechanism is designed to catch issues and enforce rules before code is merged into the main branch. This allows refactoring and quality control to become a continuous, daily part of the development cycle rather than a separate, high-cost project that happens only occasionally.

Question: How does 'Agent evaluation thinking' differ from traditional code review?

'Agent evaluation thinking' treats the AI as an active agent whose performance and output must be measured against specific rules and benchmarks. It focuses on setting the parameters and constraints within which the AI operates, rather than just manually reviewing the final output for errors.
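
As a rough illustration of the difference, an evaluation-style check scores a whole batch of AI outputs against fixed criteria instead of inspecting one change at a time. The sketch below is an assumption about what such scoring could look like, not Meituan's implementation; `evaluate_agent` and the two checks are hypothetical.

```python
from typing import Callable

Check = Callable[[str], bool]  # returns True when the output satisfies the check


def evaluate_agent(outputs: list[str], checks: dict[str, Check]) -> dict[str, float]:
    """Return the pass rate per check over a batch of agent outputs."""
    if not outputs:
        raise ValueError("need at least one output to evaluate")
    return {
        name: sum(1 for out in outputs if check(out)) / len(outputs)
        for name, check in checks.items()
    }


# Hypothetical checks, chosen only for illustration.
CHECKS: dict[str, Check] = {
    "has_docstring": lambda code: '"""' in code,
    "short_enough": lambda code: len(code.splitlines()) <= 50,
}
```

A pass rate that drops after a prompt or rule change signals that the constraints need adjusting, which is the kind of feedback a per-change manual review does not readily provide.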
