Managing AI Coding Through Agent Evaluation: Lessons from Meituan’s 310,000-Line Code Refactoring Project
Industry News, AI Coding, Refactoring, Software Engineering

The Meituan technical team has introduced a novel approach to managing AI-driven software development by applying Agent evaluation logic to large-scale code refactoring. With AI now capable of generating over 90% of code, the team argues that the primary challenge has shifted from generation speed to the implementation of effective constraints. Without unified standards, AI risks amplifying technical chaos. By refactoring 310,000 lines of code, Meituan demonstrated a framework involving technical debt sorting, rule construction, a standardized Refactoring SOP, and a Pre-PR mechanism. This system transforms high-cost refactoring projects into continuous, daily iterative actions. The practice highlights the necessity of moving beyond simple code generation toward a structured management model that ensures long-term system maintainability in an AI-centric development environment.

Meituan Technical Team

Key Takeaways

  • Constraint Over Speed: In an environment where AI generates more than 90% of the code, the ability to constrain and guide the AI is more critical than the speed of code production.
  • Agent Evaluation Logic: Meituan utilizes an "Agent evaluation" mindset to manage AI coding, ensuring that the AI's output aligns with specific technical standards and architectural requirements.
  • Systematic Framework: The management approach is built on four pillars: technical debt sorting, rule construction, a standardized Refactoring SOP, and a Pre-PR (Pull Request) mechanism.
  • Continuous Refactoring: The methodology turns code refactoring from a high-cost, periodic "special project" into a sustainable, daily iterative process that is part of the standard development lifecycle.

In-Depth Analysis

The Challenge of AI-Generated Chaos

As AI tools become the primary authors of software, to the point where over 90% of a system's codebase may be AI-generated, the technical landscape shifts fundamentally. The Meituan technical team points out that while AI significantly accelerates development, it can also "multiply the chaos" if left unconstrained. When multiple AI agents or human-AI collaborations produce code without a unified set of standards, technical debt can accumulate at an unprecedented rate. The core issue identified is that the bottleneck in modern software engineering is no longer how fast code can be written, but how effectively the resulting system can be governed and maintained.

The Agent Evaluation Management Framework

To address the risks of unconstrained AI coding, Meituan implemented a strategy based on "Agent evaluation thinking." This approach treats the AI coder as an autonomous agent that must be measured and restricted by a rigorous set of benchmarks. The practice, applied to a massive project involving 310,000 lines of code, relies on several key components:

  1. Technical Debt Sorting: Before refactoring can begin, the system must identify and categorize existing technical debt. This provides a roadmap for the AI to understand which areas of the codebase require the most attention.
  2. Rule Construction: Establishing clear, machine-readable rules is essential. These rules act as the boundaries within which the AI must operate, ensuring that generated code follows specific architectural and stylistic guidelines.
  3. Refactoring SOP (Standard Operating Procedure): By standardizing the steps required for refactoring, the team ensures consistency across different modules and iterations. This SOP guides the AI through the complex process of updating legacy code without introducing new regressions.
  4. Pre-PR Mechanism: The Pre-PR (Pull Request) mechanism serves as a quality gate before changes are merged. It subjects AI-generated changes to automated validation against the rules, followed by manual review, before they reach the main codebase, ensuring that every modification meets the established quality bars.
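
The article does not publish Meituan's rule format or tooling, so the following is only a minimal Python sketch of how machine-readable rules and a Pre-PR check could fit together. The `Rule` structure, the three example rules, and the function names `check_change` and `pre_pr_gate` are all hypothetical, and simple regular expressions stand in for whatever analysis a real rule set would use.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One machine-readable constraint (hypothetical format)."""
    rule_id: str
    pattern: str           # regex; a match on a line counts as a violation
    message: str
    blocking: bool = True  # blocking rules fail the gate, others only warn


@dataclass(frozen=True)
class Violation:
    rule_id: str
    path: str
    line_no: int
    message: str
    blocking: bool


# Illustrative rules only; a real set would encode the team's own standards.
RULES = (
    Rule("no-print", r"^\s*print\(", "use the logger instead of print"),
    Rule("no-bare-except", r"^\s*except\s*:", "catch a specific exception"),
    Rule("todo-marker", r"#\s*TODO", "unresolved TODO", blocking=False),
)


def check_change(files, rules=RULES):
    """Scan changed files (path -> new content) and collect violations."""
    compiled = [(rule, re.compile(rule.pattern)) for rule in rules]
    found = []
    for path, content in files.items():
        for line_no, line in enumerate(content.splitlines(), start=1):
            for rule, regex in compiled:
                if regex.search(line):
                    found.append(
                        Violation(rule.rule_id, path, line_no,
                                  rule.message, rule.blocking)
                    )
    return found


def pre_pr_gate(files):
    """Return True when no blocking rule is violated, so a PR may be opened."""
    return not any(v.blocking for v in check_change(files))
```

Under these assumptions, an AI-generated change would be passed to `pre_pr_gate` as a mapping from file path to new content. A `False` result sends the change back to the agent with the violation list rather than forward to reviewers, which keeps the rules as the single source of truth for both generation and review.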

From Special Projects to Daily Iterations

One of the most significant outcomes of Meituan’s practice is the transformation of the refactoring process itself. Traditionally, large-scale refactoring (such as a 310,000-line project) is viewed as a high-cost, high-risk "special project" that requires dedicated time and resources. However, by leveraging AI under a structured management framework, Meituan has successfully integrated refactoring into the daily development flow. This shift allows for the continuous improvement of code quality, where technical debt is addressed incrementally during every iteration rather than being allowed to build up until it requires a massive, disruptive intervention.
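
One way to picture the incremental approach is as a prioritized debt backlog from which each iteration takes only a small batch. The Python sketch below is an assumption-laden illustration, not Meituan's method: the `DebtItem` fields, the impact-per-effort ordering, and the per-iteration `budget` are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    name: str
    category: str  # hypothetical labels, e.g. "duplication", "dead-code"
    impact: int    # 1 (minor) to 5 (severe), assigned during debt sorting
    effort: int    # estimated change size in points; must be positive


def prioritize(items):
    """Order debt by impact per unit of effort, highest first; ties by name."""
    return sorted(items, key=lambda d: (-d.impact / d.effort, d.name))


def next_batch(items, budget):
    """Greedily pick prioritized items whose total effort fits the budget."""
    batch, used = [], 0
    for item in prioritize(items):
        if used + item.effort <= budget:
            batch.append(item)
            used += item.effort
    return batch
```

Calling `next_batch(backlog, budget)` at the start of each iteration yields a handful of small refactorings that fit alongside feature work, while large items stay in the backlog until they are split up or the budget grows.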

Industry Impact

The methodology shared by Meituan provides a blueprint for the future of AI-assisted software engineering. As the industry moves toward "AI-native" development, the focus must shift from the tools of generation to the tools of management and evaluation. Meituan's success in refactoring 310,000 lines of code suggests that the role of the human developer is evolving into that of a "system architect" and "rule setter," who defines the constraints within which AI agents operate. This approach not only mitigates the risks of AI-driven technical debt but also sets a new standard for how large-scale enterprise systems can maintain agility and health in the age of automated programming.

Frequently Asked Questions

Question: Why is AI-generated code considered a risk for "amplifying chaos"?

AI can generate code much faster than humans can review it. Without a unified framework or strict rules, different AI prompts or models might produce inconsistent patterns, redundant logic, or architectural violations, leading to a rapid and disorganized accumulation of technical debt.

Question: What is the significance of the Pre-PR mechanism in AI coding?

The Pre-PR mechanism acts as a critical quality control layer. It ensures that AI-generated refactoring or new code is automatically validated against the project's rules and standards before it ever reaches the human review stage or the main code repository, reducing the burden on human developers and maintaining system integrity.

Question: How does Meituan's approach change the cost of code refactoring?

By using AI guided by SOPs and evaluation rules, refactoring becomes a continuous, automated, or semi-automated task. This removes the need for expensive, dedicated refactoring phases, making code maintenance a low-cost, integrated part of the daily development cycle.
