Managing AI Coding with Agent Evaluation Thinking: Meituan's Practice in Refactoring 310,000 Lines of Code
Industry News | AI Coding | Software Engineering | Meituan

In some development environments, AI-generated code now accounts for over 90% of the code written, and the primary challenge has shifted from generation speed to managing and constraining AI capabilities effectively. Meituan's technical team recently shared their experience refactoring 310,000 lines of code using a strategy centered on "Agent evaluation thinking." By combining technical debt sorting, rule construction, a dedicated Refactoring SOP, and a Pre-PR (Pull Request) mechanism, they turned large-scale refactoring from a high-cost, periodic project into a continuous, daily task. The approach is meant to ensure that AI-driven development follows unified technical standards instead of amplifying systemic chaos, preserving long-term code quality and system stability as AI writes most of the code.

Meituan Technical Team

Key Takeaways

  • Shift in Focus: When AI generates more than 90% of code, the bottleneck is no longer how fast code is written, but how effectively AI is constrained by standards.
  • Agent Evaluation Thinking: Meituan utilizes an evaluation-centric approach to manage AI coding, ensuring that automated outputs meet specific quality benchmarks.
  • Standardized Mechanisms: Technical Debt Sorting, Rule Construction, and a Refactoring SOP (Standard Operating Procedure) together keep AI-driven changes orderly.
  • Pre-PR Integration: A Pre-PR mechanism acts as a critical gatekeeper, allowing refactoring to become a seamless part of the daily iterative development process rather than a standalone effort.

In-Depth Analysis

The Challenge of AI-Generated Chaos

In the current landscape of software engineering, the efficiency of code generation has reached a tipping point. With AI capable of producing over 90% of a system's code, the traditional metrics of developer productivity are being redefined. However, Meituan's technical team points out a significant risk: without a unified framework and strict constraints, AI can rapidly amplify technical debt and systemic chaos. The speed of AI becomes a liability when the generated code is inconsistent or breaks the architectural integrity of the existing codebase. The focus of engineering management must therefore move from facilitating speed to establishing robust constraints that guide AI behavior.

Implementing Agent Evaluation Thinking

To address the complexities of managing AI at scale, Meituan adopted what they term "Agent evaluation thinking." This methodology treats the AI coding assistant as an autonomous agent that requires constant evaluation and guidance. The practice involved refactoring 310,000 lines of code, a task that would be prohibitively expensive and time-consuming using traditional manual methods. By applying this new mindset, the team focused on four core pillars:

  1. Technical Debt Sorting: Identifying and categorizing existing issues to provide the AI with a clear roadmap of what needs improvement.
  2. Rule Construction: Establishing a set of non-negotiable technical standards that the AI must follow during the coding and refactoring process.
  3. Refactoring SOP: Creating a Standard Operating Procedure that defines the step-by-step interaction between human engineers and AI agents during code transformation.
  4. Pre-PR Mechanism: Introducing a verification layer before code reaches the Pull Request stage, ensuring that AI-generated refactors are validated against the established rules and logic requirements.
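
To make the last two pillars concrete, the following is a minimal sketch of how rules and a Pre-PR check could fit together. It is not Meituan's implementation: the `Violation` type, the two sample rules, and the `pre_pr_check` function are hypothetical names chosen for illustration, and the sketch assumes the code under review is Python so that the standard `ast` module can parse it.

```python
import ast
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Violation:
    rule: str
    line: int
    message: str


# A rule inspects a parsed module and yields violations.
# Both rules below are hypothetical examples, not Meituan's actual standards.
Rule = Callable[[ast.Module], Iterable[Violation]]


def no_bare_except(tree: ast.Module) -> Iterable[Violation]:
    """Flag 'except:' clauses that catch everything."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            yield Violation("no-bare-except", node.lineno, "bare 'except:' hides errors")


def max_function_length(limit: int) -> Rule:
    """Build a rule that flags functions longer than `limit` lines."""
    def check(tree: ast.Module) -> Iterable[Violation]:
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                length = node.end_lineno - node.lineno + 1
                if length > limit:
                    yield Violation(
                        "max-function-length",
                        node.lineno,
                        f"{node.name} is {length} lines (limit {limit})",
                    )
    return check


def pre_pr_check(source: str, rules: list[Rule]) -> list[Violation]:
    """Run every rule on the source and return violations ordered by line."""
    tree = ast.parse(source)
    found = [v for rule in rules for v in rule(tree)]
    return sorted(found, key=lambda v: (v.line, v.rule))


sample = '''
def load(path):
    try:
        return open(path).read()
    except:
        return None
'''

violations = pre_pr_check(sample, [no_bare_except, max_function_length(3)])
ready_for_pr = not violations  # the gate: open a PR only when nothing is flagged
```

In this sketch the gate is a plain boolean. A real pipeline would more likely feed the violations back to the agent for another attempt before a human reviewer sees the change.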

From Special Projects to Daily Iterations

One of the most significant outcomes of this practice is the normalization of code refactoring. Historically, refactoring hundreds of thousands of lines of code was viewed as a high-cost, high-risk "special project" that often disrupted regular feature development. By leveraging AI agents within a structured management framework, Meituan has successfully integrated refactoring into the daily workflow. This transition allows for the continuous improvement of the codebase, where technical debt is addressed incrementally during every iteration. This sustainable model ensures that the system evolves healthily alongside new feature additions, preventing the accumulation of unmanageable complexity.
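
One way to picture "addressing debt incrementally during every iteration" is a sorted debt backlog from which each iteration takes only what fits. The sketch below assumes a simple greedy policy and a per-iteration line budget. The `DebtItem` fields, the category labels, and `pick_for_iteration` are illustrative assumptions, not details from Meituan's write-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    path: str
    category: str   # hypothetical labels such as "duplication" or "dead-code"
    severity: int   # 1 (low) to 5 (high), assigned during debt sorting
    est_lines: int  # rough size of the change


def pick_for_iteration(backlog: list[DebtItem], line_budget: int) -> list[DebtItem]:
    """Greedily pick the most severe items that still fit this iteration's budget."""
    picked: list[DebtItem] = []
    remaining = line_budget
    # Highest severity first; among equals, prefer the smaller change.
    for item in sorted(backlog, key=lambda d: (-d.severity, d.est_lines)):
        if item.est_lines <= remaining:
            picked.append(item)
            remaining -= item.est_lines
    return picked


backlog = [
    DebtItem("order/service.py", "duplication", 5, 400),
    DebtItem("cart/utils.py", "dead-code", 2, 50),
    DebtItem("pay/adapter.py", "naming", 4, 300),
]
this_iteration = pick_for_iteration(backlog, line_budget=500)
```

With a 500-line budget, the most severe item is taken first, the 300-line item no longer fits, and the small cleanup fills the remaining room. Items left out stay in the backlog for the next iteration.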

Industry Impact

Meituan's approach signals a major shift in how large-scale technology companies handle the lifecycle of software. As AI agents become the primary authors of code, the role of the human software engineer is evolving into that of a "System Architect" and "AI Controller." The industry is moving toward a future where the quality of a software system is determined by the quality of the constraints and evaluation metrics placed upon AI agents.

Furthermore, this practice demonstrates that the "AI-native" development era requires more than just code completion tools; it requires a comprehensive ecosystem of SOPs and automated gatekeeping mechanisms. By proving that 310,000 lines of code can be refactored through continuous daily actions, Meituan provides a blueprint for other organizations to maintain massive codebases in the age of generative AI, potentially lowering the long-term maintenance costs of complex software systems across the tech industry.

Frequently Asked Questions

Question: Why is speed no longer the most important factor in AI coding?

When AI can generate the vast majority of a system's code, the volume of output is no longer the bottleneck. The primary risk becomes the lack of a unified standard, which can lead to "amplified chaos." Managing the quality and consistency of that output through constraints is now more critical than the speed of generation itself.

Question: What is the purpose of the Pre-PR mechanism in this context?

The Pre-PR mechanism serves as an automated or semi-automated checkpoint that evaluates AI-generated code before it is officially submitted for review. It ensures that the code adheres to the predefined "Rules" and "SOPs," catching errors or inconsistencies early and making refactoring a manageable part of the daily development cycle.

Question: How does "Agent evaluation thinking" change the refactoring process?

It shifts the process from a manual, labor-intensive task to a managed, automated workflow. Instead of humans doing all the heavy lifting, they design the rules and evaluation criteria that the AI (the Agent) must satisfy. This allows for massive tasks, such as refactoring 310,000 lines of code, to be handled continuously and systematically.
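
An evaluation criterion of this kind can be as simple as "the refactored code must behave like the original on known inputs." The sketch below is one hedged illustration of that idea, assuming pure functions and a hand-picked list of cases. The helper names `outcome` and `behavior_mismatches` are hypothetical.

```python
from typing import Any, Callable, Iterable


def outcome(fn: Callable[..., Any], args: tuple) -> tuple:
    """Capture either the return value or the type of the raised exception."""
    try:
        return ("ok", fn(*args))
    except Exception as exc:
        return ("error", type(exc))


def behavior_mismatches(
    old: Callable[..., Any],
    new: Callable[..., Any],
    cases: Iterable[tuple],
) -> list[tuple]:
    """Return the input cases on which the refactored function disagrees with the original."""
    return [args for args in cases if outcome(old, args) != outcome(new, args)]


# Original code uses floor division; the candidate refactor truncates instead.
def legacy_split(total: int, parts: int) -> int:
    return total // parts


def refactored_split(total: int, parts: int) -> int:
    return int(total / parts)


cases = [(7, 2), (-7, 2), (1, 0)]
mismatches = behavior_mismatches(legacy_split, refactored_split, cases)
```

Here the two versions agree on `(7, 2)` and both raise `ZeroDivisionError` on `(1, 0)`, but they differ on `(-7, 2)`, where floor division and truncation give different results. A criterion like this rejects the refactor automatically instead of relying on a reviewer to spot the edge case.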

Related News

Meituan LongCat Releases General 365: A Challenging New Benchmark for AI Reasoning Evaluation
Industry News

Meituan's LongCat team has officially open-sourced General 365, a new evaluation benchmark designed to measure the reasoning capabilities of large language models (LLMs). In a comprehensive test involving 26 mainstream models, the results revealed a significant gap in current AI reasoning performance. Even the top-performing model, Gemini 3 Pro, achieved an accuracy of only 62.8%, while the vast majority of tested models failed to reach the 60% passing mark. This release aims to establish a more rigorous standard for the industry, highlighting the current limitations of even the most advanced AI systems in complex reasoning tasks. By providing a transparent and difficult metric, Meituan seeks to drive the development of more logically capable artificial intelligence.

Meituan Technical Team Releases LARYBench: A New Benchmark for Universal Latent Action Representation in Embodied AI
Industry News

The Meituan Technical Team has officially introduced LARYBench (Latent Action Representation Yielding Benchmark), a systematic evaluation framework designed to guide the learning of universal latent action representations from large-scale visual data. This benchmark marks a significant milestone in embodied AI by providing a standardized way to measure how models learn actions from visual inputs. Experimental results from the benchmark reveal that general vision models significantly outperform specialized embodied action expert models in both action generalization and control precision. Furthermore, the research demonstrates that embodied action representations can naturally emerge from large-scale human video data, suggesting that broad visual training is a viable path toward achieving more sophisticated and adaptable robotic control systems.

Industry News

US Government Grants Anthropic Permission to Release Mythos Model to Selected Trusted Partners

In a significant development for the artificial intelligence sector, the United States government has officially authorized Anthropic to release its latest AI model, known as 'Mythos,' to a restricted group of 'trusted partners.' This decision, reported on June 26, 2026, underscores a growing trend of federal oversight in the deployment of high-capability AI systems. By limiting the initial rollout to specific entities, the move aims to balance the rapid pace of technological innovation with rigorous safety and security protocols. While the specific technical specifications of Mythos have not been publicly detailed, the requirement for government clearance suggests that the model possesses advanced capabilities that fall under current regulatory scrutiny. This event marks a pivotal moment in the relationship between AI developers and national regulators, establishing a framework for the controlled release of sensitive technology.