Meituan Technical Team Releases LARYBench: A New Benchmark for Universal Latent Action Representation in Embodied AI
Industry News | Embodied AI | LARYBench | Computer Vision

The Meituan Technical Team has introduced LARYBench (Latent Action Representation Yielding Benchmark), a systematic evaluation framework designed to guide the learning of universal latent action representations from large-scale visual data. The benchmark offers a standardized way to measure how well models learn actions from visual input. Its experimental results indicate that general vision models significantly outperform specialized embodied action expert models in both action generalization and control precision. The research also finds that embodied action representations can emerge naturally from large-scale human video data, which suggests that broad visual training is a viable path toward more adaptable robotic control systems.

Meituan Technical Team

Key Takeaways

  • LARYBench Introduction: Meituan has launched the Latent Action Representation Yielding Benchmark (LARYBench) to evaluate universal latent action representations learned from visual data.
  • Superiority of General Models: Experimental data indicates that general-purpose vision models outperform specialized embodied AI models in action generalization and control precision.
  • Emergent Capabilities: Embodied action representations can emerge effectively from large-scale human video datasets, rather than requiring exclusively robot-specific data.
  • New Standard for Embodied AI: LARYBench serves as a systematic guide for developing models that can translate visual information into actionable robotic intelligence.

In-Depth Analysis

Defining the ImageNet for Embodied Action

The release of LARYBench by the Meituan Technical Team reflects a shift in how the industry evaluates embodied intelligence. By positioning LARYBench as a systematic evaluation benchmark, the team aims to give embodied action learning a shared yardstick, much as ImageNet did for computer vision. The benchmark centers on "Latent Action Representation": the internal encoding a model forms of physical movements and actions, as opposed to an explicit robot command. If these representations can be learned from large-scale visual data, AI systems can potentially bridge the gap between seeing an action and performing it.
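To make the idea of a latent action concrete, here is a minimal toy sketch. It is not LARYBench's method, which the announcement does not describe. It assumes, purely for illustration, that a latent action can be approximated as the change between the encoded representations of two consecutive frames. The `encode` function is a hypothetical stand-in for a real vision model and simply returns the intensity-weighted centroid of a tiny grayscale frame.

```python
from typing import List

Frame = List[List[float]]  # tiny grayscale image: rows of pixel intensities


def encode(frame: Frame) -> List[float]:
    """Hypothetical stand-in for a vision encoder.

    Returns the intensity-weighted centroid (row, col) of the frame.
    A real system would use a learned model here.
    """
    total = sum(sum(row) for row in frame)
    if total == 0:
        return [0.0, 0.0]
    row_center = sum(i * sum(row) for i, row in enumerate(frame)) / total
    col_center = sum(j * v for row in frame for j, v in enumerate(row)) / total
    return [row_center, col_center]


def latent_action(frame_t: Frame, frame_next: Frame) -> List[float]:
    """Illustrative assumption: latent action = change in encoded state."""
    z_t = encode(frame_t)
    z_next = encode(frame_next)
    return [b - a for a, b in zip(z_t, z_next)]


def make_frame(row: int, col: int, size: int = 5) -> Frame:
    """Build a blank frame with a single bright pixel (a toy 'object')."""
    frame = [[0.0] * size for _ in range(size)]
    frame[row][col] = 1.0
    return frame


# The object moves two pixels to the right between the two frames.
before = make_frame(2, 1)
after = make_frame(2, 3)
action = latent_action(before, after)
print(action)  # [0.0, 2.0]
```

The point of the sketch is only that the "action" is read off from visual change and never supplied as a label, which is what makes unlabeled video usable as training data.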

General Vision Models vs. Specialized Experts

One of the most striking findings revealed by the LARYBench experiments is the performance gap between general vision models and specialized embodied action models. Traditionally, the industry has leaned toward developing "expert" models specifically trained on robotic datasets to handle control tasks. However, the LARYBench results show that general vision models—those trained on vast, diverse visual datasets—exhibit significantly better action generalization and control precision. This suggests that the broad features learned by general-purpose models provide a more robust foundation for embodied tasks than the narrow focus of specialized models.
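One common way to compare representations is a linear probe: fit a simple linear map from a model's representation to the true action, then measure the error on held-out examples. The sketch below illustrates that general technique on toy data. The announcement does not describe LARYBench's actual protocol, and the two representation functions (`rep_a`, `rep_b`), the action values, and the use of mean squared error as a proxy for control precision are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def fit_linear_probe(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ slope * x + intercept (1-D case)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y  # constant feature: predict the mean
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def probe_mse(rep: Callable[[float], float],
              train_actions: List[float],
              test_actions: List[float]) -> float:
    """Fit the probe on training actions, report MSE on held-out actions."""
    slope, intercept = fit_linear_probe([rep(a) for a in train_actions],
                                        train_actions)
    errors = [(slope * rep(a) + intercept - a) ** 2 for a in test_actions]
    return sum(errors) / len(errors)


def rep_a(action: float) -> float:
    """Hypothetical representation that encodes the action linearly."""
    return 2.0 * action + 1.0


def rep_b(action: float) -> float:
    """Hypothetical representation that discards the action's sign."""
    return action * action


train = [-2.0, -1.0, 0.0, 1.0, 2.0]
test = [-1.5, 0.5, 1.5]  # held-out actions, to check generalization

mse_a = probe_mse(rep_a, train, test)
mse_b = probe_mse(rep_b, train, test)
print(f"rep_a held-out MSE: {mse_a:.3f}")
print(f"rep_b held-out MSE: {mse_b:.3f}")
```

In this toy setup the probe recovers held-out actions from `rep_a` far more accurately than from `rep_b`, because `rep_b` throws away information the probe needs. A benchmark comparison of real models would follow the same logic at much larger scale, with learned encoders in place of these hand-written functions.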

The Role of Human Video Data in Action Learning

LARYBench highlights a critical breakthrough in data sourcing for embodied AI: the emergence of action representations from human video data. The benchmark demonstrates that models do not necessarily need to be trained solely on teleoperated robot data or simulated environments to understand movement. Instead, by processing large-scale videos of humans performing various tasks, these models can develop an implicit understanding of actions. This "emergence" of embodied representation from human-centric data opens new doors for scaling AI training, as human video data is far more abundant and easier to collect than specialized robotic execution data.

Industry Impact

The introduction of LARYBench could influence the AI industry in three ways. First, it gives researchers a clear metric for the generalization capabilities of their models, which has long been a hurdle in robotics. Second, the finding that general vision models outperform specialized ones on this benchmark may consolidate research efforts, with developers fine-tuning large-scale foundation models for embodied tasks rather than building niche models from scratch. Finally, the validation of human video data as a primary training source could accelerate the development of humanoid robots and autonomous systems by drawing on the vast amount of video content already available on the internet.

Frequently Asked Questions

Question: What is the primary purpose of LARYBench?

LARYBench (Latent Action Representation Yielding Benchmark) is designed to be a systematic evaluation system that guides and measures how AI models learn universal latent action representations from large-scale visual data.

Question: Why are general vision models performing better than specialized models in this benchmark?

According to the experimental results, general vision models demonstrate superior action generalization and control precision. This suggests that the diverse features and broad patterns learned by general models are more effective for embodied intelligence than the narrow training of specialized action expert models.

Question: Can robots learn how to move just by watching videos of humans?

The LARYBench findings point in that direction: they indicate that embodied action representations can emerge from large-scale human video data. In other words, models can learn the underlying logic of actions and movements by observing human behavior, and that understanding can then be applied to robotic control.
