
Meituan Tech Team Launches LARYBench to Standardize Latent Action Representation Learning from Human Video Data
Meituan's technology team has introduced LARYBench (Latent Action Representation Yielding Benchmark), a benchmark for evaluating how embodied AI systems learn action representations from large-scale visual datasets. Its initial findings indicate that general-purpose vision models outperform specialized expert models in both action generalization and control precision. The research also shows that embodied action representations can emerge naturally from human video data, which offers a new pathway toward more capable and adaptable robotic systems. Positioned as an ImageNet-style benchmark for embodied AI, LARYBench provides a systematic way to measure and improve how machines understand and execute physical actions from visual observation.