
Meituan LongCat Team Unveils WBench: A Systematic Multi-Round Evaluation Benchmark for Interactive Video World Models
The Meituan LongCat team has introduced WBench, the first systematic multi-round evaluation benchmark designed specifically for interactive video world models. Acting as a diagnostic "CT scanner," WBench aims to pinpoint the technical bottlenecks that emerge as AI models move from passive video observation to active, multi-round interaction. It evaluates models across diverse scenarios, from lunar exploration to futuristic cyber cities, and provides a structured framework for assessing how well these systems handle complex interactive environments. The benchmark is open source and offers a standardized way to measure the limits of current world models, including their ability to maintain consistency across repeated rounds of interaction.