
Meituan Open-Sources LongCat-Next: A Native Multimodal Model Integrating Vision and Speech for Physical World AI
Meituan's technical team has released and open-sourced LongCat-Next, a native multimodal model aimed at AI that interacts with the physical world. The model treats vision and speech as native components rather than peripheral inputs, with the goal of more integrated environmental perception and understanding. The release includes both the core model and its dedicated discrete tokenizer, giving developers the basic tools to build AI systems that perceive, comprehend, and act in real-world scenarios. The release reflects Meituan's commitment to an open-source ecosystem for physical-world AI applications.
