Back to List
TechnologyAIPerformanceAPI

Kimi K2 Thinking Achieves Top Performance on Vending-Bench, Outperforming Open-Source Models with Moonshot API Integration

Kimi.ai announced that its Kimi K2 Thinking model has become the leading open-source model on the Vending-Bench benchmark. This improved performance was observed after re-running the model using Moonshot's own API, a method suggested to enhance tool calling capabilities. The re-evaluation by Andon Labs confirmed that integrating with the Moonshot API significantly boosted Kimi K2's average net worth achieved on the benchmark, solidifying its position as the top performer among open-source alternatives.

twitter-Kimi.ai

Kimi.ai has highlighted the superior performance of its Kimi K2 Thinking model, which has now been recognized as the best open-source model on the Vending-Bench benchmark. This achievement follows a re-evaluation conducted by Andon Labs. The re-run of Kimi K2 Thinking on Vending-Bench utilized Moonshot’s proprietary API, a strategy that was suggested to improve the model's performance specifically in tool calling. Andon Labs confirmed the efficacy of this approach, stating that the integration with Moonshot's API indeed led to a significant improvement. Consequently, Kimi K2 Thinking has now secured the top position among open-source models on Vending-Bench, based on the average net worth achieved. Kimi.ai encourages users to review the Kimi K2 Thinking benchmark best practices and obtain an API key via their platform.

Related News

Technology

Google Unveils Antigravity: A New AI-Powered Autonomous Platform for End-to-End Software Development, Integrating with Gemini 3 for Agentic Coding

Google has launched Antigravity, a novel platform designed for "AI agent-led development," moving beyond traditional IDEs. This autonomous agent collaboration system enables AI to independently plan, execute, and verify complete software development tasks. Deeply integrated with the Gemini 3 model, Antigravity represents Google's key product in "Agentic Coding." It addresses limitations of previous AI tools, which were primarily assistive and required manual operation and step-by-step human prompts. Antigravity allows AI to work across editors, terminals, and browsers, plan complex multi-step tasks, automatically execute actions via tool calls, and self-check results. It shifts the development paradigm from human-operated tools to AI-operated tools with human supervision and collaboration. The platform's core philosophy revolves around Trust, Autonomy, Feedback, and Self-Improvement, providing transparency into AI's decision-making, enabling autonomous cross-environment operations, facilitating real-time human feedback, and allowing AI to learn from past experiences.

Technology

Google Vids Unlocks Advanced AI Features for All Gmail Users: Free Access to AI Voiceovers, Redundancy Removal, and Image Editing

Google has made several advanced AI features in its Vids video editing platform available to all users with a Gmail account, previously exclusive to paid subscribers. These newly accessible tools include AI voiceovers, automatic removal of redundant speech, and AI image editing. The transcription trimming feature automatically eliminates filler words like "um" and "ah," along with long pauses, significantly enhancing video quality. Users can also generate professional-grade voiceovers from text scripts, choosing from seven different voice options, many of which sound natural. Additionally, the AI image editing tool allows for easy modifications such as background removal, descriptive editing, and transforming static photos into dynamic videos. Google aims to empower both beginners and experienced creators to produce high-quality video content, anticipating significant growth in the video editing market despite Vids being in its early stages.

Technology

Quora's Poe AI Platform Launches Group Chat Feature Supporting Up to 200 Users for Enhanced Collaborative AI Interactions

Quora has introduced a new group chat feature for its AI platform, Poe, allowing up to 200 users to collaborate with various AI models and bots in a single conversation. This innovation supports multi-modal interactions including text, image, video, and audio generation. The launch coincides with OpenAI's ChatGPT piloting similar group chat functionalities in select markets, signaling a shift in AI interaction methods. Quora highlights that this feature will offer new interactive experiences for AI users, such as family trip planning using Gemini 2.5 and o3Deep Research, or team brainstorming with image models to create mood boards. Users can also engage in intellectual games with Q&A bots. Group chats can be created from Poe's homepage, with real-time synchronization across devices, ensuring seamless transitions between desktop and mobile. Quora developed this feature over six months and plans to optimize it based on user feedback, emphasizing the unexplored potential for group interaction and collaboration in AI mediums. Poe also enables users to create and share custom bots.