Technology · AI · Innovation · DeepMind

Google DeepMind Unveils SIMA 2: A General-Purpose AI Agent Powered by Gemini, Achieving Near-Human Performance in Complex 3D Virtual Worlds with Enhanced Reasoning and Self-Improvement

Google DeepMind has launched SIMA 2, an upgraded general-purpose AI agent designed to navigate and perform tasks in complex 3D game environments. Building on its predecessor, SIMA 1 (released in 2024), SIMA 2 integrates the Gemini 2.5 Flash Lite model as its core reasoning engine, enabling it to better understand goals, interpret plans, and continuously learn through self-improvement. While SIMA 1, which could follow more than 600 language instructions, achieved a task completion rate of about 31%, SIMA 2 roughly doubles this to 62%, approaching the 71% rate of human players. SIMA 2 keeps the same interface but evolves from a mere instruction executor into an interactive game partner, capable of explaining its intentions and answering questions about its goals. It also expands its instruction channels to include voice, graphics, and emojis, and demonstrates advanced reasoning by interpreting abstract requests. SIMA 2 further features a self-improvement mechanism: it learns from its own experience in new games, with the Gemini model generating and scoring new tasks, allowing it to succeed in previously failed scenarios without additional human demonstrations. DeepMind also showcased SIMA 2's integration with Genie 3, which generates interactive 3D environments from a single image or text prompt, marking a significant step towards advanced real-world robotics.


Google DeepMind has recently unveiled SIMA 2, an advanced general-purpose AI agent engineered to excel in complex 3D game worlds. SIMA, short for Scalable Instructable Multiworld Agent, first appeared in 2024 as SIMA 1, and SIMA 2 represents a significant upgrade over that predecessor. The new iteration leverages the powerful Gemini model, specifically Gemini 2.5 Flash Lite, as its core reasoning engine, enabling enhanced goal comprehension, plan interpretation, and continuous self-improvement across diverse virtual environments.

SIMA 1, upon its release, operated by interpreting more than 600 language instructions, observing rendered game images, and acting through virtual keyboard and mouse inputs. It achieved a task completion rate of approximately 31%, notably lower than the 71% rate observed in human players. SIMA 2, while retaining the same interface, has dramatically improved on this: DeepMind's evaluations show its task completion rate rising to 62%, remarkably close to human-player levels.

A key architectural enhancement in SIMA 2 is the deep integration of the Gemini model. This allows the agent to receive visual observations and user instructions, deduce high-level objectives, and generate corresponding actions. This integration, coupled with a new training paradigm, transforms SIMA 2 from a simple instruction executor into an interactive game partner. It can now explain its intentions, respond to queries about its current goals, and articulate its reasoning process within the environment.
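To make that observe-reason-act loop concrete, here is a minimal sketch in Python. The types and callables (`Observation`, `Action`, `reason`, `act`) are illustrative assumptions for this article, not DeepMind's actual SIMA 2 interfaces.

```python
# Minimal sketch of an observe-reason-act cycle, assuming a reasoning model that
# maps observations to high-level goals and a policy that maps goals to controls.
# All names here are illustrative placeholders, not SIMA 2's real interfaces.
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Observation:
    frame: bytes        # rendered game image
    instruction: str    # the user's current instruction


@dataclass
class Action:
    keys: List[str]                # virtual keyboard presses
    mouse_delta: Tuple[int, int]   # virtual mouse movement


def agent_step(obs: Observation,
               reason: Callable[[Observation], str],
               act: Callable[[str], Action]) -> Action:
    """One cycle: deduce a high-level goal from the observation and
    instruction, then translate that goal into keyboard/mouse controls."""
    goal = reason(obs)   # e.g. a Gemini-style model proposes "walk to the red house"
    return act(goal)     # a low-level policy turns the goal into controls


# Toy usage with stub components:
if __name__ == "__main__":
    obs = Observation(frame=b"", instruction="find the red house")
    action = agent_step(
        obs,
        reason=lambda o: f"goal: {o.instruction}",
        act=lambda goal: Action(keys=["w"], mouse_delta=(5, 0)),
    )
    print(action)
```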

SIMA 2 also boasts expanded instruction channels, moving beyond mere text commands to process voice, graphics, and even emojis. A compelling demonstration involved a user asking SIMA 2 to locate a "house the color of a ripe tomato." The agent successfully reasoned that "a ripe tomato is red" and subsequently identified the target, showcasing its advanced inferential capabilities.
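As a rough illustration of how such mixed instruction channels could be funnelled into a single textual request before reasoning, consider the sketch below. The channel names and helper stubs are assumptions made for this example; they do not describe SIMA 2's actual pipeline.

```python
# Hedged sketch: funnel text, voice, image, and emoji instructions into one
# textual request for a reasoning model. The stubs below stand in for real
# speech-to-text, captioning, and emoji-interpretation components.

def transcribe(audio: bytes) -> str:
    return "placeholder transcript"          # stand-in for speech-to-text

def describe_image(image: bytes) -> str:
    return "placeholder image description"   # stand-in for image captioning

def interpret_emoji(emojis: str) -> str:
    # e.g. an axe-and-tree emoji pair might be read as "chop down a tree"
    return f"interpret the emoji sequence {emojis!r} as a task"

def normalize_instruction(channel: str, payload) -> str:
    """Map an instruction from any supported channel to plain text."""
    if channel == "text":
        return payload
    if channel == "voice":
        return transcribe(payload)
    if channel == "image":
        return describe_image(payload)
    if channel == "emoji":
        return interpret_emoji(payload)
    raise ValueError(f"unknown channel: {channel}")

# The "ripe tomato" request would then reach the reasoning model as plain text,
# which infers "a ripe tomato is red" before selecting the target.
print(normalize_instruction("text", "go to the house the color of a ripe tomato"))
```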

Self-improvement is another standout feature of SIMA 2. After an initial phase of learning from human game demonstrations, the agent transitions into new games, relying entirely on its own experience for learning. The Gemini model plays a crucial role here, generating new tasks for the agent and scoring its performance. This mechanism has led to subsequent versions of SIMA 2 successfully completing many tasks that it previously failed, all without the need for additional human demonstrations.
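A minimal sketch of one such self-improvement round, assuming a task generator and a trajectory scorer both backed by a reasoning model, might look like the following; the function names and the replay-buffer detail are illustrative assumptions, not DeepMind's published training code.

```python
# Hedged sketch of one self-improvement round: a model proposes a task, the agent
# attempts it, the model scores the attempt, and good attempts are kept as new
# training data. All names are illustrative placeholders.
from typing import Any, Callable, List, Tuple

def self_improvement_round(
    attempt: Callable[[str], Any],                   # agent plays the proposed task
    propose_task: Callable[[], str],                 # model invents a new task for this game
    score_trajectory: Callable[[str, Any], float],   # model judges the attempt in [0, 1]
    replay_buffer: List[Tuple[str, Any]],
    threshold: float = 0.5,
) -> float:
    task = propose_task()
    trajectory = attempt(task)
    reward = score_trajectory(task, trajectory)
    if reward >= threshold:
        replay_buffer.append((task, trajectory))     # successful attempts become training data
    return reward
```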

Finally, DeepMind showcased the synergy between SIMA 2 and Genie 3. This integration allows for the generation of interactive 3D environments from a single image or text prompt. Within these newly generated environments, SIMA 2 demonstrated its ability to identify objects and accomplish specified tasks, marking a pivotal step towards the development of general-purpose agents for more advanced real-world robotic applications.
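The pairing described above can be pictured as a two-stage pipeline, sketched below under the assumption of a world generator and an agent runner; `generate_environment` and `run_agent` are hypothetical stand-ins, not public Genie 3 or SIMA 2 APIs.

```python
# Hedged sketch of the two-stage pipeline: generate an interactive world from a
# prompt, then drop the agent into it with a task. Both callables are
# hypothetical stand-ins for Genie-3-style and SIMA-2-style components.
from typing import Any, Callable

def evaluate_in_generated_world(
    prompt: str,                                    # text or image prompt for the world
    task: str,                                      # instruction for the agent
    generate_environment: Callable[[str], Any],
    run_agent: Callable[[Any, str], bool],
) -> bool:
    environment = generate_environment(prompt)      # e.g. "a sunlit forest village"
    return run_agent(environment, task)             # True if the agent completes the task
```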

Related News

Technology

Google Unveils Antigravity: A New AI-Powered Autonomous Platform for End-to-End Software Development, Integrating with Gemini 3 for Agentic Coding

Google has launched Antigravity, a novel platform designed for "AI agent-led development," moving beyond traditional IDEs. This autonomous agent collaboration system enables AI to independently plan, execute, and verify complete software development tasks. Deeply integrated with the Gemini 3 model, Antigravity represents Google's key product in "Agentic Coding." It addresses limitations of previous AI tools, which were primarily assistive and required manual operation and step-by-step human prompts. Antigravity allows AI to work across editors, terminals, and browsers, plan complex multi-step tasks, automatically execute actions via tool calls, and self-check results. It shifts the development paradigm from human-operated tools to AI-operated tools with human supervision and collaboration. The platform's core philosophy revolves around Trust, Autonomy, Feedback, and Self-Improvement, providing transparency into AI's decision-making, enabling autonomous cross-environment operations, facilitating real-time human feedback, and allowing AI to learn from past experiences.

Technology

Google Vids Unlocks Advanced AI Features for All Gmail Users: Free Access to AI Voiceovers, Redundancy Removal, and Image Editing

Google has made several advanced AI features in its Vids video editing platform available to all users with a Gmail account, previously exclusive to paid subscribers. These newly accessible tools include AI voiceovers, automatic removal of redundant speech, and AI image editing. The transcription trimming feature automatically eliminates filler words like "um" and "ah," along with long pauses, significantly enhancing video quality. Users can also generate professional-grade voiceovers from text scripts, choosing from seven different voice options, many of which sound natural. Additionally, the AI image editing tool allows for easy modifications such as background removal, descriptive editing, and transforming static photos into dynamic videos. Google aims to empower both beginners and experienced creators to produce high-quality video content, anticipating significant growth in the video editing market despite Vids being in its early stages.

Technology

Quora's Poe AI Platform Launches Group Chat Feature Supporting Up to 200 Users for Enhanced Collaborative AI Interactions

Quora has introduced a new group chat feature for its AI platform, Poe, allowing up to 200 users to collaborate with various AI models and bots in a single conversation. This innovation supports multi-modal interactions including text, image, video, and audio generation. The launch coincides with OpenAI's ChatGPT piloting similar group chat functionality in select markets, signaling a shift in how people interact with AI. Quora highlights that the feature will offer new interactive experiences, such as family trip planning with Gemini 2.5 and o3 Deep Research, or team brainstorming with image models to create mood boards. Users can also engage in intellectual games with Q&A bots. Group chats can be created from Poe's homepage, with real-time synchronization across devices, ensuring seamless transitions between desktop and mobile. Quora developed the feature over six months and plans to refine it based on user feedback, emphasizing the largely unexplored potential for group interaction and collaboration with AI. Poe also enables users to create and share custom bots.