Technology · AI · Innovation · Video Editing

Google Gemini Update: New Multi-Reference Image Feature Empowers Users with Enhanced Control Over AI Video and Audio Generation


Google has recently updated its Gemini application, introducing a significant new feature that enhances user control over AI video generation. Users can now upload multiple reference images within a single video prompt. The system will then generate both video and audio content based on these uploaded images and any accompanying text, giving users more direct influence over the final appearance and sound of their videos.
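For developers, the same idea is exposed programmatically through the Gemini API's Veo video models. The sketch below, using the google-genai Python SDK, shows roughly what a multi-reference request could look like; note that the model id and the reference_images config field are assumptions inferred from this announcement, not confirmed API details.

```python
# Hedged sketch: multi-reference video generation via the google-genai SDK.
# The model id and the `reference_images` field are ASSUMPTIONS based on
# the announced feature; check the current API docs before relying on them.
import time
from pathlib import Path

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def load_image(path: str) -> types.Image:
    """Wrap a local PNG as the SDK's Image type."""
    return types.Image(image_bytes=Path(path).read_bytes(), mime_type="image/png")

# A few reference images steering the subject's look (limit is assumed).
refs = [load_image(p) for p in ("character.png", "outfit.png", "backdrop.png")]

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",  # assumed Veo 3.1 model id
    prompt="The character walks through the backdrop at dusk, wearing the outfit.",
    config=types.GenerateVideosConfig(
        reference_images=refs,         # assumed config field for this feature
    ),
)

# Video generation is a long-running job: poll until it completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("multi_reference_clip.mp4")
```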

Google had previously tested this capability in Flow, its more fully featured video AI platform. Flow not only supports extending existing video clips and stitching multiple scenes together, but also offers higher video-generation quotas than the Gemini application.

According to Google, Veo 3.1, released in mid-October, shows notable improvements over its predecessor, Veo 3.0, particularly in texture realism, input fidelity, and overall audio quality. The update gives users more flexible AI tooling, letting them create content that more closely matches their specific requirements.

The ability to upload multiple reference images means that creators can integrate more personalized elements into their video productions, thereby offering audiences a richer visual and auditory experience. In the rapidly evolving landscape of AI technology, Google's latest move underscores its ongoing commitment to innovation in the video generation domain. As user demands become increasingly diverse, the flexibility and customizability of AI tools are becoming paramount, and Gemini's new feature is expected to attract considerable attention and adoption from creators.

Related News

Technology

Qwen-Edit-2509-Multi-angle Lighting LoRA Released by Qwen for Enhanced Image Editing Capabilities

Qwen has announced 'Qwen-Edit-2509-Multi-angle lighting LoRA,' a new tool designed to enhance image editing. The announcement, made on Twitter by @Qwen, highlights the availability of this LoRA (Low-Rank Adaptation) model. Users can download it from Hugging Face at https://huggingface.co/dx8152/Qwen-Edit-2509-Multi-Angle-Lighting. The release is attributed to '大雄' and is associated with @Ali_TongyiLab.
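For those who want to try it, the snippet below sketches one plausible way to fetch the LoRA with huggingface_hub and attach it to a diffusers pipeline; the base checkpoint name, LoRA-loading support, and the edit-call signature are all assumptions to verify against the model card. Only the LoRA repo id comes from the announcement itself.

```python
# Hedged sketch: download the announced LoRA and attach it to a diffusers
# pipeline. The base checkpoint "Qwen/Qwen-Image-Edit-2509", LoRA
# compatibility, and the pipeline call signature are ASSUMPTIONS.
import torch
from diffusers import DiffusionPipeline
from huggingface_hub import snapshot_download
from PIL import Image

# Pull the LoRA repository named in the announcement.
lora_dir = snapshot_download(repo_id="dx8152/Qwen-Edit-2509-Multi-Angle-Lighting")

# Load an (assumed) Qwen image-editing base model and apply the LoRA.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509",   # assumed base checkpoint
    torch_dtype=torch.bfloat16,
).to("cuda")
pipe.load_lora_weights(lora_dir)   # assumes the pipeline supports LoRA loading

# Relight an input image from a different angle (hypothetical prompt/usage).
source = Image.open("scene.png")
result = pipe(image=source, prompt="relight the scene from a low left angle").images[0]
result.save("scene_relit.png")
```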

Technology

Elon Musk Announces 'Just Grok 4': AI Demonstrates Emergent Intelligence by Redesigning Edison Lightbulb Filament

Elon Musk announced on Twitter that 'This is just Grok 4,' highlighting a significant advance in AI. The announcement follows a demonstration in which Grok analyzed Thomas Edison's 1890 lightbulb patent, then derived and implemented a superior filament design that successfully lit a bulb. This emergent intelligence, described as unique among current AI models, has been noted for its potential to revolutionize education and enable robotic construction.

Technology

DeepMind Unveils SIMA 2: A Gemini-Powered AI Agent Capable of Reasoning, Learning, and Playing in Diverse 3D Virtual Worlds, Advancing Towards Embodied AGI

DeepMind has launched SIMA 2, an advanced version of its Scalable Instructable Multiworld Agent that significantly evolves beyond its predecessor. While SIMA 1 could execute over 600 language instructions across various 3D virtual worlds by observing the screen and using a virtual keyboard and mouse, SIMA 2, powered by the Gemini large language model, transcends mere execution. It can now reason about user goals, explain its plans and thought processes, learn new behaviors, and generalize experiences across multiple virtual environments. This leap is driven by a Gemini-integrated core that combines language, vision, and reasoning, enabling SIMA 2 to understand high-level tasks, translate natural language into action plans, and explain its decisions in real time. Trained through human demonstrations and AI self-supervision, SIMA 2 demonstrates remarkable cross-game generalization, applying learned concepts to new tasks and operating in previously unseen commercial open-world games. It also supports multimodal instructions and can autonomously navigate and complete tasks in dynamically generated 3D worlds, showcasing a self-improvement loop for continuous learning without human feedback. DeepMind positions SIMA 2 as a significant step towards Embodied General Intelligence.
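To make the loop described above concrete, here is a purely illustrative observe-reason-act sketch; agent_core and env stand in for interfaces DeepMind has not published, so nothing below reflects actual SIMA 2 code.

```python
# Purely illustrative sketch of the observe-reason-act loop described in
# the article. Every interface here (agent_core, env) is HYPOTHETICAL;
# DeepMind has not published SIMA 2's code or APIs.
from dataclasses import dataclass, field

@dataclass
class EpisodeLog:
    goal: str                                            # high-level task in natural language
    rationales: list[str] = field(default_factory=list)  # the agent's explanations

def run_episode(agent_core, env, goal: str, max_steps: int = 500) -> EpisodeLog:
    """Drive one episode: observe screen pixels, let a multimodal core
    reason about the goal, then act through a virtual keyboard/mouse."""
    log = EpisodeLog(goal=goal)
    observation = env.reset()  # raw screen frame
    for _ in range(max_steps):
        # The core sees pixels plus the goal and returns an action, a
        # natural-language rationale (its explained plan), and a done flag.
        action, rationale, done = agent_core.step(observation, goal)
        log.rationales.append(rationale)
        observation = env.apply(action)  # keyboard/mouse input to the game
        if done:
            break
    return log
```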