Technology · AI · Video · Innovation

Google's Gemini Veo 3.1 Launches 'Ingredients to Video' Mode for Pro/Ultra Subscribers: Create 8-Second 1080p Videos from Three Reference Images with Consistent Characters and SynthID Watermarks

Google has rolled out the Veo 3.1 video model to Gemini Pro and Ultra subscribers, introducing a new 'Ingredients to Video' mode. Users upload up to three reference images at once; the system extracts character, scene, and style characteristics from them and merges the three into an 8-second, 1080p video carrying an invisible SynthID watermark. Videos are created from text prompts on web or mobile, with cross-frame character consistency and lighting coherence. Google demonstrated the feature by combining three selfies, a cyber-city background, and an oil-painting style image into a 'futuristic impressionist street walk' short film with no facial or clothing deformation. Veo 3.1 also outputs native environmental sound and supports first/last-frame control and video extension. The multi-image reference feature is fully available and draws on existing subscription quotas; no additional payment plans have been announced.

AI News - AI Base

Google today rolled out the Veo 3.1 video model to its Gemini Pro and Ultra subscribers, introducing a new 'Ingredients to Video' mode. The feature lets users upload up to three reference images at once, from which the system extracts distinct characteristics: one for the character, another for the scene, and a third for the artistic style. These elements are then fused to generate a single 8-second, 1080p video.
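For readers who want to script this flow rather than use the Gemini app, here is a minimal sketch using the google-genai Python SDK. The generate_videos call and polling loop follow the SDK's documented Veo workflow, but the Veo 3.1 model ID and the reference_images and resolution config fields are assumptions; Google has so far announced the feature only in the Gemini app.

```python
# Minimal sketch of an "Ingredients to Video" request via the google-genai SDK.
# The polling pattern is the SDK's standard long-running-operation flow; the
# model ID and the reference_images/resolution fields are assumptions.
import time

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Three "ingredients": character, scene, and style references (local files).
refs = [
    types.Image.from_file(location=path)
    for path in ("character.png", "scene.png", "style.png")
]

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",              # assumed model ID
    prompt="A futuristic impressionist walk down a neon city street",
    config=types.GenerateVideosConfig(
        reference_images=refs,                     # assumed field name
        duration_seconds=8,
        resolution="1080p",                        # assumed field name
    ),
)

# Video generation is asynchronous; poll the long-running operation.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0].video
client.files.download(file=video)
video.save("ingredients_clip.mp4")
```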

A key security and authenticity feature of Veo 3.1 is the automatic embedding of an invisible SynthID watermark in all generated content. Video creation is initiated with a simple text prompt through either the web or mobile interface. The system is designed to maintain high fidelity across frames, keeping character appearance and lighting consistent throughout the generated video.

Google provided a compelling demonstration of Veo 3.1's capabilities: by combining three selfies taken from different angles, a cyber-city background image, and an oil-painting style reference, the model produced a short film depicting a 'futuristic impressionist street walk.' Notably, faces and clothing were rendered without any deformation, underscoring the model's cross-frame consistency.

Beyond visuals, Veo 3.1 also generates native environmental sound, enhancing the immersive quality of the videos. The model additionally offers control over a clip's first and last frames, along with a video-extension feature. Google has confirmed that the multi-image reference capability is now fully rolled out to all eligible subscribers; generation with the new feature draws on existing subscription quotas, and no additional payment plans have been announced at this time.
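A continuation of the earlier sketch shows how first/last-frame control might look in the same SDK. The last_frame config field, like the model ID, is an assumption about the Veo 3.1 API surface rather than a documented parameter.

```python
# Sketch of first/last-frame control; last_frame and the model ID are
# assumptions about how the Gemini API exposes this Veo 3.1 feature.
from google import genai
from google.genai import types

client = genai.Client()

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",                            # assumed model ID
    prompt="The camera glides down a rain-lit street at dusk",
    image=types.Image.from_file(location="first_frame.png"),    # opening frame
    config=types.GenerateVideosConfig(
        last_frame=types.Image.from_file(location="last_frame.png"),  # assumed field
        duration_seconds=8,
    ),
)
# Poll and download exactly as in the earlier sketch.
```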

Related News

Technology

Qwen-Edit-2509-Multi-angle Lighting LoRA Released by Qwen for Enhanced Image Editing Capabilities

Qwen has announced the release of 'Qwen-Edit-2509-Multi-angle Lighting LoRA,' a LoRA (Low-Rank Adaptation) model designed to enhance image editing. The announcement was made via Twitter by @Qwen. The model can be downloaded from Hugging Face at https://huggingface.co/dx8152/Qwen-Edit-2509-Multi-Angle-Lighting. The release is credited to '大雄' and is associated with @Ali_TongyiLab.
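As a rough illustration of how such a LoRA is typically consumed, the sketch below attaches it to a diffusers pipeline. Only the LoRA repo ID comes from the announcement; the base checkpoint, input image, and prompt are assumptions.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Load the assumed base editing model; DiffusionPipeline resolves the
# concrete pipeline class from the checkpoint's metadata.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509",   # assumed base checkpoint
    torch_dtype=torch.bfloat16,
).to("cuda")

# Attach the multi-angle lighting LoRA from the announcement.
pipe.load_lora_weights("dx8152/Qwen-Edit-2509-Multi-Angle-Lighting")

image = load_image("portrait.png")  # hypothetical input image
result = pipe(
    image=image,
    prompt="Relight the subject with a warm key light from the left",  # hypothetical prompt
    num_inference_steps=30,
).images[0]
result.save("relit.png")
```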

Technology

Elon Musk Announces 'Just Grok 4': AI Demonstrates Emergent Intelligence by Redesigning Edison Lightbulb Filament

Elon Musk announced via Twitter, 'This is just Grok 4,' highlighting what he described as a significant advance in AI. The announcement followed a demonstration in which Grok analyzed Thomas Edison's 1890 lightbulb patent, then devised and implemented a superior filament design that successfully lit a bulb. This emergent intelligence, described as unique among current AI models, was noted for its potential to revolutionize education and enable robots to build.

Technology

DeepMind Unveils SIMA 2: A Gemini-Powered AI Agent Capable of Reasoning, Learning, and Playing in Diverse 3D Virtual Worlds, Advancing Towards Embodied AGI

DeepMind has launched SIMA 2, an advanced version of its Scalable Instructable Multiworld Agent and a significant evolution from its predecessor. While SIMA 1 could execute over 600 language instructions across various 3D virtual worlds by observing the screen and using a virtual keyboard and mouse, SIMA 2, powered by the Gemini large language model, goes beyond mere execution: it can reason about user goals, explain its plans and thought processes, learn new behaviors, and generalize experience across multiple virtual environments. This leap is driven by a Gemini-integrated core that combines language, vision, and reasoning, enabling SIMA 2 to understand high-level tasks, translate natural language into action plans, and explain its decisions in real time.

Trained through human demonstrations and AI self-supervision, SIMA 2 demonstrates remarkable cross-game generalization, applying learned concepts to new tasks and operating in previously unseen commercial open-world games. It also supports multimodal instructions and can autonomously navigate and complete tasks in dynamically generated 3D worlds, with a self-improvement loop that enables continuous learning without human feedback. DeepMind positions SIMA 2 as a significant step toward embodied general intelligence.
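DeepMind has not published an API for SIMA 2, but the observe-reason-act loop described above can be caricatured in a few lines of Python. Everything in this sketch is illustrative; no class or method here corresponds to real DeepMind code.

```python
from dataclasses import dataclass

@dataclass
class Action:
    keys: list[str]          # virtual keyboard presses
    mouse: tuple[int, int]   # virtual mouse target (screen coordinates)

class Sima2LikeAgent:
    """Illustrative stand-in for an agent with a Gemini-style reasoning core."""

    def __init__(self, reasoner):
        self.reasoner = reasoner  # combines language, vision, and reasoning

    def step(self, screen_pixels, goal: str) -> Action:
        # 1. Reason about the user's goal from the raw screen observation,
        #    producing an explainable natural-language plan.
        plan = self.reasoner.plan(observation=screen_pixels, goal=goal)
        # 2. Translate the plan into a concrete keyboard/mouse action.
        return self.reasoner.to_action(plan)

def run_episode(agent: Sima2LikeAgent, env, goal: str, max_steps: int = 500):
    """Observe-reason-act loop over a 3D world exposed as screen pixels."""
    obs = env.reset()
    for _ in range(max_steps):
        action = agent.step(obs, goal)
        obs, done = env.apply(action)  # actuate virtual keyboard and mouse
        if done:
            break
```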