Google's Gemini Veo 3.1 Launches 'Ingredients to Video' Mode for Pro/Ultra Subscribers: Create 8-Second 1080p Videos from Three Reference Images with Consistent Characters and SynthID Watermarks
Google has rolled out the Veo 3.1 video model to Gemini Pro/Ultra subscribers, introducing a new 'Ingredients to Video' mode. The feature lets users upload three reference images at once; the system extracts character, scene, and style characteristics from them and merges these into an 8-second, 1080p video. The generated content carries an invisible SynthID watermark. Users can create videos from text prompts on web or mobile, and the system maintains character consistency and lighting coherence across frames. Google demonstrated this by combining three selfies, a cyber city background, and an oil painting style image to produce a 'futuristic impressionist street walk' short film with no facial or clothing deformation. Veo 3.1 also outputs native environmental sound and supports first/last frame control and video extension. The multi-image reference feature is fully available and draws on existing subscription quotas; no additional payment plans have been announced.
Google today pushed the Veo 3.1 video model to its Gemini Pro/Ultra subscribers, introducing a new 'Ingredients to Video' mode. The feature lets users upload up to three reference images at once. The system extracts a different characteristic from each image: the character from one, the scene from another, and the artistic style from a third. These elements are then fused to generate an 8-second, 1080p video.
For authenticity, Veo 3.1 automatically embeds an invisible SynthID watermark in all generated content. Users start a video by entering a text prompt in either the web or the mobile interface. The system is designed to keep character appearance and lighting consistent from frame to frame throughout the generated video.
Google demonstrated these capabilities with an example. Combining three selfies taken from different angles, a cyber city background image, and an oil painting style reference, the model produced a short film depicting a 'futuristic impressionist street walk.' The demonstration showed faces and clothing rendered without deformation, which Google presented as evidence of the model's consistency.
Beyond visuals, Veo 3.1 also outputs native environmental sound. The model additionally supports control over the first and last frames of a video, along with video extension. Google has confirmed that the multi-image reference capability is now fully rolled out to all eligible subscribers. Generations with the new feature count against existing subscription quotas, and no additional payment plans have been announced at this time.