Product · AI Innovation · Multimodal AI

ElevenLabs Unveils Image & Video (Beta): An All-in-One AI Content Creation Platform for Visuals, Audio, and Music Generation

ElevenLabs has officially launched Image & Video (Beta), a comprehensive AI content creation platform for creators and marketers. The platform combines image, video, voice, music, and sound-effect generation, drawing on leading multimodal generative models such as Veo, Kling, and Sora for rapid visual content creation. Users can synthesize voices, overlay narration, and edit soundtracks directly within ElevenLabs to produce commercial and creative video content. A streamlined workflow covers image/video generation, voiceover with lip-sync, background music and sound-effect editing, multi-segment compositing, and super-resolution upscaling via Topaz integration. The goal is a unified creative environment that removes the need to juggle multiple tools, serving content creators, marketing teams, educators, and game developers.

Xiaohu.AI Daily

ElevenLabs has officially introduced Image & Video (Beta), an all-in-one AI content creation platform tailored for creators and marketers. The platform integrates image, video, voice, music, and sound-effect generation into a single, cohesive environment, enabling rapid visual content creation by incorporating top-tier multimodal generative models, including Veo, Kling, Sora, Wan, Seedance, Nano Banana, Flux Kontext, and Seedream.

Within the ElevenLabs platform, users can directly perform voice synthesis, overlay narrations, and edit soundtracks, ultimately producing video content suitable for both commercial and creative applications. The platform is designed to streamline the entire content creation workflow, allowing users to complete various tasks without switching between different applications.

Key functionalities available within Image & Video (Beta) include:

* **Image & Video Generation:** Uses leading models such as Veo, Sora, Kling, Wan, Seedance, Nano Banana, Flux Kontext, and Seedream. This feature is well suited to short advertisements, animated storyboards, cover thumbnails, and brand videos, and combining multiple models makes it easy to explore different styles and creative requirements.

* **Audio Creation & Overlay:** Audio can be imported into ElevenLabs Studio for synthesis and soundtracking. Users can choose voices from ElevenLabs' voice library or use their own cloned voices, and the system supports layering sound effects and background music to meet film-grade production demands.

* **Lip-Sync & Voice Replacement:** Precisely synchronizes synthesized speech with lip movements in generated video, and can replace voices in existing videos to support multi-language distribution or character voice changes.

* **Storyboard & Asset Generation:** Creates static images for storyboards, video script planning, and brand elements. Images can be quickly refined and exported as asset packages for later compositing.

* **Captions & Subtitles:** Automatically recognizes speech and generates subtitles, supporting multiple languages and timeline synchronization.

* **Editing Features & Timeline Operations:** The Studio offers timeline editing, narration replacement, and music layering, providing an experience similar to dedicated video-editing software and lowering the barrier to content integration. All of these operations are completed within a single platform, ensuring both efficiency and quality.

ElevenLabs' stated goal is to build a unified creative platform that integrates the industry's most advanced multimodal models with its powerful voice technology. This allows anyone to complete all steps from idea to finished product within one platform, eliminating the need to jump between multiple tools. The platform is particularly suitable for content creators, YouTubers, podcasters, brand marketing teams, advertising agencies, educational content producers, online training instructors, game developers, and animation producers.

Additional feature highlights include Topaz super-resolution upscaling for improving video and image clarity, Studio timeline operations for fine-grained video editing and compositing, and end-to-end voice control for integrated narration and character dialogue generation.

Users can begin experiencing ElevenLabs Image & Video (Beta) now.

Related News

Product

Manus Launches Browser Operator Chrome Extension: Transforms Any Browser into an AI-Powered Tool for Automated Tasks and Secure Access

Manus has released the Manus Browser Operator, a Chrome extension designed to convert any standard browser into an AI-enabled one. This tool automates complex browser operations, allowing access to protected websites and systems like research platforms and CRM tools without triggering additional login verifications. Currently in a phased rollout for advanced users, the extension aims to significantly boost daily work efficiency. Key features include secure local access, session reuse, and the ability to perform tasks such as data retrieval from databases (Crunchbase, PitchBook), CRM updates, and data extraction from paid platforms. The system operates with a dual-layer architecture, combining cloud-based browsing for general tasks with local browser access for authenticated systems, ensuring secure and efficient task execution. It is currently in beta for Pro, Plus, and Team users, supporting Chrome and Edge, with ongoing optimization for complex interactions.

Product

Google AI Developers Announce Immediate Availability of Gemini 3 for Builders

Google AI Developers have announced that Gemini 3 is now available to developers. The announcement, made on November 19, 2025, encourages users to 'Start building with Gemini 3 today,' marking the release of the new Gemini version and its availability for development projects.

Product

Xiaomi Unveils Open-Source 7B Multimodal Model MiMo-VL and AI Butler Miloco for Automated Smart Home Control

Xiaomi has launched its 7B-parameter multimodal model, 'Xiaomi-MiMo-VL-Miloco-7B-GGUF,' on Hugging Face and GitHub, alongside an AI butler named 'Xiaomi Miloco.' The system leverages Mijia cameras to identify user activities like gaming, fitness, or reading, and gestures such as victory signs or thumbs-up. Miloco then automatically controls smart home devices including lights, air conditioners, and music, and also supports the Home Assistant protocol. Released under a non-commercial open-source license, Miloco can be deployed with a single click on Windows or Linux hosts equipped with NVIDIA GPUs and Docker. Examples include automatic desk-lamp activation when reading is detected, climate-control adjustments based on bedding during sleep, and personalized voice greetings upon entry based on clothing style. Xiaomi has released the model weights and inference code but retains intellectual property rights, prohibiting commercial use.