ElevenLabs Unveils Image & Video (Beta): An All-in-One AI Content Creation Platform for Visuals, Audio, and Music Generation

ElevenLabs has launched Image & Video (Beta), an AI content creation platform for creators and marketers that combines image, video, voice, music, and sound effect generation in one place. It draws on multimodal generative models such as Veo, Kling, and Sora for rapid visual content creation. Users can synthesize voices, overlay narration, and edit soundtracks directly within the ElevenLabs platform to produce commercial and creative video content. The workflow covers image and video generation, voiceover with lip-sync, background music and sound effect editing, multi-segment assembly, and super-resolution upscaling via Topaz integration. The aim is a single creative environment that removes the need for multiple tools, serving content creators, marketing teams, educators, and game developers.

Xiaohu.AI Daily

ElevenLabs has introduced Image & Video (Beta), an AI content creation platform aimed at creators and marketers. The platform brings image, video, voice, music, and sound effect generation into a single environment. For visuals, it incorporates a range of multimodal generative models, including Veo, Kling, Sora, Wan, Seedance, Nano Banana, Flux Kontext, and Seedream.

Within the ElevenLabs platform, users can directly perform voice synthesis, overlay narrations, and edit soundtracks, ultimately producing video content suitable for both commercial and creative applications. The platform is designed to streamline the entire content creation workflow, allowing users to complete various tasks without switching between different applications.

Key functionalities available within Image & Video (Beta) include:

* <b>Image & Video Generation:</b> Uses models including Veo, Sora, Kling, Wan, Seedance, Nano Banana, Flux Kontext, and Seedream. This feature suits short advertisements, animated storyboards, cover thumbnails, and brand videos. Having several models available lets users explore different styles and match different creative requirements.

* <b>Audio Creation & Overlay:</b> Audio can be imported into ElevenLabs Studio for voice synthesis and soundtrack work. Users can choose a voice from the library ElevenLabs provides or use their own cloned voices. Sound effects and background music can be layered on top to reach production-quality results.

* <b>Lip-Sync & Voice Replacement:</b> The system enables precise lip synchronization between synthesized speech and generated video. It also allows for voice replacement in existing videos, facilitating multi-language distribution or character voice changes.

* <b>Storyboard & Asset Generation:</b> Users can create static images for storyboards, video script planning, and brand elements. Images can be quickly refined and exported as asset packages for later compositing.

* <b>Captions & Subtitles:</b> Automatically recognizes speech and generates subtitles, supporting multiple languages and timeline synchronization.

* <b>Editing Features & Timeline Operations:</b> The Studio offers timeline editing, narration replacement, and music layering. The experience resembles conventional video editing software, which lowers the barrier to combining visuals and audio. All of these operations happen within a single platform.
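
To make "timeline synchronization" in the captions feature concrete, here is a minimal sketch of how timed speech segments can be rendered as SubRip (SRT) subtitle cues. It is illustrative only: the article does not say which subtitle format or data structures ElevenLabs uses, and the `Segment` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized stretch of speech (hypothetical structure)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues, ordered by start time."""
    cues = []
    ordered = sorted(segments, key=lambda s: s.start)
    for index, seg in enumerate(ordered, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Each cue carries its own start and end time, so a player can keep the text aligned with the audio no matter how the video is later trimmed on a timeline, as long as the segment times are updated to match.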

ElevenLabs' stated goal is to build a unified creative platform that integrates the industry's most advanced multimodal models with its powerful voice technology. This allows anyone to complete all steps from idea to finished product within one platform, eliminating the need to jump between multiple tools. The platform is particularly suitable for content creators, YouTubers, podcasters, brand marketing teams, advertising agencies, educational content producers, online training instructors, game developers, and animation producers.

Additional highlights include Topaz super-resolution upscaling to improve video and image clarity, Studio timeline operations for fine-grained video editing and assembly, and end-to-end voice control for integrated narration and character dialogue generation.

ElevenLabs Image & Video (Beta) is available to try now.
