Technology · AI · Innovation · Content Creation

ElevenLabs Unveils "Image & Video Platform": A Super AI Content Factory for Integrated Visuals, Audio, and Music Generation, Revolutionizing Content Creation Workflow

ElevenLabs, a leader in multimodal AI, has launched its new "Image & Video Platform," transforming from a voice-only tool into a comprehensive AI content factory. This platform integrates image generation, video generation, voice synthesis, music creation, and sound effect design, enabling creators and marketers to produce commercial-grade videos from script to final product within a single interface. It eliminates the need for switching between multiple platforms by seamlessly combining visual generation with ElevenLabs' audio capabilities. The platform incorporates top multimodal models like Google Veo, OpenAI Sora, and Kling, alongside ElevenLabs' proprietary AI voice and music generation. Designed for commercial use, it supports various aspect ratios, includes a commercial-safe audio library, offers multi-language narration replacement, and features a timeline editor for precise synchronization. Official demonstrations show a 30-second brand advertisement can be created in just five minutes, significantly boosting content production efficiency.

AI News - AI Base

Multimodal AI powerhouse ElevenLabs has officially announced the launch of its new "Image & Video Platform." This marks a significant evolution from a voice-only tool to a super AI content factory that integrates image generation, video generation, voice synthesis, music creation, and sound effect design. With this update, creators and marketers no longer need to switch between multiple platforms: everything from script to commercial-grade video can be handled within a single interface.

The new platform establishes a closed-loop workflow: users generate visuals and dynamic videos, then directly overlay professional-grade narration, background music, and environmental sound effects within the same interface. ElevenLabs claims this seamless path from concept to deployable marketing video takes just minutes, fundamentally redefining the efficiency of AI content production.

The Image & Video Platform brings together a powerful matrix of the world's leading multimodal models, including Google Veo (for ultra-long consistent videos), OpenAI Sora (for cinematic visual quality), Kling (for hyper-realistic physical motion), and emerging players like Nanobanana, Flux Kontext, and Seedream. These are combined with ElevenLabs' self-developed, widely recognized natural AI voices and its latest music generation models. Users can freely pair best-in-class visuals with best-in-class audio, producing results that would be difficult to achieve by stitching together individual models.

Specifically designed for commercial applications, the platform is deeply optimized for creators and marketers. It supports direct output in various aspect ratios, including vertical and horizontal, making it suitable for platforms like Douyin, Xiaohongshu, TikTok, and YouTube. It also features a built-in library of commercially safe voices and music, ensuring generated content can be directly used for advertising. The ability to replace narration language with a single click facilitates the creation of multi-language versions, and a comprehensive timeline editor allows for frame-accurate audio-visual synchronization adjustments.
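For a sense of how the audio half of such a workflow looks outside the UI, here is a minimal sketch that generates narration through ElevenLabs' documented text-to-speech REST endpoint. The voice ID, model name, and script text are placeholder assumptions, and the Image & Video Platform itself may expose a different, higher-level interface.

```python
# Minimal sketch: generating narration via ElevenLabs' public text-to-speech
# REST endpoint. Voice ID, model, and script are placeholder assumptions; the
# new Image & Video Platform may expose a different, higher-level interface.
import requests

API_KEY = "your-elevenlabs-api-key"   # assumption: supply your own key
VOICE_ID = "your-voice-id"            # assumption: any voice from your library

def generate_narration(text: str, model: str = "eleven_multilingual_v2") -> bytes:
    """Return MP3 bytes of AI narration for the given script text."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
    response = requests.post(
        url,
        headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
        json={"text": text, "model_id": model},
    )
    response.raise_for_status()
    return response.content

if __name__ == "__main__":
    # Swapping the script (or the model) for another language is how a
    # multi-language version of the same ad narration could be produced.
    audio = generate_narration("Introducing our new autumn collection.")
    with open("narration_en.mp3", "wb") as f:
        f.write(audio)
```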

Demonstrations showcase impressive results: a 30-second brand advertisement can be produced in just five minutes. The process involves generating brand storyboard images, converting them into smooth video, adding natural, executive-grade narration, overlaying emotional background music and environmental sound effects, and finally exporting a 4K, commercial-ready product. The entire workflow eliminates the need to shuttle files between tools like Premiere, Midjourney, Runway, or Suno.
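To make the five-minute demo concrete, the sketch below stubs out the same storyboard-to-export pipeline as plain Python. Every step is a hypothetical stand-in for something the platform performs inside its own interface; none of these functions correspond to a public ElevenLabs SDK.

```python
# Purely illustrative pipeline mirroring the demo workflow above.
# Each step is a hypothetical stand-in (stubbed so the sketch runs);
# the real platform performs these stages inside its own interface.

def step(name: str, detail: str) -> str:
    """Stub for one stage of the content pipeline."""
    print(f"[{name}] {detail}")
    return f"{name}: {detail}"

def produce_brand_ad(script: str) -> list[str]:
    return [
        step("storyboard", f"6 brand frames from script {script!r}"),
        step("video", "30 s image-to-video pass (e.g. a Veo-class model)"),
        step("narration", "natural, executive-grade AI voiceover"),
        step("music", "commercially safe background track"),
        step("sfx", "environmental sound effects"),
        step("export", "4K, 16:9 master file"),
    ]

if __name__ == "__main__":
    # The integrated platform's pitch is that all six stages happen in one
    # tool, so no files move between Midjourney, Runway, Suno, or Premiere.
    produce_brand_ad("30-second autumn campaign spot")
```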

AIbase's editorial view is that ElevenLabs' move significantly raises the bar for "text-to-video" capabilities, most notably by addressing the difficult problem of audio-visual synchronization. The convergence of leading visual and audio generation technologies is expected to give independent creators and small to medium-sized businesses a new competitive edge.

Related News

Technology

Google Unveils Antigravity: A New AI-Powered Autonomous Platform for End-to-End Software Development, Integrating with Gemini 3 for Agentic Coding

Google has launched Antigravity, a novel platform designed for "AI agent-led development," moving beyond traditional IDEs. This autonomous agent collaboration system enables AI to independently plan, execute, and verify complete software development tasks. Deeply integrated with the Gemini 3 model, Antigravity represents Google's key product in "Agentic Coding." It addresses limitations of previous AI tools, which were primarily assistive and required manual operation and step-by-step human prompts. Antigravity allows AI to work across editors, terminals, and browsers, plan complex multi-step tasks, automatically execute actions via tool calls, and self-check results. It shifts the development paradigm from human-operated tools to AI-operated tools with human supervision and collaboration. The platform's core philosophy revolves around Trust, Autonomy, Feedback, and Self-Improvement, providing transparency into AI's decision-making, enabling autonomous cross-environment operations, facilitating real-time human feedback, and allowing AI to learn from past experiences.
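The plan-execute-verify loop described here is a general agentic pattern rather than anything Antigravity-specific. Below is an illustrative, heavily simplified version of such a loop; the planner and the terminal tool call are assumptions for demonstration, not Google's implementation.

```python
# Illustrative (not Google's) plan-execute-verify agent loop of the kind
# Antigravity is described as running across editor, terminal, and browser.
import subprocess

def plan(task: str) -> list[str]:
    """Hypothetical planner: break a task into shell-level steps."""
    return [
        "python -m pytest -q",          # verify the current state first
        "echo 'apply code edit here'",  # placeholder for an editor tool call
        "python -m pytest -q",          # re-verify after the change
    ]

def execute(command: str) -> tuple[int, str]:
    """Tool call: run one step in the terminal and capture its output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr

def agent(task: str) -> None:
    for command in plan(task):
        code, output = execute(command)
        print(f"$ {command}\n{output}")
        if code != 0:
            # Self-check failed: a real agent would revise its plan here
            # and surface the result to the supervising human.
            print("step failed; replanning / requesting feedback")
            break

agent("fix the failing unit test in utils.py")
```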

Technology

Google Vids Unlocks Advanced AI Features for All Gmail Users: Free Access to AI Voiceovers, Redundancy Removal, and Image Editing

Google has made several advanced AI features in its Vids video editing platform available to all users with a Gmail account, previously exclusive to paid subscribers. These newly accessible tools include AI voiceovers, automatic removal of redundant speech, and AI image editing. The transcription trimming feature automatically eliminates filler words like "um" and "ah," along with long pauses, significantly enhancing video quality. Users can also generate professional-grade voiceovers from text scripts, choosing from seven voice options, many of which sound natural. Additionally, the AI image editing tool allows for modifications such as background removal, edits made by describing the desired change, and transforming static photos into dynamic videos. Google aims to empower both beginners and experienced creators to produce high-quality video content, anticipating significant growth in the video editing market despite Vids being in its early stages.
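The transcript-trimming behavior is easiest to picture with a toy example. The snippet below is an illustrative filter over a timestamped transcript, not Google's implementation; the filler-word list and pause threshold are assumptions.

```python
# Toy illustration (not Google's implementation) of trimming filler words
# and long pauses from a timestamped transcript before re-cutting a video.
FILLERS = {"um", "uh", "ah", "er"}   # assumption: a small example list
MAX_PAUSE_S = 1.5                    # assumption: pause threshold in seconds

def trim(transcript: list[dict]) -> list[dict]:
    """Drop filler words and flag long silences for removal."""
    kept, last_end = [], 0.0
    for word in transcript:
        pause = word["start"] - last_end
        last_end = word["end"]
        if word["text"].lower() in FILLERS:
            continue                               # drop the filler word itself
        if pause > MAX_PAUSE_S:
            word = {**word, "cut_before": pause}   # mark silence to cut out
        kept.append(word)
    return kept

sample = [
    {"text": "Um", "start": 0.0, "end": 0.4},
    {"text": "welcome", "start": 2.5, "end": 3.0},   # 2.1 s pause before this word
    {"text": "everyone", "start": 3.1, "end": 3.6},
]
print(trim(sample))
```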

Technology

Quora's Poe AI Platform Launches Group Chat Feature Supporting Up to 200 Users for Enhanced Collaborative AI Interactions

Quora has introduced a new group chat feature for its AI platform, Poe, allowing up to 200 users to collaborate with various AI models and bots in a single conversation. This innovation supports multimodal interactions including text, image, video, and audio generation. The launch coincides with OpenAI's ChatGPT piloting similar group chat functionalities in select markets, signaling a shift in AI interaction methods. Quora highlights that this feature will offer new interactive experiences for AI users, such as family trip planning using Gemini 2.5 and o3 Deep Research, or team brainstorming with image models to create mood boards. Users can also engage in intellectual games with Q&A bots. Group chats can be created from Poe's homepage, with real-time synchronization across devices, ensuring seamless transitions between desktop and mobile. Quora developed this feature over six months and plans to optimize it based on user feedback, emphasizing the unexplored potential for group interaction and collaboration in AI mediums. Poe also enables users to create and share custom bots.