OpenAI's Voice Engine AI Model Clones a Voice from a 15-Second Sample
OpenAI has developed Voice Engine, a new AI model that can create synthetic voices, or voice clones, from a sample only 15 seconds long. Its main advantage is that it needs far less input audio than other AI voice cloning models, which require more than a minute of sample audio.
Although it has not yet been officially launched, Voice Engine is being tested on a small scale on several platforms, including the educational platform Age of Learning. From a sample of only 15 seconds, Voice Engine can produce a voice that closely resembles the original speaker.
OpenAI began developing Voice Engine in late 2022, and the model already powers the preset voices in the text-to-speech API and ChatGPT's Read Aloud feature. According to members of the OpenAI product team, Voice Engine was trained on a combination of licensed and publicly available data and will eventually be made available to general users.
For now, OpenAI says that Voice Engine will be available to only about 10 developers. It has not disclosed who those developers are or when the model will launch publicly. Voice Engine appears to be more capable than other AI text-to-audio platforms such as ElevenLabs or Podcastle.
OpenAI previously announced Sora, a text-to-video AI model that can create realistic videos from text alone. According to the latest reports, Sora is scheduled to launch to the public at the end of this year.
Make a Movie
OpenAI is promoting its artificial intelligence (AI) model Sora for use in Hollywood filmmaking. Sora can create videos from text prompts. Even though it is relatively new, it can produce long, high-quality videos.
Sora can create complex videos with a specified number of characters, particular types of movement, and particular styles (animation, photorealistic video, black and white, etc.), as well as accurate subject and background details. OpenAI's CEO and COO are currently holding discussions with parties in the Hollywood film industry. Several well-known directors and actors have already been given access to Sora, according to Mashable.
However, OpenAI's efforts have met with opposition from several parties in the film industry, including the Hollywood writers' and actors' unions, which are concerned that the use of AI in content creation could replace human writers and actors. One reason is that content created by Sora could be used indefinitely without royalties.
Although a number of testers have responded positively to Sora's capabilities, negative reactions have also emerged on social media over the impact of AI on the industry.
In an interview with the Wall Street Journal, OpenAI's Chief Technology Officer, Mira Murati, said that Sora will become available sometime this year, possibly within the next few months. Reported estimates range from the second quarter to the end of the year.
Although Sora's operating costs are reportedly higher than those of other AI models, OpenAI aims to offer it at the same price as DALL-E, its AI image generator. As with DALL-E, Sora will likely be available for free with certain limitations, with a paid version that offers full features and performance.
Sora is not only able to produce realistic videos based on text prompts, but can also create videos from images, edit videos, and extend videos by adding or filling frames.
Able to Make Cool Films
OpenAI continues to innovate in the field of artificial intelligence (AI). Most recently, it introduced Sora, an AI model that lets users create videos from simple text prompts. OpenAI is expected to keep improving Sora's video capabilities, with the aim of producing realistic, impressive films that rival human-made ones.
One striking example of Sora's work is a museum tour video that shows paintings and sculptures in stunning detail. Sora can also create videos with complex storylines, such as the story of a disguised alien in New York City, as reported by TechRadar.
Sora's videos feature realistic and imaginative camera movements that have impressed viewers. The short alien video was later edited by content creator Blaine Brown, who added a scene of the alien rapping with perfect lip sync.
Judging by this short video, many viewers are convinced that Sora shows great potential for the film industry. However, there are also concerns that Sora could replace humans in content creation, a risk that filmmakers and content creators need to consider seriously.
Sora AI Abilities
OpenAI has introduced Sora, an artificial intelligence (AI) model that converts text into video. Sora lets users create photorealistic videos of up to one minute from written prompts. The model is intended to help visual artists, designers, and filmmakers.
"This is Sora, our video creation model. Truly extraordinary. Awesome work from them and the team. What an extraordinary moment," wrote OpenAI CEO Sam Altman on his X account.
Sora can create complex scenes with multiple characters, specific movements, and accurate background details. The model can also understand how objects exist in a physical context and produce characters with vivid emotional expressions.
With Sora, users can produce videos up to a minute long, incorporating detailed scenes and multiple characters. The announcement includes a video clip that follows an SUV along a winding mountain road and "historical" footage of California during the gold rush era.
Sora can also produce videos from still images, and can extend existing videos or fill in missing frames. While AI models such as Midjourney previously took the lead in text-to-image generation, text-to-video tools such as Runway, Pika, and Google's Lumiere have also shown significant progress in video generation. Like Sora, Lumiere provides users with text-to-video tools and also lets them create videos from still images.
OpenAI asks video creators using Sora to label their videos in line with C2PA guidelines. Sora also uses the existing safety methods implemented in DALL-E, which reject inappropriate or malicious text prompts.
Currently, Sora is still being evaluated by "red teamers" who assess potential risks. OpenAI has also given access to a number of visual artists, designers, and filmmakers to gather feedback, as reported by The Verge.
OpenAI acknowledges that the current model may not always accurately simulate the physics of complex scenes and may not understand some instances of cause and effect. OpenAI recently added a watermark to its text-to-image tool, DALL-E 3, but the watermark is very easy to remove. As with its other AI products, OpenAI will have to contend with the possible spread of fake photorealistic AI videos that are mistaken for the real thing.
Lastly, OpenAI said it will engage with policymakers, educators, and artists around the world to understand their concerns and identify positive use cases for this new technology.
OpenAI has been developing generative AI tools at a rapid pace since ChatGPT launched in November 2022. Since then, we have seen the release of GPT-4, voice and image prompts, and the new DALL-E 3 image model, all available via ChatGPT. OpenAI's APIs are also having a major impact on the AI industry, enabling companies and developers to create their own generative AI tools. Now, OpenAI is taking the next big step in advancing AI's capabilities in video creation.