OpenAI Relies on Data from YouTube to Train GPT-4 AI Model


OpenAI has relied on more than a million hours of YouTube videos to train its newest artificial intelligence model, GPT-4.

According to Gadgets360.com on Tuesday (9/3/2024), analysts noted that the exhaustion of traditional text sources prompted OpenAI to look for new sources of training data.
The report revealed that the company developed an automatic speech recognition tool called Whisper to transcribe YouTube videos and use the resulting text.
However, this use of YouTube data is facing scrutiny for potentially violating the platform's guidelines and copyright.

Some observers are concerned about the legal consequences of the move, especially considering Google's ban on the use of YouTube videos by applications outside its platform.
Despite this, OpenAI went ahead with the plan and successfully transcribed more than a million hours of YouTube videos for use in GPT-4 training. 
The company denied claims that OpenAI President Greg Brockman was directly involved in collecting data from the videos.
An OpenAI spokesperson emphasized that the company does not conduct unauthorized downloads of YouTube content and uses a variety of sources, including public data and partnerships, to train its AI models.
The company is also exploring the use of synthetic data for training future AI models.
