Nvidia reportedly downloaded a vast number of videos from YouTube, Netflix, and other services to use as training data for its AI products.
The information comes from a report by 404 Media, which obtained internal documents and chats that included instructions to employees to scrape videos from Netflix, YouTube, and other sources.
The report states that Nvidia used the downloaded videos to train AI models for products such as the company’s Omniverse 3D world generator and “digital human” efforts such as the embodied AI project GR00T.
According to the report, employees who raised ethical and legal concerns were informed that the practice was approved by the “highest levels of the company.”
The scraping was also not limited to YouTube and Netflix. Nvidia reportedly downloaded videos from other sources as well, including MovieNet, libraries of video game footage, and WebVid, a video dataset hosted on GitHub.
So Nvidia downloaded 38.5 million videos from YouTube & Netflix to train their AI models.
Nvidia reportedly downloaded content using virtual machines with rotating IP addresses to avoid bans & detection from YouTube.
Meanwhile, YouTube CEO @nealmohan has said that using YouTube… https://t.co/FpKxcD3VjF
— Brandon Butch (@BrandonButch) August 6, 2024
According to the report, some of the videos Nvidia used came from a large library of YouTube videos designated for academic use only. Nvidia reportedly claimed that videos in this academic library were fair game for commercial AI services. HD-VG-130M, a library of 130 million YouTube videos, carries a usage license specifying that it is meant only for academic research. To avoid detection and bans by YouTube, Nvidia reportedly downloaded the content using virtual machines (VMs) with rotating IP addresses.