Nvidia Accused of Scraping YouTube and Netflix Videos for AI Training

Key Takeaways

  • Nvidia reportedly downloaded large numbers of videos from YouTube, Netflix, and other services to use as training data for its AI products.
  • The report states that Nvidia used the downloaded videos to train AI models for its services.
  • Employees who raised ethical and legal concerns were informed that the practice was approved by the "highest levels of the company."

Nvidia reportedly downloaded large numbers of videos from YouTube, Netflix, and other services to use as training data for its AI products.

This information comes from a report by 404 Media, which obtained documents and chats that included instructions to employees to scrape videos from Netflix, YouTube, and other sources.

The report states that Nvidia used the downloaded videos to train AI models for services like the company’s Omniverse 3D world generator and “digital human” efforts like the embodied AI GR00T project.

According to the report, employees who raised ethical and legal concerns were informed that the practice was approved by the “highest levels of the company.” 

Additionally, the scraping was not limited to YouTube and Netflix. Nvidia reportedly also downloaded videos from sources including MovieNet, libraries of video game footage, and the GitHub video dataset WebVid.

As per the report, some of the videos Nvidia used were sourced from a large library of YouTube videos designated for academic purposes only: HD-VG-130M, a library of 130 million YouTube videos, includes a usage license specifying that it is meant for academic research only. Nvidia reportedly claimed that the videos in this academic library were fair game for commercial AI services. To avoid being banned by YouTube, Nvidia reportedly downloaded content using virtual machines (VMs) with rotating IP addresses.