Nvidia can't shake authors' claims it trained AI on pirated books

The case could reshape how artificial intelligence companies are allowed to acquire the massive datasets they need to build their systems.

By Jeremy Yurow | Northern District of California | May 5, 2026

(CN) — When novelists Brian Keene, Abdi Nazemian and Stewart O'Nan sued Nvidia more than two years ago, they accused the company of training its artificial intelligence systems on pirated copies of their books.

On Tuesday, a federal judge largely agreed they have a case.

U.S. District Judge Jon Tigar denied most of Nvidia's motion to dismiss a proposed class action in the Northern District of California, allowing claims for direct and contributory copyright infringement to proceed.

The authors say Nvidia trained several of its large language models on datasets and so-called shadow libraries — online repositories hosting pirated books and other copyrighted works — that contained their copyrighted books without permission.

A key focus of the lawsuit is a dataset known as "The Pile," which included a subcollection of nearly 200,000 pirated books called Books3, itself sourced from the shadow library Bibliotik. The authors say Nvidia used The Pile to train multiple models in its Megatron line, including Megatron 345M, NeMo GPT-3 10B, InstructRetro-48B, Retro-48B and Nemotron-4 15B.

Nvidia pushed back. The company asked the court to take judicial notice of a screenshot from its own website suggesting Megatron 345M was trained only on portions of The Pile that did not include Books3.

Tigar, a Barack Obama appointee, was not convinced. He declined to consider the company's model card, warning that weighing outside documents at the pleading stage could lead courts to dismiss potentially valid claims before plaintiffs have a chance to obtain evidence through discovery.

Without considering the model card, Tigar found the authors plausibly tied their works to the training data. Books3 made up 12% of The Pile, the authors' works appeared in Books3, and Megatron 345M was trained on The Pile.

The contributory infringement claim proved equally durable.

The authors claimed Nvidia provided customers, including Writer, Persimmon AI Labs and Amazon, with scripts specifically designed to automatically download and preprocess The Pile for use in their own AI development.

Nvidia countered that its broader NeMo Megatron Framework had substantial non-infringing uses and that the company never marketed it as a tool for copyright infringement.

Tigar drew a sharp distinction. The question was not whether the platform as a whole could be used legitimately, but whether the specific scripts had any other purpose.

"The scripts are alleged to have no other purpose than to speed up the process of infringement," he wrote.

On the question of whether Nvidia knew what its customers were doing with those tools, Tigar again sided with the authors. Their complaint did not rest on suspicion, but identified concrete instances of infringement by named customers, the judge found.

"Plaintiffs have alleged that NVIDIA knew that its scripts and other assistance were directly contributing to infringement by third parties," he wrote.

One claim that did not survive was vicarious infringement, which requires showing the defendant had both the right to control infringing conduct and a direct financial interest in it.

The authors' claim that Nvidia had the right and ability to control its customers' direct infringement was, in Tigar's view, too vague. They did not explain how Nvidia could actually exercise that control once a customer independently chose to access The Pile.

The financial benefit theory fared no better. The court found the authors failed to establish that access to the infringing material served as a draw for customers.

"The central question is whether the infringing activity constitutes a draw, not just an added benefit," Tigar wrote.

The judge dismissed the claim with leave to amend within 21 days.

The plaintiffs are represented by the Joseph Saveri Law Firm, which also represents authors suing OpenAI over similar AI training data practices.

Representatives for Nvidia and the Joseph Saveri Law Firm did not immediately respond to requests for comment.

Read the full story on Courthouse News