August 2025
The Boundaries of Fair Use in AI Training: Insights from Bartz et al. v. Anthropic PBC in the U.S.
In June 2025, the United States District Court for the Northern District of California issued a landmark order on fair use in a copyright suit brought by several authors against the AI startup Anthropic PBC (“Anthropic”) [1], granting Anthropic’s motion for summary judgment in part. The court held that using copyrighted books to train generative AI models constitutes fair use under U.S. copyright law and therefore does not constitute infringement, but it did not extend that holding to books downloaded from pirate websites.
1. Background
Anthropic was co-founded in January 2021 by Dario Amodei, who previously served as Vice President of Research at OpenAI, and others. Anthropic aims to develop AI systems that do not provoke human anxiety [2]. Its flagship generative AI product, Claude, is designed to perform tasks such as coding and textual reasoning [3].
To improve Claude’s performance, Anthropic sought to create a central library of “all the books in the world” as training data for its Large Language Model (“LLM”). In pursuit of this goal, Anthropic obtained millions of books, both by purchasing physical copies and by downloading digital files from pirate websites. The physical copies were disbound, scanned page by page, and converted into digital files that were compiled into the database. The original physical copies were then discarded. Anthropic retained the digital versions permanently in the database, even for books no longer used in LLM training.
The plaintiffs—three individual authors—alleged that Anthropic had, without authorization, reproduced, stored, and used their copyrighted books to train Claude’s LLM, thereby infringing their copyright. In response, Anthropic asserted a fair use defense and filed a motion for summary judgment with respect to this issue.
2. Court’s Analysis
The court divided the legal issue in this case into three distinct parts, analyzing each separately and reaching different conclusions:
(1) Use of Lawfully Purchased Books for LLM Training Constitutes Fair Use
The court found that although Anthropic copied the copyrighted books in their entirety, its purpose was not to reproduce their content for Claude’s users or to substitute for the originals, but to use the books as training data for Claude’s LLM, enabling Claude to generate new content distinct from the originals. The court therefore held that this use was transformative and fundamentally different from the books’ ordinary expressive purpose. The court further emphasized that copyright law is intended to protect original expression, not to suppress the creation of other works that merely resemble the original in style. Accordingly, even if Claude’s outputs resemble the plaintiffs’ copyrighted books in style, such outputs do not usurp or diminish the market for the original works.
(2) Compiling Purchased Books into the Database Also Constitutes Fair Use
The court held that after Anthropic lawfully purchased the physical copies of the books, it obtained full ownership and the right to dispose of them. Converting the copyrighted books from physical copies to digital files served merely to reduce storage needs and improve searchability, and had no bearing on the creative expression embodied in these books. Furthermore, since Anthropic destroyed the original physical copies after digitization of the books, it did not increase the total number of copies in existence. Accordingly, the court held that this form of use was likewise transformative in nature and did not infringe the exclusive rights granted to the plaintiffs under U.S. copyright law.
(3) Compiling Books Downloaded from Pirate Websites into a Database Does Not Constitute Fair Use
The court found that Anthropic failed to demonstrate that downloading the copyrighted books from pirate websites, rather than acquiring them through purchase or other lawful means, was reasonable or necessary. Moreover, by downloading books that could otherwise have been lawfully obtained, Anthropic effectively usurped or diminished the market for these books. The court held that this form of use, in itself, constituted copyright infringement, and that the subsequent use of these books to train Claude’s LLM did not change that conclusion under the doctrine of fair use.
3. Practical Insights
This order draws a clear boundary regarding the scope of copyright fair use in the context of generative AI development. The court held that using copyrighted works to train an LLM, so that the generative AI model learns stylistic features of the works rather than directly disclosing their original content, is transformative in nature and therefore qualifies as fair use. However, acquiring works from unlawful sources, such as pirate websites, is itself copyright infringement, and subsequently using those works for LLM training does not cure it.
This order is significant and provides valuable insights for enterprises: in developing generative AI, enterprises should establish comprehensive records of training data at the time of acquisition and verify the legality of each data source. These measures can substantially reduce exposure to copyright infringement liability.
[1] Andrea Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417-WHA (N.D. Cal.).
[2] Business Next, "Anthropic, Founded by OpenAI’s 'Seven Traitors': What Is Its Mission and How Does Claude 3 Compare to ChatGPT?", available at: https://www.bnext.com.tw/article/78601/anthropic-startup (last updated August 29, 2025).
[3] anue News, "Anthropic Unveils New Model 'Claude 3.5 Sonnet,' Outperforming OpenAI with Greater Speed and Power", available at: https://news.cnyes.com/news/id/5608748 (last accessed August 29, 2025).