When large language models such as OpenAI's GPT, Google's Gemini, and Anthropic's Claude demonstrate astonishing capabilities, a fundamental legal and economic question surfaces: do these models infringe the copyrights of the works they were trained on? Lawsuits are proliferating: The New York Times has sued OpenAI and Microsoft, Getty Images has sued Stability AI, and authors have filed class actions. Yet from an economic perspective, overprotecting copyright not only fails to achieve its original policy objectives but may cause a nation to lose its competitive edge in the AI era. This article reexamines the optimal level of copyright protection in the AI age, starting from the economic essence of intellectual property.
I. The Economic Essence of Intellectual Property
Copyright Is a Means, Not an End
The intellectual property system exists not because of the moral intuition that "creators naturally own their works," but because of a clear policy objective: to encourage innovation and creation. Article I, Section 8, Clause 8 of the U.S. Constitution explicitly states that Congress has the power "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries." This language makes clear that exclusive rights are the "means," and promoting the progress of science and the arts is the "end."[1]
This utilitarian view of intellectual property stands in contrast to the continental European natural rights tradition. Under natural rights theory, a creator's claim to a work arises from the act of creation itself, and many continental systems treat certain moral rights as inalienable; under the utilitarian framework, copyright is justified only insofar as it promotes overall social welfare. When the costs of copyright protection exceed its benefits, the level of protection should be adjusted.[2]
The transaction cost analysis of Nobel laureate Ronald Coase provides a useful framework for understanding intellectual property. Coase showed that when transaction costs are low, parties can bargain their way to an efficient use of resources regardless of who initially holds a right; when transaction costs are high, the way rights are initially delineated determines whether resources reach their most valuable use. When intellectual property rights are delineated such that subsequent innovators must negotiate licenses with countless rights holders, transaction costs can become prohibitively high, actually impeding innovation.[3]
Optimal Protection Levels: The Economic Trade-Off
William Landes and Richard Posner, in their classic work The Economic Structure of Intellectual Property Law, analyzed the costs and benefits of copyright protection. The benefit of protection is incentivizing creation: if creators cannot obtain returns from their works, the incentive to create diminishes. The costs of protection include restricting the dissemination and use of works, increasing the cost of subsequent creation, and generating monopoly rents.[4]
The optimal level of protection is where marginal benefit equals marginal cost. The problem is that the current copyright regime far exceeds this optimal point. In both the United States and the EU, copyright generally lasts for the author's lifetime plus 70 years. A work created by a 30-year-old author who lives to 80 therefore remains protected for 120 years after its creation, and longer if the author lives longer. Economic analysis offers little support for the view that extending protection this far has a meaningful positive effect on innovation.[5]
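The lifetime-plus-70 arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `protection_years` and the author's ages are hypothetical, and it ignores end-of-calendar-year rounding (both U.S. and EU terms run to the end of a calendar year) as well as the separate rules for works made for hire and anonymous works.

```python
def protection_years(age_at_creation: int, age_at_death: int, post_mortem_years: int = 70) -> int:
    """Years a work stays protected under a life-plus-N term (simplified).

    Ignores calendar-year rounding and special rules for corporate,
    anonymous, and pseudonymous works.
    """
    if age_at_death < age_at_creation:
        raise ValueError("author cannot die before creating the work")
    # Remaining lifetime after creation, plus the post-mortem term.
    return (age_at_death - age_at_creation) + post_mortem_years


# Hypothetical author: creates a work at 30 and dies at 85.
print(protection_years(30, 85))  # 125
```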
Economist Rufus Pollock's research estimates that from a social welfare maximization perspective, the optimal copyright term is approximately 15 years -- far shorter than the current lifetime plus 70 years. Protection beyond this term imposes marginal costs (restricting use and subsequent creation) that exceed marginal benefits (incentivizing creation), resulting in a net loss of social welfare.[6]
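The marginal condition can be made concrete with a deliberately simple toy model. This is not Pollock's model: purely for illustration, it assumes that the incentive value of one more year of protection starts at `b0` and decays exponentially at rate `rho + delta` (time discounting plus the commercial decay of a work), while each extra year imposes a constant access cost `c`. Under those assumptions the optimum solves b0 * exp(-(rho + delta) * T) = c, so T* = ln(b0 / c) / (rho + delta). All names and parameter values below are hypothetical.

```python
import math


def optimal_term(b0: float, c: float, rho: float, delta: float) -> float:
    """Toy optimal term T* solving MB(T*) = MC under illustrative assumptions.

    Assumed (not taken from the cited research):
      MB(T) = b0 * exp(-(rho + delta) * T)   incentive value of one more year
      MC(T) = c                              constant yearly cost of restricted access
    """
    if b0 <= 0 or c <= 0 or rho + delta <= 0:
        raise ValueError("b0, c and rho + delta must be positive")
    if b0 <= c:
        return 0.0  # even the first year costs more than it incentivizes
    # Solve b0 * exp(-(rho + delta) * T) = c for T.
    return math.log(b0 / c) / (rho + delta)


def marginal_net_benefit(t: float, b0: float, c: float, rho: float, delta: float) -> float:
    """MB(t) - MC(t) under the same toy assumptions."""
    return b0 * math.exp(-(rho + delta) * t) - c


# Arbitrary illustrative parameters; the result depends entirely on them.
params = dict(b0=1.0, c=0.5, rho=0.05, delta=0.05)
t_star = optimal_term(**params)
print(round(t_star, 1))  # 6.9
```

Below T*, one more year of protection adds more incentive than it costs; beyond T*, it subtracts from welfare. Raising the access cost or the decay rate shortens T*. The 15-year figure in the text comes from Pollock's much richer model, not from this sketch.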
II. The "Tragedy of the Anticommons": The Cost of Overprotection
From the Tragedy of the Commons to the Tragedy of the Anticommons
Every economics student learns about the "tragedy of the commons": when a resource has no clearly defined property rights, it will be overused. However, economist Michael Heller introduced an equally important but less discussed concept: the "tragedy of the anticommons" -- when property rights are excessively fragmented, resources are paradoxically "underused."[7]
Imagine a river with 100 locks, each controlled by a different person. Any vessel wishing to navigate the river must negotiate with all 100 lock keepers. Even if each lock keeper's fee is perfectly reasonable, the transaction costs of negotiation would render navigation infeasible. This is the essence of the tragedy of the anticommons: too many rights holders, each with veto power, and the result is that the resource cannot be effectively utilized.
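Part of the lock-keeper problem can be formalized with the textbook model of complementary monopolies (Cournot's complements problem), one standard way economists describe the anticommons. The sketch assumes linear demand for passage, Q = a - P, where P is the sum of all tolls, zero marginal cost, and n independent keepers each setting a toll to maximize their own revenue. The names `complementary_tolls` and `TollEquilibrium` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TollEquilibrium:
    toll_per_keeper: float
    total_toll: float
    traffic: float


def complementary_tolls(a: float, n: int) -> TollEquilibrium:
    """Symmetric Nash equilibrium with n complementary toll setters.

    Assumes demand Q = a - P, where P is the sum of all n tolls, and zero
    marginal cost. Keeper i maximizes p_i * (a - p_i - P_others); the
    first-order condition a - 2 * p_i - P_others = 0 with symmetric tolls
    gives p = a / (n + 1).
    """
    if a <= 0 or n < 1:
        raise ValueError("need a > 0 and at least one keeper")
    p = a / (n + 1)
    total = n * p
    return TollEquilibrium(toll_per_keeper=p, total_toll=total, traffic=a - total)


for n in (1, 10, 100):
    eq = complementary_tolls(a=100.0, n=n)
    print(n, round(eq.toll_per_keeper, 2), round(eq.total_toll, 2), round(eq.traffic, 2))
```

Each individual toll shrinks as n grows, yet traffic falls from a/2 with a single owner to a/101 with 100 keepers, before any negotiation cost is counted.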
The intellectual property domain is facing a tragedy of the anticommons. A large language model's training data may come from billions of web pages, millions of books, and countless papers and articles. If authorization is required for every piece of data, the transaction costs would be astronomical. Even if it were technically possible to track the source of every piece of data and pay micro-licensing fees, the cost of negotiating with millions of rights holders would make AI development infeasible.[8]
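The scaling problem can be sketched under two hypothetical assumptions: every rights holder requires a separate negotiation with the same fixed cost, and each holder independently agrees with the same probability. Total negotiation cost then grows linearly in the number of holders, while the probability that every holder agrees shrinks geometrically. The function `licensing_outlook` and the example values (500 per negotiation, a 99.9 percent chance of agreement) are placeholders, not estimates of real licensing costs.

```python
def licensing_outlook(n_holders: int, cost_per_negotiation: float, p_agree: float) -> tuple[float, float]:
    """Return (total negotiation cost, probability that every holder agrees).

    Hypothetical model: one separate negotiation per rights holder, each with
    the same fixed cost, and each holder agreeing independently with the same
    probability p_agree.
    """
    if n_holders < 0 or cost_per_negotiation < 0 or not 0.0 <= p_agree <= 1.0:
        raise ValueError("invalid parameters")
    total_cost = n_holders * cost_per_negotiation  # grows linearly in n
    p_all_agree = p_agree ** n_holders              # shrinks geometrically in n
    return total_cost, p_all_agree


# Placeholder values, not estimates of real licensing costs.
for n in (100, 10_000, 1_000_000):
    cost, p_all = licensing_outlook(n, cost_per_negotiation=500.0, p_agree=0.999)
    print(f"{n:>9,} holders: cost ${cost:,.0f}; P(all agree) = {p_all:.3g}")
```

Even with a 99.9 percent chance of agreement from each holder, the chance of clearing every right collapses well before the holder count reaches the millions, while total cost keeps rising.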
The Cautionary Tale of the "Patent Thicket"
The tragedy of the anticommons has already produced painful lessons in the patent domain. The "patent thicket" phenomenon in biotechnology is one example. When gene sequences, protein structures, research tools, and other foundational discoveries are all patent-protected, subsequent drug development must navigate through a "thicket" of countless patents, where every step risks infringing some patent and requires license negotiations. This dramatically increases the cost and uncertainty of drug development.[9]
Heller and Eisenberg argued that excessive intellectual property protection in upstream biomedical research can slow rather than accelerate innovation. The same logic applies to the AI domain: if every piece of training data is treated as copyrighted material requiring authorization, AI development will be mired in endless legal disputes and licensing negotiations.[10]
III. AI Training and Copyright: An Economic Analysis
The Nature of Machine Learning: Learning or Copying?
Understanding the technical nature of AI training is essential for legal and economic analysis. Training a large language model involves having the model "read" vast amounts of text and learn statistical patterns of language: word co-occurrences, syntactic structures, semantic relationships, and more. After training, the model does not retain a copy of its training corpus; instead, it encodes the learned patterns in billions of parameters.[11]
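As a loose illustration of "learning statistical patterns," the sketch below builds the simplest possible language model, a bigram counter, from a toy corpus. This is a simplified stand-in, not how Transformer models work: they learn dense parameters by gradient descent rather than storing count tables. The function name `train_bigram_model` and the corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(documents: list[str]) -> dict[str, dict[str, float]]:
    """Estimate P(next word | current word) from whitespace-tokenized text.

    Toy stand-in for 'learning statistical patterns'; real LLMs use neural
    networks, not count tables.
    """
    counts: defaultdict = defaultdict(Counter)
    for doc in documents:
        tokens = doc.lower().split()
        for current, nxt in zip(tokens, tokens[1:]):
            counts[current][nxt] += 1
    # Normalize pooled counts into conditional probabilities.
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {word: c / total for word, c in followers.items()}
    return model


corpus = [
    "the court ruled for the author",
    "the court ruled against the publisher",
]
model = train_bigram_model(corpus)
print(model["the"])  # {'court': 0.5, 'author': 0.25, 'publisher': 0.25}
```

The estimate for "the" followed by "court" pools evidence from both sentences, so what is kept is aggregate statistics rather than either document. With a tiny or highly repetitive corpus, however, the same table can reproduce a source sentence almost verbatim, which is the memorization concern the article turns to below.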
This process is closer to "learning" than "copying." When a human writer reads a large body of literary works and develops their own writing style, we do not say they have "infringed" the copyrights of the works they read. The AI training process is fundamentally similar -- it extracts abstract patterns from large volumes of text rather than memorizing specific content.[12]
Of course, there are differences between AI training and human learning. AI can process far more data than humans; AI's "reading" involves technical copying of the original text (even if temporary); and in some cases, AI may "memorize" and output content highly similar to training data. These differences need to be carefully addressed in law, but they should not be used to broadly prohibit AI training from using copyrighted materials.
The Economic Logic of the Fair Use Doctrine
The "fair use" doctrine in U.S. copyright law is a critical mechanism for addressing copyright boundary issues. Fair use permits the unauthorized use of copyrighted materials in certain situations, including criticism, comment, news reporting, teaching, scholarship, and research. Courts weigh four statutory factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the potential market for or value of the work.[13]
From an economic perspective, the fair use doctrine embodies an "optimal exception" design. It recognizes that complete copyright protection would obstruct many valuable derivative uses, and therefore permits unauthorized use where the marginal social benefit exceeds the marginal cost. Commentary, criticism, teaching, and similar uses are included in fair use precisely because the social value these uses create typically exceeds the harm to the original copyright holder.[14]
Should AI training be considered fair use? From an economic perspective, the answer leans toward yes. The purpose of AI training is "transformative": it is not intended to substitute for the original work but to extract abstract knowledge from it. The impact of AI training on the original work's market is largely indirect: readers do not stop buying The New York Times simply because GPT exists (and if some do, it is because GPT's output better meets their needs, which is the very nature of technological progress).[15]
Distinguishing "Output" from "Training"
An important legal and economic distinction is that AI's "training process" and AI's "output content" should be treated separately. The training process involves statistical learning from large volumes of data, and this process itself should not be considered copyright infringement, just as human reading does not constitute infringement. However, if an AI's output is "substantially similar" to a protected work, that specific output may constitute infringement.[16]
This distinction is reasonable. It allows AI developers to use broad training data while still protecting original creators from direct "copy-and-paste" style infringement. It also places responsibility in a more appropriate position: AI developers are responsible for designing systems that do not output content excessively similar to training data; and users who deliberately induce AI to output protected content should also bear corresponding responsibility.
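On the output side, one technical measure a developer might adopt is a screen that flags generated text sharing long verbatim word sequences with a protected work. The sketch below is a hypothetical heuristic, not a legal test: "substantial similarity" is decided by courts and can cover non-literal copying that n-gram matching misses. The function names, the default n-gram length of 8, and the 0.2 threshold are arbitrary assumptions, and punctuation is not normalized.

```python
def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """All contiguous n-word sequences in lowercased, whitespace-split text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output: str, protected: str, n: int = 8) -> float:
    """Fraction of the output's distinct n-grams that also appear in the protected text."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0
    return len(out_grams & word_ngrams(protected, n)) / len(out_grams)


def flag_output(output: str, protected: str, n: int = 8, threshold: float = 0.2) -> bool:
    """Hypothetical screening rule: flag output whose overlap meets the threshold."""
    return verbatim_overlap(output, protected, n) >= threshold


protected = "it was the best of times it was the worst of times"
print(verbatim_overlap("it was the best of times for the industry", protected, n=3))
```

A flag from such a filter would prompt review or regeneration rather than settle the legal question, which fits the division of responsibility described above.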
IV. International Comparison: Different Policy Choices
Japan: Actively Embracing Copyright Reform for AI
Among major economies, Japan has adopted the most proactively AI-friendly copyright stance. In 2018, Japan amended its Copyright Act (Article 30-4, in force from January 2019) to provide that uses "not for the purpose of enjoying the thoughts or sentiments expressed in a work," including data analysis for machine learning training, do not constitute copyright infringement, unless the use would unreasonably prejudice the interests of the copyright owner. This amendment made Japan one of the most permissive jurisdictions in the world for AI training.[17]
Japanese government policy documents explicitly state that this reform was aimed at strengthening Japan's international competitiveness in the AI field. Japan recognized that if other countries allow AI training to use protected materials while Japan prohibits it, Japan's AI industry would be at a competitive disadvantage. This represents a clear "competitiveness-oriented" intellectual property policy.
Japan's approach embodies pragmatism. Rather than getting entangled in doctrinal legal arguments about "whether training constitutes copying," it proceeds directly from policy objectives: what rules would maximize Japan's social welfare and international competitiveness? This policy thinking is worth emulating by other countries.
The EU: Text and Data Mining Exceptions
In its 2019 Directive on Copyright in the Digital Single Market, the EU created copyright exceptions for "text and data mining" (TDM), which are generally understood to cover machine learning training. Article 3 allows research organizations and cultural heritage institutions to perform TDM for scientific research on works to which they have lawful access. Article 4 extends a broader TDM exception to other users, including commercial organizations, unless the copyright holder has expressly reserved its rights in an appropriate manner (for content made publicly available online, by machine-readable means).[18]
The EU's "opt-out" mechanism attempts to strike a balance between creator rights and AI development. In theory, if a copyright holder does not wish their work to be used for AI training, they can express this through technical or legal means, and AI developers must respect that preference. However, this mechanism faces practical challenges: how can it be proven that an AI developer "knew" a particular work had opted out? How can the authorization status of every piece of data be tracked across billions of data points?[19]
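Whether a particular signal counts as a valid reservation under Article 4 is still debated, but robots.txt is one widely used machine-readable mechanism. The sketch below uses Python's standard-library `urllib.robotparser` to check whether a crawler may fetch a page. The crawler names `ExampleAIBot` and `SearchBot`, the robots.txt content, and the helper `may_crawl_for_training` are hypothetical, and treating robots.txt as the only opt-out signal is a simplifying assumption: reservations can also appear in metadata or a site's terms.

```python
from urllib.robotparser import RobotFileParser


def may_crawl_for_training(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a robots.txt body for whether user_agent may fetch url.

    Assumes robots.txt is the only opt-out signal; this is not a complete
    test of an Article 4 reservation.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical site that blocks one AI crawler but allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

print(may_crawl_for_training(ROBOTS_TXT, "ExampleAIBot", "https://example.com/articles/1"))  # False
print(may_crawl_for_training(ROBOTS_TXT, "SearchBot", "https://example.com/articles/1"))     # True
```

A check like this only answers the question at crawl time; it does not resolve the tracking problems above for content obtained through other channels or copied before a reservation was added.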
The United States: Uncertainty Through Litigation
The U.S. position is currently the least settled of the three and is being clarified gradually through litigation. In cases such as The New York Times v. Microsoft and OpenAI, class-action suits by authors, and Getty Images v. Stability AI, courts are being asked to decide whether AI training constitutes fair use.[20]
This approach of "clarification through litigation" has both advantages and disadvantages. The advantage is that courts can make nuanced judgments on a case-by-case basis rather than through one-size-fits-all legislation; the disadvantage is that legal uncertainty before definitive rulings suppresses investment and innovation. Some AI developers may reduce investment in the United States due to legal risk and redirect it to Japan or other regions with clearer legal environments.
From the perspective of international competition, U.S. legal uncertainty may put the country at a disadvantage in the AI race. If courts ultimately rule that AI training generally constitutes infringement, the U.S. AI industry could face massive lawsuits and licensing costs, while competitors in other countries face no such constraints.
V. Key Considerations for National Competitiveness
The Productivity Revolution of the AI Era
AI is widely regarded as the next general-purpose technology, following the steam engine, electricity, and the internet. The hallmark of a general-purpose technology is that it transforms not just specific industries but the entire way an economy operates. The steam engine transformed manufacturing, transportation, and agriculture; electricity transformed factories, homes, and cities; the internet transformed communications, commerce, and entertainment; AI will transform every domain involving cognitive labor.[21]
Historical experience shows that in general-purpose technology revolutions, nations that adopt the new technology first gain significant competitive advantages. Britain dominated the 19th century by pioneering the steam engine; the United States dominated the 20th century through early electrification and internet adoption. In the AI era, nations that can effectively develop and deploy AI technology will hold the advantage in 21st-century economic competition.[22]
The intellectual property system is a critical factor influencing AI development. If a nation's copyright regime makes AI training difficult or expensive, that nation's AI industry will fall behind others. This is not merely a loss for the AI industry itself but a delay in productivity gains across the entire economy.
The Cost of Clinging to Old Thinking
The traditional copyright framework was designed in the era of the printing press and mass media, with the core assumption that copying is scarce, controllable, and traceable. Under this framework, "copying" itself is considered the core regulated activity. However, in the digital age, copying has become ubiquitous -- browsing a webpage involves copying, using a computer involves copying, machine learning involves copying. If copyright law continues to treat "copying" as its central concept, it will be fundamentally incompatible with modern technology.[23]
A more fundamental problem is that the traditional copyright framework assumes creation is individual, discrete, and attributable. A book has one author, a song has one composer, and copyright can be clearly attributed to an individual. However, AI training breaks this assumption -- AI capabilities derive from the "collective contributions" of billions of sources, and no single source can claim "ownership" of AI's capabilities.[24]
Attempting to force the traditional copyright framework onto AI training is like trying to regulate automobiles with horse-drawn carriage laws. It would not only impede technological development but also produce absurd outcomes -- such as requiring AI developers to pay micro-licensing fees to billions of "rights holders," effectively making AI development infeasible.
The True Interests of Creators
Arguments for strict copyright protection often invoke the rhetoric of "protecting creators." However, from an economic perspective, we need to ask: does overprotection truly serve creators' interests?
First, most creators receive negligible compensation from the copyright system. In the music industry, the top 1% of artists take in the majority of concert revenue; in publishing, the income gap between bestselling authors and average authors can be several thousandfold. The primary beneficiaries of the copyright system are a handful of superstars and large copyright-holding corporations (such as Disney and Universal Music), not ordinary creators.[25]
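The "top 1%" comparison is a simple concentration measure. The sketch below computes it for any list of earnings; the function name `top_share` is hypothetical, and the sample data are invented for illustration, not drawn from Krueger's figures.

```python
import math


def top_share(earnings: list[float], top_fraction: float) -> float:
    """Share of total earnings received by the top `top_fraction` of earners."""
    if not earnings or not 0 < top_fraction <= 1:
        raise ValueError("need earnings and 0 < top_fraction <= 1")
    ranked = sorted(earnings, reverse=True)
    k = max(1, math.ceil(top_fraction * len(ranked)))  # always count at least one earner
    total = sum(ranked)
    return sum(ranked[:k]) / total if total > 0 else 0.0


# Invented example: 100 artists, one star and 99 artists earning 1 unit each.
sample = [900.0] + [1.0] * 99
print(round(top_share(sample, 0.01), 3))  # 0.901
```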
Second, AI tools themselves can help creators increase productivity. Many writers, artists, and musicians are already using AI to assist their creative process. If the copyright system impedes AI development, the victims include not only AI companies but also creators who rely on AI tools.
Third, excessive copyright protection may harm creators' "audience base." If AI tools become expensive due to licensing costs, the businesses and individuals using these tools will decrease; these businesses and individuals are the potential market for creators. A thriving AI ecosystem may create more opportunities for creators, not fewer.
VI. Policy Recommendations: Copyright Reform for the Future
Recommendation 1: Clarify the Fair Use Status of AI Training
Countries should, through legislation or judicial decisions, clearly establish that AI training -- as an act of learning rather than copying -- falls within a reasonable exception to copyright. This does not mean abolishing copyright but adjusting copyright boundaries to align with technological realities and social needs. Japan's 2018 copyright law amendment provides a reference model.[26]
Recommendation 2: Distinguish Between "Training" and "Output"
The law should distinguish between the AI training process and AI output content. The training process should be broadly permitted, as it is a transformative act of learning. However, if a specific AI output is substantially similar to a protected work, that output may still constitute infringement. This distinction is technically feasible and consistent with the fundamental copyright principle of protecting "expression" rather than "ideas."
Recommendation 3: Establish Collective Licensing and Compensation Mechanisms
To address remaining concerns about creators' interests, collective licensing and compensation mechanisms could be considered. Similar to organizations like ASCAP and BMI in the music industry, specialized institutions could be established to negotiate with AI developers on behalf of copyright holders and distribute the resulting compensation to creators. Such mechanisms could provide creators with some form of compensation without impeding AI development.[27]
However, such mechanisms must be carefully designed to avoid becoming another form of the "tragedy of the anticommons." Licensing fees should be reasonable, and the negotiation process should be streamlined; otherwise, transaction costs will still impede AI development.
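To make the distribution step concrete, the sketch below splits a licensing pool among rights holders in proportion to a usage measure, using integer cents, exact fractions, and the largest-remainder method so the payouts add up exactly to the pool. The usage weights (for example, an estimated share of training data), holder names, pool size, and the function `distribute_pool` are all hypothetical; real collecting organizations use their own, more detailed distribution rules.

```python
import math
from fractions import Fraction


def distribute_pool(pool_cents: int, usage: dict[str, float]) -> dict[str, int]:
    """Split pool_cents pro rata to usage weights (largest-remainder method).

    Each holder first receives the floor of their exact share; leftover
    cents go to the largest fractional remainders, ties broken by name.
    """
    weights = {h: Fraction(w) for h, w in usage.items()}  # exact arithmetic
    total = sum(weights.values())
    if pool_cents < 0 or total <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("need a non-negative pool and positive total usage")
    exact = {h: pool_cents * w / total for h, w in weights.items()}
    payout = {h: math.floor(share) for h, share in exact.items()}
    leftover = pool_cents - sum(payout.values())
    by_remainder = sorted(exact, key=lambda h: (-(exact[h] - payout[h]), h))
    for h in by_remainder[:leftover]:
        payout[h] += 1
    return payout


# Hypothetical pool of $1,000.00 split among three equally weighted holders.
print(distribute_pool(100_000, {"author_a": 1.0, "author_b": 1.0, "author_c": 1.0}))
# {'author_a': 33334, 'author_b': 33333, 'author_c': 33333}
```

The rule itself is cheap to run; the transaction costs warned about above lie in measuring usage and identifying holders, not in the arithmetic.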
Recommendation 4: Shorten Copyright Protection Terms
A more fundamental reform is to reexamine copyright protection terms. The lifetime-plus-70-years term has little economic justification; it owes much to copyright industry lobbying (particularly Disney's repeated lobbying to protect Mickey Mouse) as well as to international harmonization. Shortening the protection term to 15 to 25 years would dramatically increase the material entering the public domain, providing richer public resources for both AI training and human creativity.[28]
Conclusion: Seeking Balance Between Innovation and Protection
The intellectual property system is fundamentally a social contract: society grants creators a temporary monopoly in exchange for continued innovation and eventual contribution to the public domain. The terms of this contract should be adjusted as technology and society evolve. In the AI era, clinging to a copyright framework designed for the printing press is like insisting on horse-drawn carriage traffic rules in the age of the automobile -- not only anachronistic but harmful.
The cost of overprotecting copyright is stagnation in AI development, loss of national competitiveness, and a slowdown in innovation. These costs are ultimately borne by all of society -- including the very creators who are supposedly being "protected." A nation that cannot effectively develop AI will fall behind in 21st-century economic competition; and a lagging economy cannot provide creators with a thriving market.
True wisdom lies not in rigidly protecting existing rights but in designing institutions that promote overall social welfare. This requires abandoning the myth of "copyright as inherently sacred" and reexamining from a utilitarian perspective: what kind of intellectual property system can maximize innovation, competitiveness, and social welfare in the AI era? The answer to this question will determine a nation's development trajectory for decades to come.
References
- U.S. Constitution, Art. I, Sec. 8, Cl. 8. The Copyright Clause of the U.S. Constitution. [Congress.gov]
- Fisher, W. (2001). Theories of intellectual property. In S. Munzer (Ed.), New Essays in the Legal and Political Theory of Property. Cambridge University Press. [Harvard]
- Coase, R. H. (1960). The problem of social cost. Journal of Law and Economics, 3, 1-44. [DOI]
- Landes, W. M., & Posner, R. A. (2003). The Economic Structure of Intellectual Property Law. Harvard University Press.
- Lessig, L. (2004). Free Culture: How Big Media Uses Technology and the Law to Lock Down Culture and Control Creativity. Penguin Press. [Free Culture]
- Pollock, R. (2009). Forever minus a day? Calculating optimal copyright term. Review of Economic Research on Copyright Issues, 6(1), 35-60. [SSRN]
- Heller, M. A. (1998). The tragedy of the anticommons: Property in the transition from Marx to markets. Harvard Law Review, 111(3), 621-688. [DOI]
- Lemley, M. A. (2021). How generative AI turns copyright upside down. Stanford Law Review Online. [Stanford Law Review]
- Shapiro, C. (2001). Navigating the patent thicket: Cross licenses, patent pools, and standard setting. Innovation Policy and the Economy, 1, 119-150. [DOI]
- Heller, M. A., & Eisenberg, R. S. (1998). Can patents deter innovation? The anticommons in biomedical research. Science, 280(5364), 698-701. [DOI]
- Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. [arXiv]. The original paper on the Transformer architecture.
- Sobel, B. L. (2017). Artificial intelligence's fair use crisis. Columbia Journal of Law & the Arts, 41(1), 45-97. [DOI]
- 17 U.S.C. § 107. The fair use provision of U.S. copyright law. [Cornell Law]
- Samuelson, P. (2023). Generative AI meets copyright. Science, 381(6654), 158-161. [DOI]
- Sag, M. (2019). The new legal landscape for text mining and machine learning. Journal of the Copyright Society of the U.S.A., 66(3), 291-368. [SSRN]
- Grimmelmann, J. (2016). There's no such thing as a computer-authored work—and it's a good thing, too. Columbia Journal of Law & the Arts, 39(3), 403-416. [DOI]
- Japan Copyright Act, Article 30-4 (2018 Amendment). [e-Gov Laws Search]
- Directive (EU) 2019/790, Articles 3-4. EU Directive on Copyright in the Digital Single Market. [EUR-Lex]
- Geiger, C., & Frosio, G. (2018). Text and data mining in the proposed Copyright Reform: Making the EU fit for an age of big data? IIC - International Review of Intellectual Property and Competition Law, 49(7), 814-844. [DOI]
- The New York Times Co. v. Microsoft Corp. et al., Case No. 1:23-cv-11195 (S.D.N.Y. 2023). [Complaint]
- Brynjolfsson, E., & McAfee, A. (2014). The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. W. W. Norton & Company.
- Acemoglu, D., & Restrepo, P. (2019). Automation and new tasks: How technology displaces and reinstates labor. Journal of Economic Perspectives, 33(2), 3-30. [DOI]
- Litman, J. (2001). Digital Copyright. Prometheus Books. A critical analysis of copyright in the digital age.
- Lee, P. (2024). Derivative AI works and the training data question. Yale Law Journal, 133(4). Exploring the copyright attribution issues of AI-generated works.
- Krueger, A. B. (2019). Rockonomics: A Backstage Tour of What the Music Industry Can Teach Us about Economics and Life. Currency. An economic analysis of income inequality in the music industry.
- Ueno, T. (2021). The flexible copyright exception for 'non-enjoyment' purposes: A user-oriented remixing of existing contents. Journal of Intellectual Property Law & Practice, 16(2), 111-122. [DOI]
- Ginsburg, J. C. (2018). Authors and users in copyright. Journal of the Copyright Society of the U.S.A., 45(1), 1-20.
- Boldrin, M., & Levine, D. K. (2008). Against Intellectual Monopoly. Cambridge University Press. [Online]. A fundamental critique of the intellectual property system.