The John Doe v. GitHub Case, Explained

Samyak Deshpande
Aug 28, 2024

This case analysis is co-authored by Sanvi Zadoo and Alisha Garg, along with Samyak Deshpande. The authors recently interned at the Indian Society of Artificial Intelligence and Law.

Artificial intelligence is redefining the way developers write code. Copilot, an AI-powered coding assistant developed by GitHub in collaboration with OpenAI, was launched in 2021. It promised to revolutionize software development by generating code from a developer's input. However, this ‘revolution’ soon found itself in the midst of a legal storm.

The now-famous GitHub Copilot case revolves around allegations that the AI-powered coding assistant uses copyrighted code from open-source repositories without proper credit. The lawsuit was initiated by programmer and attorney Matthew Butterick, who was joined by other developers. They claimed that Copilot's suggestions include exact code from public repositories without adhering to the licenses under which the code was published. Despite efforts by Microsoft, GitHub, and OpenAI to dismiss the lawsuit, the court allowed the case to proceed.

Timeline of the case

June 2021	GitHub Copilot is publicly launched in a technical preview
November 2022	Plaintiffs file a lawsuit against GitHub, Microsoft, and OpenAI, alleging DMCA violations and breach of contract
December 2022	The court dismisses several of the Plaintiffs' claims, including unjust enrichment, negligence, and unfair competition, with prejudice
March 2023	GitHub introduces new features for Copilot, including improved security measures and an AI-based vulnerability prevention system.
June 2023	The court dismisses the DMCA claim with prejudice
July 2024	The California court affirms the dismissal of nearly all the claims

Overall, the lawsuit includes breach of contract claims based on the terms of open-source licenses, with the plaintiffs arguing that Copilot's use of their code violates these licenses. Let us explore the case in detail and consider what it means for the future of such AI programs.

Technical Features of Copilot, Recent Iterations, and Competing Entities

Technical Features of GitHub Copilot

GitHub Copilot is an AI-powered code completion tool, developed in collaboration with OpenAI, that seamlessly integrates into development environments to assist developers with code suggestions, completions, and snippets.

The technical features of GitHub Copilot include:

OpenAI Codex Model: Copilot is fueled by OpenAI's Codex, which is a descendant of the GPT-3 model and has been specifically trained on a vast amount of publicly available code from GitHub and other sources. This advanced model comprehends the context of the code being written, including the function or class a developer is working on, comments, and preceding code, allowing it to provide pertinent and contextually appropriate code suggestions.
Diverse Language Support: Copilot is compatible with a wide array of programming languages, encompassing, but not restricted to, Python, JavaScript, TypeScript, Ruby, Go, Java, C#, PHP, and more. This broad compatibility caters to developers utilizing different languages and frameworks. For select languages, Copilot provides language-specific features, such as type hints in TypeScript or docstring generation in Python.
Integration with Visual Studio Code: Copilot seamlessly integrates with Visual Studio Code, a widely used code editor. This integration enables developers to receive real-time code suggestions and completions as they code. Achieved through an extension that can be effortlessly installed and configured within VS Code, this integration ensures accessibility to a diverse developer community.
Real-time Feedback: Copilot offers real-time code suggestions, diminishing the need for developers to scour code snippets or documentation. This instantaneous feedback accelerates the development process and enhances productivity. Suggestions are presented inline within the code editor, allowing developers to seamlessly review and accept or modify suggestions without disrupting their workflow.
Adaptive Learning: Copilot assimilates feedback from developers to enhance its future suggestions. Whether a developer accepts, rejects, or modifies a suggestion, this information contributes to refining the model's future suggestions. Over time, Copilot adjusts to a developer's coding style and preferences, delivering personalized and precise suggestions.
Function and Class Completion: Copilot can generate entire functions and classes based on concise developer descriptions or comments, significantly expediting the development process, particularly for boilerplate code. It also provides examples demonstrating the use of specific functions or libraries and generates documentation strings for functions and classes.
Duplication Detection: Copilot includes a feature to detect and refrain from suggesting code that exactly matches public code, addressing concerns regarding code plagiarism and copyright infringement. Filtering and related mechanisms are intended to keep the generated code consistent with open-source licenses and copyright law.
Understanding Code Semantics: Copilot surpasses basic keyword matching by comprehending the semantics of the code. This capability allows it to suggest appropriate variable names, function calls, and entire code blocks relevant to the current context. Copilot effectively handles complex coding scenarios, offering pertinent suggestions in contexts such as nested functions, asynchronous code, and multi-threaded applications.
Error Detection: Copilot aids in detecting potential errors or issues within the code and provides suggested fixes. It can recommend code refactoring to enhance readability, performance, or maintainability.
Recent Iterations and Improvements: GitHub Copilot has undergone multiple iterations to enhance its functionality and user experience. These enhancements include:
1. Improved Accuracy and Speed: Upgrades to the underlying Codex model have increased its precision and speed, resulting in more efficient and relevant suggestions.
2. Context Understanding Improvements: Copilot can now provide more accurate suggestions, even in complex coding scenarios, due to improved context understanding.
3. Addition of New Languages: Support for additional programming languages and frameworks has been integrated, expanding its applicability.
4. Seamless Integration: Improvements in the integration with VS Code and other IDEs offer a more seamless and intuitive user experience.
5. Stronger Compliance Measures: Enhanced mechanisms for detecting and preventing the suggestion of copyrighted or sensitive code have been implemented. Additionally, better management of open-source licenses ensures compliance and reduces legal risks.
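
The duplication-detection feature listed above can be pictured as a filter that compares each suggestion against an index of public code before it is shown. The Python sketch below is a minimal illustration under stated assumptions, not GitHub's implementation: the 150-character window, the whitespace normalization, and the function names (`build_index`, `matches_public_code`, `filter_suggestions`) are all hypothetical.

```python
import hashlib
import re

# Assumed window size in characters; chosen for illustration only.
WINDOW = 150


def normalize(code: str) -> str:
    """Collapse whitespace runs so formatting differences do not hide a match."""
    return re.sub(r"\s+", " ", code).strip()


def window_hashes(code: str, window: int = WINDOW) -> set[str]:
    """Hash every `window`-character slice of the normalized code."""
    text = normalize(code)
    if len(text) < window:
        return set()  # too short to compare at this window size
    return {
        hashlib.sha256(text[i:i + window].encode("utf-8")).hexdigest()
        for i in range(len(text) - window + 1)
    }


def build_index(public_snippets, window: int = WINDOW) -> set[str]:
    """Build a hash index over a (hypothetical) corpus of public code."""
    index: set[str] = set()
    for snippet in public_snippets:
        index |= window_hashes(snippet, window)
    return index


def matches_public_code(suggestion: str, index: set[str], window: int = WINDOW) -> bool:
    """True if any window of the suggestion also occurs in the indexed corpus."""
    return not window_hashes(suggestion, window).isdisjoint(index)


def filter_suggestions(suggestions, index: set[str], window: int = WINDOW) -> list[str]:
    """Keep only suggestions with no exact window-level match in the index."""
    return [s for s in suggestions if not matches_public_code(s, index, window)]
```

A filter of this kind only catches exact matches after normalization. Renamed variables or reordered statements would pass through, which is one reason exact-match filtering alone does not settle the attribution questions raised in the lawsuit.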

Competing Entities

Other companies and tools in the industry provide comparable AI-powered code assistance features, posing competition to GitHub Copilot:

Tabnine

Tabnine employs AI to deliver code completions and suggestions across various IDEs and programming languages.
It offers both cloud-based and local models, allowing developers to maintain the privacy of their code.
Tabnine can be trained on a team's codebase to offer bespoke and pertinent suggestions tailored to the specific project.

Kite

Kite provides AI-powered code completions for Python and JavaScript, seamlessly integrating with popular editors such as VS Code, PyCharm, and Sublime Text.
It furnishes in-line documentation and code examples to help developers understand how to use specific functions or libraries.
Kite utilizes machine learning models to deliver precise and context-aware code completions.

IntelliCode by Microsoft

IntelliCode delivers AI-assisted code recommendations based on patterns found in open-source projects and a developer's codebase.
Integrated into Visual Studio and Visual Studio Code, it supports a wide array of programming languages.
It tailors recommendations to specific teams by learning from the code patterns within a team's codebase.

Codota

Codota focuses on providing code completions and suggestions for Java and Kotlin, with recent expansion into other programming languages.
It provides both cloud-based and on-premises solutions to accommodate varying privacy needs.
Codota learns from the developer's codebase to deliver more accurate and relevant suggestions.

Examining the facts, arguments and verdict of the case

Facts

In J. Doe vs. GitHub, the plaintiffs, who are developers, brought a legal action against GitHub and its parent company, Microsoft. The case centers on GitHub's Copilot, an AI-based code completion tool designed to assist developers by generating code snippets. The plaintiffs alleged that Copilot was trained on publicly available code, including code they had authored, which was protected under open-source licenses. They claimed that Copilot's generated code snippets were identical or substantially similar to their original work, which, according to them, amounted to copyright infringement. Furthermore, they argued that Copilot violated the terms of the open-source licenses by failing to attribute the code to its original authors and by not adhering to the license conditions.

As stated above, the plaintiffs raised two primary concerns: first, that the defendants infringed their copyright, and second, that the defendants breached the contract. This takes us to the issues posed by the case.

Issues

1. Whether Copilot's generation of code snippets constituted copyright infringement?

2. Whether the plaintiffs had a valid claim for breach of contract due to the alleged violation of open-source licenses?

3. Whether they were entitled to restitution for unjust enrichment?

4. Whether their request for punitive damages was justified?

Arguments

Plaintiffs

The plaintiffs argued that Copilot's operation resulted in the unauthorized use and reproduction of their code, which infringed on their copyrights. They also contended that by not attributing the generated code to the original authors and by not complying with the open-source licenses, GitHub and Microsoft had breached contractual obligations. The plaintiffs sought restitution for the unjust enrichment of the defendants, who they claimed had profited from Copilot. Additionally, they sought punitive damages, arguing that the defendants' actions warranted such penalties.

Defendants

In response, GitHub and Microsoft countered that Copilot did not produce identical copies of the plaintiffs' code, and any similarities were incidental due to the nature of the tool. They argued that open-source licenses do not create enforceable contract claims in this context. Furthermore, the defendants asserted that the plaintiffs had not sufficiently stated a claim for restitution for unjust enrichment under California law. Regarding the punitive damages, the defendants argued that such damages are not typically recoverable in breach of contract cases.

Decision of the Court

Upon review, the court dismissed the plaintiffs' claim under Section 1202(b) of the DMCA with prejudice, meaning that the claim could not be refiled. This provision concerns the removal or alteration of copyright management information. The court found that the plaintiffs had not provided sufficient grounds to support this claim. However, the court allowed the plaintiffs' breach of contract claim to proceed, specifically regarding the alleged violations of open-source licenses. This meant that the plaintiffs could continue to pursue their argument that the defendants had breached the terms of these licenses.

AI and Skill Security: The Impact of the Judgment on the Developer Community

Source of training data

One of the significant challenges in suing companies that offer generative AI tools lies in identifying the sources of the training data. In the present case, all training data used by Copilot was hosted on GitHub and subject to GitHub's open-source licenses. This made it relatively straightforward for the plaintiffs to pinpoint the specific license terms they believed had been violated. However, for more complex AI models like ChatGPT, which do not disclose their training data sources, proving similar violations could be considerably more difficult. The opaque nature of these models' data origins presents a substantial hurdle for plaintiffs attempting to assert their rights.

In the context of evolving AI legislation, potential European laws may soon require companies to disclose the data used to train their AI models. Such regulations could significantly aid in identifying and proving violations related to AI-generated content.

Prompt Engineering Considerations

The court's decision on the plaintiffs' standing to seek monetary damages is particularly noteworthy. The court ruled that plaintiffs could seek damages even if they themselves entered the prompts that led to the generation of the allegedly infringing content. This ruling could have far-reaching implications, especially given similar approaches in other ongoing cases involving generative AI.

For instance, in Authors Guild v. OpenAI Inc., plaintiffs claimed that by entering prompts, they generated outlines of sequels or derivatives of their original works, which they argued constituted copyright infringement. Similarly, in The New York Times Company v. Microsoft Corporation, the plaintiff entered prompts to recreate previously published articles, claiming this also amounted to infringement. In both cases, the plaintiffs themselves provided the input prompts that led to the generation of the contested content.

The court’s decision in J. Doe vs. GitHub aligns with the plaintiffs' approach in these other cases by affirming that the act of entering prompts does not disqualify them from seeking damages. This ruling emphasizes that plaintiffs can argue their content was used inappropriately by the AI, regardless of their role in generating the specific outputs.

Moreover, the court in J. Doe vs. GitHub held that plaintiffs do not need to demonstrate, for standing purposes, that other users of the AI model would seek to reproduce their content. This aspect of the ruling is significant because it lowers the bar for establishing standing in copyright infringement cases involving generative AI. Plaintiffs no longer need to prove that their content is commonly used or sought after by other users of the AI tool. This could be a crucial argument in cases where the original content is not widely recognized or utilized but was still allegedly infringed upon by the AI.

The Impact of Development, Maintenance, and Knowledge Management on the Coding Market and Competition

Having understood the above points, it is also pertinent to note that the outcome of this case could reshape the competitive landscape for AI coding assistants. Despite the dismissal, the case has far-reaching implications for how the AI community navigates intellectual property and copyright issues. If GitHub Copilot and similar tools were to be modified to provide more attribution, as the plaintiffs demanded, it might increase the complexity and cost of developing these tools. This could slow down the adoption of AI in coding, as companies might need to invest more in ensuring compliance with open-source licenses.

However, following the dismissal, the current model for AI tools like Copilot has been reinforced, allowing them to continue suggesting code without significant changes to their attribution practices. This legal backing can boost the confidence of AI developers and users, leading to continued innovation and integration of AI in coding. Companies might feel more secure investing in AI tools, knowing that the legal risks associated with copyright infringement are currently manageable under existing laws and this precedent.

Similarly, for maintenance, AI tools can continue to play a significant role in optimizing code and suggesting improvements without the immediate need for new attribution systems. This ensures that maintenance processes remain efficient and cost-effective.

In addition, the ruling suggests that AI-generated code does not necessarily infringe on copyright if it does not explicitly replicate large chunks of protected material. This may encourage much broader use of AI tools in knowledge management, enabling organisations to capture and disseminate coding practices and solutions more freely.

As the market adjusts to these legal and ethical considerations, companies that effectively navigate these challenges may gain a competitive edge.

US Judgment's Implications for the Indian IT Community

Importance of Legal Compliance

The ruling emphasizes the need for AI tools and software development practices to comply strictly with copyright laws. It is crucial for Indian IT companies and developers to ensure that their use of AI tools, such as GitHub Copilot, complies with these laws to avoid potential legal consequences.

Adhering to open-source licenses is of paramount importance. Indian developers must be vigilant in ensuring that AI-generated code does not breach these licenses, which may include stipulations such as attribution requirements and constraints on commercial usage.

Being well-versed in local and international copyright laws and regulations is essential for Indian IT companies to navigate the complexities of legal compliance in a globalized industry. Implementing proactive legal strategies to supervise the use of AI tools within development processes can forestall potential violations and mitigate risks.

Ethical AI Development

Indian IT companies should prioritize the creation of AI systems that operate transparently and are accountable for their outputs. Granting users more control over AI-generated content can foster trust and diminish legal risks.

Indian IT firms should formulate and embrace ethical guidelines for AI usage addressing matters such as data privacy, bias in AI models, and the responsible utilization of AI-generated content. Engaging with diverse stakeholders can help ensure the responsible development and usage of AI tools.

Focus on Skill Development

Given the rapid evolution of AI technologies, Indian developers should stay abreast of the latest advancements and invest in training programs encompassing topics such as copyright laws, open-source licenses, and ethical AI practices to comprehend the legal and ethical implications of utilizing AI tools.

As AI takes on routine tasks, developers can concentrate on the more creative and innovative facets of their work. Encouraging developers to acquire new languages, frameworks, and tools can broaden their expertise and adaptability.

Protection of Intellectual Property

It is imperative for Indian developers and companies to be vigilant about their intellectual property rights and seek appropriate redress when their rights are infringed upon by AI tools or other entities.

The ruling underscores the necessity for developers whose rights are infringed upon by AI tools to seek compensation. This reinforces the economic worth of intellectual property and the need to safeguard it.

Global Standards and Competitiveness

Aligning with global legal and ethical standards can enhance the competitiveness and reputation of Indian IT companies in the international market. Upholding legal and ethical compliance can facilitate smoother international collaborations and create new opportunities for partnerships and projects.

This case holds substantial implications for the economic rights of the Indian IT community, emphasizing the paramount importance of safeguarding intellectual property (IP), fostering economic opportunities through responsible AI utilization, and ensuring adherence to global standards to protect economic interests.

Safeguarding Intellectual Property

The judgment underscores the necessity for Indian developers and companies to vigilantly protect their intellectual property rights. It is imperative for them to comprehensively understand the legal frameworks safeguarding their code and creations, ensuring that these rights remain inviolate in the presence of AI tools such as GitHub Copilot.

Developers and companies are urged to proactively safeguard their IP by using licensing agreements, patents, and trademarks to formally protect their software and code. Furthermore, regularly auditing AI-generated outputs for traces of their code is recommended to detect and address potential infringements effectively.
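
One way to approach the auditing recommended above is to look for long runs of identical lines shared between a developer's own files and an AI-generated output. The sketch below uses Python's standard `difflib` module. The function names, the `min_lines` threshold of three, and the idea of passing files as a dictionary are illustrative assumptions, and a shared run is only a signal for further human or legal review.

```python
import difflib


def longest_shared_run(own_code: str, generated_code: str) -> list[str]:
    """Return the longest run of consecutive lines present in both inputs.

    Lines are stripped of indentation and blank lines are dropped, so
    re-indented copies still match.
    """
    own_lines = [ln.strip() for ln in own_code.splitlines() if ln.strip()]
    gen_lines = [ln.strip() for ln in generated_code.splitlines() if ln.strip()]
    matcher = difflib.SequenceMatcher(None, own_lines, gen_lines, autojunk=False)
    match = matcher.find_longest_match(0, len(own_lines), 0, len(gen_lines))
    return own_lines[match.a:match.a + match.size]


def audit(own_files: dict[str, str], generated_code: str,
          min_lines: int = 3) -> dict[str, list[str]]:
    """Map each of the developer's files to its shared run, if long enough."""
    findings: dict[str, list[str]] = {}
    for name, source in own_files.items():
        run = longest_shared_run(source, generated_code)
        if len(run) >= min_lines:  # assumed threshold; tune per project
            findings[name] = run
    return findings
```

Calling `audit({"util.py": source_text}, generated_text)` returns the file names whose code shares at least `min_lines` consecutive lines with the generated output, together with the matching lines. Short or generic runs, such as common boilerplate, will also be flagged, so the threshold is a trade-off between missed copies and false alarms.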

The judgment implies that developers whose rights are violated by AI tools are entitled to seek legal recourse and compensation, emphasizing the economic value of intellectual property and the indispensability of protecting it through lawful channels.

Indian developers and companies are advised to advocate for more robust legal frameworks that offer comprehensive protection for intellectual property in the context of AI and software development, including advocating for clearer regulations and more effective enforcement mechanisms.

Economic Opportunities

Responsible usage of AI tools such as GitHub Copilot presents Indian IT companies with the opportunity to enhance productivity and foster innovation. AI's capacity to handle routine coding tasks allows developers to focus on more intricate and creative aspects of software development, thereby leading to the creation of higher-quality software products and services.

By demonstrating compliance with legal and ethical standards, Indian IT companies can gain a competitive edge in the global market, attracting more clients and partnerships and consequently enhancing their economic prospects.

Investing in skill development pertaining to AI and legal compliance can make Indian developers more competitive on a global scale. This includes training in state-of-the-art AI technologies, comprehension of copyright laws, and adherence to best practices in ethical AI development.

As AI continues to evolve, new job opportunities will arise in domains such as AI ethics, legal compliance, and advanced software development. Indian developers equipped with these skills can leverage these opportunities to enhance their career prospects and economic potential.

Global Standards and Competitiveness

The judgment incentivizes Indian IT companies to align with global legal and ethical standards. By adopting best practices in AI development and usage, Indian firms can sustain their competitiveness and reputation in the international market.

Adherence to these standards can differentiate companies within a crowded marketplace, attracting international clients and partnerships and thereby enhancing economic growth.

Understanding and complying with judgments such as the US GitHub Copilot case can smooth international collaborations, enabling Indian IT firms to engage in cross-border projects and partnerships, thereby expanding their global footprint.

Ensuring that Indian IT services are perceived as reliable and legally compliant can cultivate trust with international partners, leading to more collaborative projects, joint ventures, and increased economic opportunities.

Economic Redress and Fair Compensation

The judgment reinforces that developers whose intellectual property is used without authorization by AI tools are entitled to seek economic redress. This underscores the significance of fair compensation for intellectual property usage and the economic rights of creators.

Indian developers should seek legal support to comprehend their rights and the available mechanisms for seeking compensation in cases of IP infringement. This involves consulting with legal experts and pursuing litigation when necessary.

The judgment highlights the economic value of intellectual property and the contributions of individual developers. Recognizing and fairly compensating these contributions is fundamental for fostering innovation and ensuring the sustainability of the IT industry.

Therefore, Indian IT companies should equip their developers with resources and legal assistance to safeguard their intellectual property rights, allowing creators to focus on innovation without apprehensions about potential infringements.
